The PHP.net website is a critical resource for PHP developers worldwide, providing documentation, news, and updates about the PHP language. It has millions of visits every year. At least, we're pretty sure it does, since currently PHP.net lacks any useful analytics beyond rudimentary server logs. That makes it difficult to determine where or how best to invest resources (whether volunteer or paid) in improving the PHP.net experience, particularly the documentation.
Of particular interest, the PHP Foundation is looking into expanding its scope to fund improvements to the documentation. However, there are over 17,000 documentation pages on php.net, and right now no one knows which ones receive the most traffic and are therefore most worth investing resources in. To know where to get the best “bang for the buck,” both literally and figuratively, we need better data.
This RFC proposes implementing a self-hosted analytics solution to gather valuable insights into how users interact with the site.
The Infrastructure Team, in cooperation with the Foundation, will install the Matomo analytics server on PHP-maintained hardware and add a tracking snippet to PHP.net. The setup works much like Google Analytics, except that it is self-hosted.
Matomo will be configured to avoid saving any Personally Identifiable Information (PII).
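One common way analytics tools avoid storing PII is to truncate visitor IP addresses before they are saved. The following is only a minimal sketch of that general technique, not Matomo's actual implementation or configuration. The function name `mask_ip` and the choice of how many bits to keep (16 for IPv4, 48 for IPv6) are assumptions made for illustration.

```python
import ipaddress


def mask_ip(address: str, ipv4_keep_bits: int = 16, ipv6_keep_bits: int = 48) -> str:
    """Zero out the low-order bits of an IP address.

    The bit counts are illustrative assumptions, not values taken from
    this RFC or from Matomo's settings.
    """
    ip = ipaddress.ip_address(address)
    keep = ipv4_keep_bits if ip.version == 4 else ipv6_keep_bits
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{keep}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    # Addresses from the documentation-reserved ranges.
    print(mask_ip("203.0.113.42"))           # 203.0.0.0
    print(mask_ip("2001:db8:abcd:1234::1"))  # 2001:db8:abcd::
```

After masking, many visitors share the same stored address, so the value can still support coarse-grained geographic reporting without pointing at a single host.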
The specific code to be placed on php.net can be previewed in this commit.
The raw data collected by Matomo will be available only to the PHP Infrastructure Team, which may include staff from the PHP Foundation.
Fully anonymized aggregate data (such as most-popular-pages, overall traffic rates, coarse-grained geographic information, etc.) will be made publicly available as feasible.
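As a rough illustration of what "anonymized aggregate data" could look like, the sketch below reduces a list of page paths to a most-popular-pages report. It is a hypothetical example, not the reporting Matomo itself produces. The function name `most_popular_pages` and the `min_views` threshold (which drops rarely visited pages from the published list) are assumptions added here, not part of the proposal.

```python
from collections import Counter
from typing import Iterable


def most_popular_pages(
    paths: Iterable[str], top_n: int = 3, min_views: int = 2
) -> list[tuple[str, int]]:
    """Return up to top_n (path, views) pairs, most viewed first.

    Only the path is counted; no visitor-level fields are involved.
    Pages with fewer than min_views views are omitted (an assumed
    threshold for illustration).
    """
    counts = Counter(paths)
    ranked = [(path, views) for path, views in counts.most_common() if views >= min_views]
    return ranked[:top_n]


if __name__ == "__main__":
    sample = [
        "/manual/en/function.array-map.php",
        "/manual/en/function.strlen.php",
        "/manual/en/function.array-map.php",
        "/downloads.php",
        "/manual/en/function.strlen.php",
        "/manual/en/function.array-map.php",
    ]
    for path, views in most_popular_pages(sample):
        print(f"{views:>4}  {path}")
```

A report of this shape carries only page paths and totals, which is the kind of output that can be published without exposing anything about individual visitors.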
To acknowledge the presence of an analytics service, the “Logfiles” section of the PHP Privacy Policy page will be replaced with the following:
Analytics
PHP.net collects anonymous user statistics to help improve the site. We do not collect any personally identifiable information, and you may opt out of analytics at any time. Collected analytics are used exclusively by the PHP.net team and PHP Foundation to improve PHP.net. The raw data is never shared with any third party, ever, unless compelled by a valid court order.
One of the chief concerns with any analytics system is tracking by third parties. While Google Analytics and similar services are the most popular, many members of the PHP community are justifiably concerned about what such companies do with the data they collect. For that reason, we believe this is a case where self-hosting is the better option, even if it isn't as feature-rich as some third-party services. Protecting the privacy of our users is of paramount importance.
There are many self-hosted analytics packages available on the market. Matomo was selected for a number of reasons.
While there are no doubt other viable options on the market, the above points (particularly the team's familiarity with the tool already) make it the most straightforward option.
Matomo can ingest server log files as an alternative to using a JavaScript tracking snippet. However, that would result in inadequate data for a number of reasons.
A client-side tracker also provides far richer data, such as:
None of that information can be derived from server logs. All of it could be derived from a client-side tracker, without collecting any PII.
No. Third-party trackers that uniquely identify individuals across multiple domains and make that data available to other third parties are evil. First-party analytics can provide valuable insights into how users use a website. The safety advantages of this approach are:
This is a simple yes-or-no vote to approve this service. A 2/3 majority is required to pass.