rfc:better_benchmarks

Request for Comments: Better benchmarks for PHP

Contributors: Add your name to the bottom of the page

Replace PHP's current bench.php with a benchmark suite from which meaningful performance measurements can be taken.

Introduction

PHP's current bench.php is a micro-benchmark which tests a number of language features. Since it does not behave like a 'real' application, it cannot provide meaningful information about the performance of PHP in general. This RFC will attempt to replace bench.php with a better suite of benchmarks, upon which meaningful performance measurements can be taken.

What are we trying to measure

Performance impact of changes to PHP or its libraries

We need to be able to see how changes to the Zend engine affect real programs, not synthetic benchmarks.

Realistic profiling

We need realistic applications and workloads in order to get useful profiling info. Useful to know which pieces of the VM need optimization.

Performance improvement of other PHP implementations relative to PHP

There is a need for other PHP implementations to be able to measure their performance against the Zend engine.

What do we want in a benchmark

Benchmark applications should be straightforward to deploy, preferably with no manual setup, even when multiple machines are required.

  • The benchmarks must be varied. We are not trying to measure features of PHP. We are trying to measure performance of applications.
  • preferably few or no external dependencies
  • It must be possible to separate results for the language, the database, the webserver and the benchmarking client.
  • DBs could possibly be changed to use a file-based engine like SQLite.
  • Web-apps benchmarks must have (real) data in order to simulate real work.

Basically we would like to see real-world apps tweaked so that automated deployment and testing is easy and simple. Each benchmark should also have a set of representative workload inputs (e.g. from real-world log files).

Benchmarks should be licensed under a liberal open-source license, so that they can be “safely” customized and redistributed for our needs.

The plan

THIS IS NOT SET IN STONE - FEEDBACK ESPECIALLY NEEDED

  • Start with something simple
    • Develop simple CLI benchmarks with the criteria “better than bench.php”
      • The best idea is probably to port existing javascript benchmarks (see bottom of page)
    • Tools to analyse and compare test runs.
  • Build a larger suite based on the infrastructure created from the first part
    • A suite of about 10 web-apps, which can be deployed and run in a straightforward manner
      • based on real world apps, with real world data
  • Integrate with build infrastructure
    • Automated summaries
    • Graphs over time
    • etc

What benchmarks currently exist

Microbenchmarks

    • The bench.php script just performs a few standard synthetic tests (e.g. ackermann, fibonacci, etc..). It also performs some tests on the areas that were optimized in Zend Engine 2 (so it's a little biased). However, it doesn't perform any 'real' work.
    • There are many language features that it does not use, that might be used (perhaps sporadically) in a larger application:
      • OO
      • Variable-variables
      • Variable-function or -method calls (in fact, the call-graph is entirely static, and very shallow -- although there are at least cycles).
      • References
      • call-time pass-by-ref
      • dynamic function/class definition
      • eval
      • include (etc)
      • extract/compact
    • It seems that this test really only tests language features, and doesnt actually do any work.
    • only performs a few synthectic micro tests to compare different ways of doing the same. e.g. benchmark the usage of '$a += $b' vs '$a = $a + $b'
    • Provides a large number of small benchmarks
    • Tests OO, and references
    • Most of the benchmarks are very short (program length, not run-time) - the longest is 481 lines, and only 8 are longer than 100 lines.
    • Some tests come with input data
    • Short, doesnt take command line parameters, is varied, but simple.
    • Many benchmarks are copied from the language shootout (old version thereof, apparently)
    • Some tests come with input data

Large benchmarks

    • very complete but very expensive
    • contains 3 applications: banking website, e-commerce website and a vendor support website
    • workload based on real data. more details at http://www.spec.org/web2005/docs/1.20/design/SPECweb2005_Design.html
    • Requires a number of machines: for DB, front-end and clients. Difficult to deploy. Results difficult to replicate for other developers.
    • This benchmarks the entire stack, not just PHP. If PHP is not the bottleneck, then the 'improvements' being tested will not appear. This can be an advantage - small improvements may not increase the speed of an web application in real life. However, it obscures what we are trying to measure - the speed of the PHP implementation.
    • Written in PHP 4, with no OO. Only minor changes required to make it run in PHP5.
    • Difficult to deploy. Requires at least 3 machines (it can be deployed on a single machine, but the results would be worthless).
    • Like SPECweb2005, the benchmarks the entire stack, not just PHP.

Applications which could make good web-app benchmarks

Not much so far

    • One of the most popular BBS applications written in PHP.
    • Phalanger uses it for demonstrating its performance advantage.

Applications which could make good CLI benchmarks

    • A Javascript interpreter written in PHP
    • A tool for creating UML diagrams from PHP source
    • Not very useful if it is IO bound
    • Lots of data sets: can use any PHP package
      • NOTE: Seems to require graphviz
    • An odd language implementation
    • Uses 00
    • A small number of data sets (programs written in whirl)

Porting other benchmark suites

Benchmarking in other languages

Desired Benchmark Features

For command-line apps applications

  • Run-time
  • Memory usage
  • Hardware performance counters (if available), using PAPI
  • Simulated hardware statistics, using cachegrind
    • These should be combined into a single representative number, using some hardware model.
  • Ability to compare all of these over two runs
  • Support for other PHP implementations
  • Benchmark characterisation:
    • This is hard to do properly, so best to do it badly and give coarse grained information.

*TODO*

For web-apps

  • Requests per second
  • Memory usage
  • Bottleneck (is it scripting, DB, or network)
  • Total time for request

Status

Completed Tests

  • raytracer (28.04.2009)
  • deltablue (04.05.2009)
  • crypto (07.06.2009)
  • whirl & j4p5
  • crypto-md5 (29.06.2009)
  • richards (29.06.2009)
  • Crypto-AES

Still unfinished

Collaboration

CVS

The benchmarks are hosted in CVS: http://cvs.php.net/viewvc.cgi/php-benchmarks/. Checkout with 'cvs -d :pserver:cvsread@cvs.php.net:/repository co php-benchmarks'. More info at http://php.net/anoncvs.php

If you don't have karma, you won't be able to edit the wiki, or commit benchmarks to the suite. Since this takes time, please do not wait for karma to contact us. If you have contributions to make, we'd like hear them, and can make changes to the wiki on your behalf.

To get karma, please fill in the form on http://php.net/cvs-php.php. In the text box, fill in 'Contributing benchmarks'. For 'Type of initial karma' enter 'PHP Group'. Please email Nuno (email below) when you have submitted the form. Once you get karma, please add your username (and other details, if missing), to the contributors section below.

Discussion

Discussion on the QA mailing list (php-qa@lists.php.net). Please include [benchmarks] in the subject.

Contributors

Add your name here if you want to help.

  • Paul Biggar - paul.biggar [at] gmail.com
  • Nuno Lopes - nlopess [at] php.net
  • Ólafur Waage - olafurw [at] gmail.com
  • Michiaki Tatsubori - mich [at] acm.org
  • Alexander Hjalmarsson - hjalle [at] sgh.se
  • Davide Mendolia - idaf1er [at] gmail.com
rfc/better_benchmarks.txt · Last modified: 2017/09/22 13:28 by 127.0.0.1