Request for Comments: Better benchmarks for PHP

Version: 1.0
Date: 2009-02-01
Author: Paul Biggar paul.biggar@gmail.com, Nuno Lopes nlopess@php.net
Status: Started

Contributors: Add your name to the bottom of the page

Replace PHP's current bench.php with a benchmark suite from which meaningful performance measurements can be taken.

Introduction

PHP's current bench.php is a micro-benchmark which tests a number of language features. Since it does not behave like a 'real' application, it cannot provide meaningful information about the performance of PHP in general. This RFC will attempt to replace bench.php with a better suite of benchmarks, upon which meaningful performance measurements can be taken.

What are we trying to measure

Performance impact of changes to PHP or its libraries

We need to be able to see how changes to the Zend engine affect real programs, not synthetic benchmarks.

Realistic profiling

We need realistic applications and workloads in order to get useful profiling info. Useful to know which pieces of the VM need optimization.

Performance improvement of other PHP implementations relative to PHP

There is a need for other PHP implementations to be able to measure their performance against the Zend engine.

What do we want in a benchmark

Benchmark applications should be straightforward to deploy, preferably with no manual setup, even when multiple machines are required.

The benchmarks must be varied. We are not trying to measure features of PHP. We are trying to measure performance of applications.
preferably few or no external dependencies
It must be possible to separate results for the language, the database, the webserver and the benchmarking client.
DBs could possibly be changed to use a file-based engine like SQLite.
Web-apps benchmarks must have (real) data in order to simulate real work.

Basically we would like to see real-world apps tweaked so that automated deployment and testing is easy and simple. Each benchmark should also have a set of representative workload inputs (e.g. from real-world log files).

Benchmarks should be licensed under a liberal open-source license, so that they can be “safely” customized and redistributed for our needs.

The plan

THIS IS NOT SET IN STONE - FEEDBACK ESPECIALLY NEEDED

Start with something simple
- Develop simple CLI benchmarks with the criteria “better than bench.php”
  - The best idea is probably to port existing javascript benchmarks (see bottom of page)
- Tools to analyse and compare test runs.

Build a larger suite based on the infrastructure created from the first part
- A suite of about 10 web-apps, which can be deployed and run in a straightforward manner
  - based on real world apps, with real world data

Integrate with build infrastructure
- Automated summaries
- Graphs over time
- etc

What benchmarks currently exist

Microbenchmarks

Zend Engine's bench.php
- The bench.php script just performs a few standard synthetic tests (e.g. ackermann, fibonacci, etc..). It also performs some tests on the areas that were optimized in Zend Engine 2 (so it's a little biased). However, it doesn't perform any 'real' work.
- There are many language features that it does not use, that might be used (perhaps sporadically) in a larger application:
  - OO
  - Variable-variables
  - Variable-function or -method calls (in fact, the call-graph is entirely static, and very shallow -- although there are at least cycles).
  - References
  - call-time pass-by-ref
  - dynamic function/class definition
  - eval
  - include (etc)
  - extract/compact

PHPBench
- It seems that this test really only tests language features, and doesnt actually do any work.

PHP BS
- only performs a few synthectic micro tests to compare different ways of doing the same. e.g. benchmark the usage of '$a += $b' vs '$a = $a + $b'

Debian Language Shootout
- Provides a large number of small benchmarks
- Tests OO, and references
- Most of the benchmarks are very short (program length, not run-time) - the longest is 481 lines, and only 8 are longer than 100 lines.
- Some tests come with input data

Roadsend
- Short, doesnt take command line parameters, is varied, but simple.
- Many benchmarks are copied from the language shootout (old version thereof, apparently)
- Some tests come with input data

Large benchmarks

SPECweb2005
- very complete but very expensive
- contains 3 applications: banking website, e-commerce website and a vendor support website
- workload based on real data. more details at http://www.spec.org/web2005/docs/1.20/design/SPECweb2005_Design.html
- Requires a number of machines: for DB, front-end and clients. Difficult to deploy. Results difficult to replicate for other developers.
- This benchmarks the entire stack, not just PHP. If PHP is not the bottleneck, then the 'improvements' being tested will not appear. This can be an advantage - small improvements may not increase the speed of an web application in real life. However, it obscures what we are trying to measure - the speed of the PHP implementation.
- More information in paper: SPECweb2005®™ in the Real World: Using Internet Information Server (IIS) and PHP by Warner and Worley.

RUBBoS: Bulletin Board Benchmark
- Written in PHP 4, with no OO. Only minor changes required to make it run in PHP5.
- Difficult to deploy. Requires at least 3 machines (it can be deployed on a single machine, but the results would be worthless).
- Like SPECweb2005, the benchmarks the entire stack, not just PHP.

Dell DVD Store Database Test Suite
- Written in PHP 5. Has DB backends for MySQL, Oracle and SQLServer
- TODO: take a closer look to see if it can be used

RUBiS: Rice University Bidding System
- An auction site application modeled after eBay.
- Implementation variations in Java servlets and EJB are also available and compared in Middleware 2003 paper Cecchet et al. 2003].

Applications which could make good web-app benchmarks

Not much so far

SugarCRM
- A fairly large, production-quality, still open-sourced CRM application.
- An extensive benchmark experimentation is published as a blog entry at Sun Microsystems.

MediaWiki
- A Wiki engine famous for its use in Wikipedia.
- See Benchmarking of it.

phpBB
- One of the most popular BBS applications written in PHP.
- Phalanger uses it for demonstrating its performance advantage.

Applications which could make good CLI benchmarks

j4p5
- A Javascript interpreter written in PHP

phuml
- A tool for creating UML diagrams from PHP source
- Not very useful if it is IO bound
- Lots of data sets: can use any PHP package
  - NOTE: Seems to require graphviz

phpWhirl
- An odd language implementation
- Uses 00
- A small number of data sets (programs written in whirl)
  - A useful input

Porting other benchmark suites

V8: http://code.google.com/p/v8/source/browse/trunk/benchmarks/
- These will probably take a day each to port
- The data sets are built into the applications
- The sunspider ports are probably easier to work with

Sunspider: http://svn.webkit.org/repository/webkit/trunk/SunSpider/tests/
- The data sets are (I think) built into the applications
- The most useful benchmarks, in order, appear to be (includes V8 benchmarks):
  - Richards (V8)
  - Deltablue (V8)
  - V8-Crypto (V8)
  - xparb
  - Crypto-AES
  - Crypto-MD5
  - 3d-raytrace (ignoring the actual drawing from JS)

Benchmarking in other languages

* Python: http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Performance

* Ruby: http://groups.google.com/group/ruby-benchmark-suite

Desired Benchmark Features

For command-line apps applications

Run-time
Memory usage
Hardware performance counters (if available), using PAPI
Simulated hardware statistics, using cachegrind
- These should be combined into a single representative number, using some hardware model.
Ability to compare all of these over two runs
Support for other PHP implementations
Benchmark characterisation:
- This is hard to do properly, so best to do it badly and give coarse grained information.

*TODO*

For web-apps

Requests per second
Memory usage
Bottleneck (is it scripting, DB, or network)
Total time for request

Status

Completed Tests

raytracer (28.04.2009)
deltablue (04.05.2009)
crypto (07.06.2009)
whirl & j4p5
crypto-md5 (29.06.2009)
richards (29.06.2009)
Crypto-AES

Still unfinished

Collaboration

CVS

The benchmarks are hosted in CVS: http://cvs.php.net/viewvc.cgi/php-benchmarks/. Checkout with 'cvs -d :pserver:cvsread@cvs.php.net:/repository co php-benchmarks'. More info at http://php.net/anoncvs.php

If you don't have karma, you won't be able to edit the wiki, or commit benchmarks to the suite. Since this takes time, please do not wait for karma to contact us. If you have contributions to make, we'd like hear them, and can make changes to the wiki on your behalf.

To get karma, please fill in the form on http://php.net/cvs-php.php. In the text box, fill in 'Contributing benchmarks'. For 'Type of initial karma' enter 'PHP Group'. Please email Nuno (email below) when you have submitted the form. Once you get karma, please add your username (and other details, if missing), to the contributors section below.

Discussion

Discussion on the QA mailing list (php-qa@lists.php.net). Please include [benchmarks] in the subject.

Contributors

Add your name here if you want to help.

Paul Biggar - paul.biggar [at] gmail.com
Nuno Lopes - nlopess [at] php.net
Ólafur Waage - olafurw [at] gmail.com
Michiaki Tatsubori - mich [at] acm.org
Alexander Hjalmarsson - hjalle [at] sgh.se
Davide Mendolia - idaf1er [at] gmail.com

Table of Contents