Ideas for the Google Summer of Code 2009
Here you'll find a few ideas for Google Summer of Code projects. This list is not exhaustive, and you are welcome to propose any “crazy” idea you may have.
Before you submit your proposal, you are encouraged to contact the possible mentors for the project you are applying for. If the project does not have a mentor assigned, or if you are submitting an off-list project, please contact one of our mailing lists to discuss the proposal before submitting it.
Priority will be given to proposals that are directly related to the PHP Project, including PECL. PEAR has its own ideas page. If we have any spare slots, we will consider proposals outside the PHP Project.
If your project is to be written in PHP, please make sure you read the PEAR Coding Standards when applying.
If you are applying for a project in the PHP code itself (in C), you may find the PHP hackers guide useful; it also includes our C coding standards (TBD).
Your proposal should follow our Ideas Template. If you are a student submitting an idea of your own, you should also include:
- Name and e-mail
- Availability: How many hours per week can you spend working on this? What other obligations do you have this summer?
- Bio: Who are you? What makes you the best person to work on this project?
Random unsorted ideas
Feel free to add your own ideas.
- Unicode work for PHP 6
- Continue the optimizer work from last year
- Testbox integration (automatic build+test run system)?
- ANTLR3 LL(*) Parser for PECL, probably benefiting from ANTLRWorks integration as well
- ...
Automatic Code Checker
Possible mentor: Nuno Lopes
The PHP API has a few functions that are error-prone and can easily cause segfaults in PHP, especially on less-used platforms. Such functions include zend_parse_parameters*(), zend_error() and a few others. Our current check script is written in PHP and is regex-based; it is available in CVS. This script is difficult to maintain and generates far too many false positives. The work would involve creating an LLVM/Clang analysis tool that performs data-flow static analysis and outputs error messages for the problems it finds. Sample output from the current script is available at: http://gcov.php.net. The work can build on previous efforts.
Zend Bytecode to LLVM Bitcode Converter
Possible mentor: Nuno Lopes
Make a tool to convert Zend bytecode into LLVM bitcode. Some work was already done last year, and a basic JIT engine already works. Missing features include stabilization, code sharing between Apache/FastCGI processes, further optimizations, generating self-executable applications (without the PHP sources at all), etc. (VMKit might be a good source of inspiration.)
Integrated Code Coverage of C and PHP Code
Possible mentor: Sebastian Bergmann
Implement a parser for gcov data files in PHP. This would help developers who write both C-level and PHP-level code for their PHP applications and want an integrated code coverage report.
run-tests.php improvements
Possible mentors: Zoe Slattery, Stefan Priebsch
The current version of run-tests.php is a PHP 4 script that has grown over the years. We currently have about 8000 tests in PHP, and running them sequentially with the existing script is beginning to take an unacceptably long time; the problem will only get worse as we add more tests. We prototyped parallel test execution a while ago and found that it could be done, but that it was extremely difficult to do within the existing run-tests.php script. There are other improvements that we'd like to add to run-tests (for example, XML output), which we thought would also be easier if the script were re-engineered. Over the past few months we have been working on this and have a prototype OO version of run-tests that can currently run most of the tests in ext/standard/tests. The code is designed so that adding parallel execution will be easy, and we'd like to propose that a student take on (at least) this part of the work; it is interesting and quite difficult PHP coding. A good student could go beyond this and implement other new features (XML, reporting).
Benchmark creation
Possible mentors: Nuno Lopes, Paul Biggar
Work on replacing the current bench.php benchmarking script with something better. Ideas and discussion can be found in the RFC: Better benchmarks for PHP.
PHP/PECL Build Bot
Possible mentors: Elizabeth M Smith
This would be a two-part project. Part one would involve a web interface that lets developers choose extensions (either static or shared) and options for test builds of PHP with error checking. The front end should generate configure lines (for Windows or Linux) and queue them.
In part two, the back end should attempt to build the resulting configurations, push any binaries/log files to a server, and email the requester with the location/status of the finished build.
PHP-GTK Related
PHP-GTK PhD migration
PHP-GTK's docs should be moved to the new PhD rendering system, but it is a fairly large job.
This would involve:
- migrating the current documentation to the DocBook 5 format
- writing a PhD compatible theme to generate documentation
- writing a reflection-based updater to keep docs up to date with the code (parsing signals, methods and properties, and updating the XML)
- getting autogeneration set up so PHP-GTK docs are generated on a regular basis
PHP-GTK Code Completion
Possible mentors: Elizabeth M Smith
PHP-GTK does not implement all functions of the latest versions of GTK+. This project primarily involves writing overrides for functions whose implementations are not automatically created by the generator. By the end of the project, all functions defined up to the latest version of GTK+ are expected to work. This includes not only the GTK symbols but also those of the latest versions of ATK, Pango, and GDK.
CGI/FastCGI SAPI Improvement
Possible mentors: Dmitry Stogov
php-cgi is not usable in a production environment without additional “crutches” (e.g. spawn-fcgi from the lighttpd distribution or the php-fpm patch). This project involves integrating such “crutches” into php-cgi and extending it to support different protocols:
- daemonization (detach, pid file creation, setup environment variables, setuid/setgid/chroot)
- graceful restart
- separate and improve transport layer to allow support for different protocols
- support for SCGI protocol
- support for a subset of the HTTP protocol
- ...
PhD (PHP-based DocBook renderer) improvements
Possible mentors: Hannes Magnusson
PhD is the tool that renders the DocBook-based documentation for both the PHP Manual and the PEAR Manual. It uses XMLReader to read the XML; XMLReader is a forward-only streaming parser, which brings some obvious drawbacks, but it is also the reason why PhD is so blazingly fast.
Today PhD is PHP.net-centric, meaning that it contains several “rules” and “workarounds” that only apply to PHP.net. The project goals:
- Transform PhD into a more generic application, so that it becomes useful for other (non-php.net) projects.
- Create “PhD-Setup” as a replacement for the various configure.php files that phpdoc and peardoc use today. PhD-Setup should also be generic enough to be useful for other projects.
- Close PhD-related bugs, and add additional formats/themes.
For questions and thoughts, please join the #php.doc IRC channel on EFnet and/or write to the PHP Documentation list at phpdoc@lists.php.net.
Xdebug: Support for Path Coverage
Possible mentors: Derick Rethans, Sebastian Bergmann
Xdebug is a tool for debugging, analyzing and profiling PHP applications. One of its features is analyzing which code has been executed in a function. At the moment this is only done at line granularity. This project is about extending it to analyze which code *paths* are covered in functions and methods. This project idea requires C skills, and you will become very familiar with PHP internals. Feel free to drop by #php.pecl or #xdebug on Freenode (and look for Derick) if you have any questions.
Xdebug: Remote Debugging Support for Watch Expressions
Possible mentors: Derick Rethans
Xdebug is a tool for debugging, analyzing and profiling PHP applications. As part of its remote debugging features, it lets you see the contents of variables change when they are modified in the code. What is currently not supported is setting breakpoints on variables, so that the debugging process is interrupted when one of the “watched” variables is changed. This project idea is to implement this specific feature. Implementing it requires C skills, and you will become very familiar with PHP internals. Feel free to drop by #php.pecl or #xdebug on EFnet (and look for Derick) if you have any questions.
PHP-CMake
Possible mentors: Pierre A. Joye, Alejandro Leiva
CMake is a cross-platform build system that generates native makefiles for developers and has a much simpler syntax than m4. Kitware, the company behind CMake, is helping us to migrate to CMake and to improve it to fit our needs.
The main tasks for GSoC would be:
- Improve and refactor the automatic converter from config.m4/config.w32 to PHP-CMake CMakeLists, supporting core extensions and PECL extensions.
- phpize support.
- Connect CTest to the PHP testing system.
- Implement CPack solution for PHP binary packaging.
For questions and thoughts, please join the #php.cmake IRC channel on EFnet; you can also check the php-cmake wiki page.
Bug Tracker Improvements
Possible mentors: Philip Olson
PHP has a bug tracker, and it needs improvements. Possible tasks include:
- Determine missing features, and prioritize them
- Go through current bugsweb bugs, discuss and prioritize them
- Fix bugs
- Implement features
- Update code to work without magic quotes, register_globals and other nonsense
Resources of interest:
- Site: http://bugs.php.net/
Online editor for the PHP Manual
Possible mentors: Yannick Torres, Philip Olson
Work has started on an online editor for the PHP Manual. The student would help get this tool up and running for live use at php.net. The tool performs the following actions:
- Allows SVN users to make and commit changes
- Allows anybody to create patches, which are sent to a patch queue for developer review
- Works with all translations
Resources of interest:
- Videos showing use: http://tmp.cweiske.de/phpeditor-videos/
- The source in SVN: http://svn.php.net/viewvc/web/doc-editor/
Close and evaluate PHP Bugs
Possible mentors: Philip Olson
This idea is simple: it involves scouring the PHP bugs database and fixing bugs. The procedure:
- Search the bug tracker for all open PHP bugs
- Find an open bug that looks appealing
- Evaluate the bug
- Propose a patch for the bug to the maintainer and/or internals list
- Commit the patch
- Repeat
There are several people and maintainers who would be involved with this idea, such as Tony, Jani, and Johannes. Priority should be given to PHP 5.3 bugs.
Abstract Extension API and Dependency Interface
Possible mentors: Brian Shire, Andrei Zmievski
Currently, PHP extensions that have dependencies on other extensions use compile-time configure checks to verify availability. This has several problems:
- Any dependency that is not available will trigger undefined symbol errors, likely at run-time, see: http://marc.info/?l=php-internals&m=123456185115526&w=2.
- Load order is important, as loading the dependency after the dependent extension also triggers symbol errors.
- Upgrading an extension can easily break binary compatibility, causing undefined behavior.
- The errors given to the end-user are not always obvious or easy for them to debug.
- On systems where symbols are resolved lazily, the error will not manifest until the code path is executed. This makes failures unexpected and difficult to debug.
Deliverables
- Develop functionality for PHP that will allow extensions to register a set of functions as a versioned API.
- Other extensions should be able to fetch this API, or handle failures appropriately for their application such as disabling features or generating an error. This includes dependencies being unavailable, or not having the expected version. The calling extension should be able to differentiate between these conditions.
- The interface should support extensions being loaded in any order.
- Extensions should be able to support multiple API versions at the same time for backwards compatibility.
Initial Brainstorming and Design
- The API functions will be passed via an extension-specific structure containing function pointers. This requires that extensions share a common prototype via a C header file. Getting the correct prototype for the correct API version will of course be critical.
- During module initialization the extension will provide a set of API structures that it supports that contain the function pointers. This can be called multiple times with different version numbers.
- An extension can also request an API structure for a specific extension and API version at module initialization. The problem of extension load order needs to be solved here: returning the API structure will likely need to be delayed until all modules have been initialized, but it should still happen before request initialization, to avoid incurring this cost on every request.
- It may be useful if this interface can be used by PHP to expose callback hooks or interfaces to extensions that may change in the future or need to be abstracted for extension use.
- The ability to list extensions' dependencies and exposed APIs could be useful for informational purposes and documentation.
/* Expose an API to other extensions.
 * We'll be exposing a fetch() function and a store() function
 * from our extension; these could be any calls we want to expose.
 * extension_api_t is a custom structure for this extension with
 * function pointers defining this extension's API.
 * The structure is static so that it outlives this function,
 * in case the registry keeps a pointer to it. */
static struct extension_api_t api;
api.version = 1;
api.fetch = my_fetch;
api.store = my_store;
php_register_api(&api);

/* Fetch an external extension's API */
int rval;
int version = 2;
struct extension_api_t *api;

rval = php_get_api("myextension", version, &api);
if (rval == PHP_EXT_UNAVAIL) {
    zend_error(E_WARNING, "myextension is not loaded or available, disabling feature");
    return NULL;
} else if (rval == PHP_EXT_NOVERSION) {
    zend_error(E_WARNING, "myextension version %d is required, disabling feature", version);
    return NULL;
} else {
    return api->fetch(key);
}
Prototyping Removal of the Zend API
Possible mentors: Paul Biggar (other interested mentors, please add your names)
Currently, the structure of PHP extensions requires very tight coupling with the Zend Engine. This makes it hard to reimplement the Zend Engine, and prevents PHP from becoming significantly faster in the long term.
The problem statement and a proposed design for a solution are presented in Remove Zend API.
Deliverables
As part of GSoC, the student should complete the first two goals in Remove Zend API to a high standard, and the third to a reasonably advanced prototype. Converting the entire set of extensions is, of course, not part of the project. To be considered a success, the project should deliver a good prototype covering 5 or 6 extensions.
As well as the code, the student should produce a short report discussing the viability of the approach for the entire set of standard extensions. It should describe which challenges remain to be solved to make it possible, and whether or not the student considers this realistic. This is a very important part of the project, as otherwise it will not be easy to gauge the potential success of this approach across the entire PHP project. This report is separate from the one Google requires as part of the GSoC program.