Ideas for the Google Summer of Code 2009

Here you'll find a couple of ideas for Google Summer of Code projects. This list is not exhaustive and you may propose any “crazy” idea you may have.

Before you submit your proposal, you are encouraged to contact the possible mentors for the project you are applying for. If the project has no mentor assigned, or if you are submitting an off-list project, please contact one of our mailing lists to discuss the proposal before submitting it.

Priority will be given to proposals that are directly related to the PHP Project; this includes PECL. PEAR has its own ideas page over here. If we have any spare slots, we will consider non-PHP Project proposals.

If your project is to be written in PHP, please make sure you read the PEAR Coding Standards when applying.

If you are applying for a project in the PHP code itself (in C), you may find the PHP hackers guide useful; it also includes our C coding standards (TBD).

Your proposal should match our Ideas Template. If you are a student submitting an idea of your own, you should also include:

  • Name and e-mail
  • Availability: How many hours per week can you spend working on this? What other obligations do you have this summer?
  • Bio: Who are you? What makes you the best person to work on this project?

Random unsorted ideas

Feel free to add your own ideas.

  • Unicode work for PHP 6
  • Continue the optimizer work from last year
  • Testbox integration (automatic build+test run system)?
  • ANTLR3 LL(*) Parser for PECL, probably benefiting from ANTLRWorks integration as well
  • ...

Automatic Code Checker

Possible mentor: Nuno Lopes

The PHP API has a couple of functions that are error prone and may easily cause segfaults in PHP, especially on less-used platforms. The list of such functions includes zend_parse_parameters*(), zend_error() and a few others. Our current check script is written in PHP and is regex based; it is available in CVS. This script is difficult to maintain and generates far too many false positives. The work would involve creating an LLVM clang analysis tool that performs data-flow static analysis and outputs error messages for the problems found. A sample output of the script mentioned is available at: http://gcov.php.net. The work can be based on previous efforts.
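
As an illustration of the kind of mistake such a checker could flag, the sketch below counts how many pointer arguments a zend_parse_parameters() type spec expects, so that the count can be compared with what a call site actually passes. It is only a sketch: the function names are hypothetical, the specifier table is a subset of the PHP 5 specifiers written from memory, and a real clang-based tool would work on the AST rather than on strings.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the number of pointer arguments that a zend_parse_parameters()
 * type spec expects, or -1 if the spec contains a character this sketch
 * does not know. Only a subset of PHP 5 specifiers is handled (assumption).
 */
int expected_arg_count(const char *spec)
{
    int count = 0;

    assert(spec != NULL);
    for (const char *p = spec; *p != '\0'; p++) {
        switch (*p) {
        case 'l': case 'd': case 'b': case 'a': case 'h':
        case 'z': case 'Z': case 'r': case 'o': case 'C':
            count += 1;          /* one destination pointer */
            break;
        case 's':                /* char ** plus a length pointer */
        case 'O':                /* zval ** plus a zend_class_entry * */
        case 'f':                /* fcall info plus fcall info cache */
            count += 2;
            break;
        case '|': case '/': case '!':
            break;               /* modifiers consume no argument */
        default:
            return -1;           /* unknown specifier */
        }
    }
    return count;
}

/* Return 1 if a call passing `passed` pointer arguments matches the spec. */
int call_matches_spec(const char *spec, int passed)
{
    return expected_arg_count(spec) == passed;
}
```

For example, a call using the spec "sl" but passing only two pointers (forgetting the string length) would be reported, since call_matches_spec("sl", 2) returns 0.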

Zend Bytecode to LLVM Bitcode Converter

Possible mentor: Nuno Lopes

Make a tool to convert Zend bytecode into LLVM bitcode. Some work was already done last year, and a basic JIT engine already works. Missing features include stabilization, code sharing between Apache/FastCGI processes, further optimizations, and generation of self-executable applications (without the PHP sources at all). VMKit might be a good source of inspiration.

Integrated Code Coverage of C and PHP Code

Possible mentor: Sebastian Bergmann

Implement a parser for gcov data files (#1, #2) in PHP. This would be helpful for developers that write both C-level and PHP-level code for their PHP applications and are interested in an integrated code coverage report.

run-tests.php improvements

Possible mentors: Zoe Slattery, Stefan Priebsch

The current version of run-tests.php is a PHP 4 script which has grown over the years. We currently have about 8000 tests in PHP, and running them sequentially using the existing script is beginning to take an unacceptably long time; the problem will only get worse as we add more tests. We prototyped parallel test execution a while ago and found that it could be done, but it was impossibly difficult to do using the existing run-tests.php script. There are other improvements that we'd like to add to run-tests (for example XML output), which we thought would also be easier if the existing run-tests.php script were re-engineered. Over the past few months we have been working on this and have a prototype OO version of run-tests that is currently able to run most of the tests in ext/standard/tests. The code is designed so that adding parallel running will be easy. We'd like to propose that a student take on (at least) this part of the work; it's interesting and quite difficult PHP coding. A good student could extend beyond this and implement other new features (XML, reporting).

Benchmark creation

Possible mentors: Nuno Lopes, Paul Biggar

Work on replacing the current bench.php benchmarking script with something better. Ideas and discussion in: RFC: Better benchmarks for PHP

PHP/PECL Build Bot

Possible mentors: Elizabeth M Smith

This would be a two part project. Part one would involve a web interface to allow developers to choose extensions (either static or shared) and options for test builds of PHP with error checking. The front end should spawn configure lines (for Windows or Linux) and queue them.

The back end should attempt to build the resulting configurations, push any binaries/log files to a server and email the requester with the location/status of the finished build product.

PHP-GTK PhD migration

PHP-GTK's docs should be moved to the new PhD rendering system, but it is a fairly large job.

This would involve:

  1. migrating the current documentation to a docbook 5 format
  2. writing a PhD compatible theme to generate documentation
  3. writing a reflection based updater to keep docs up to date with code (parsing signals, methods, properties and updating the xml)
  4. getting autogeneration set up so PHP-GTK docs are generated on a regular basis

PHP-GTK Code Completion

Possible mentors: Elizabeth M Smith

PHP-GTK does not implement all functions for the latest versions of GTK+. This project primarily involves writing overrides for functions whose implementation is not automatically created by the generator. At the end of the project, all functions defined up to the latest version of GTK+ will be expected to work. This will include not only the GTK symbols but the latest version of ATK, Pango, and Gdk.

CGI/FastCGI SAPI Improvement

Possible mentors: Dmitry Stogov

php-cgi is not useful in a production environment without additional “crutches” (e.g. spawn-fcgi from the lighttpd distribution or the php-fpm patch). This project involves integrating such “crutches” and extending php-cgi to support different protocols:

  • daemonization (detach, pid file creation, setup environment variables, setuid/setgid/chroot)
  • graceful restart
  • separate and improve transport layer to allow support for different protocols
  • support for SCGI protocol
  • support for subset of HTTP protocol
  • ...
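
The daemonization item above can be sketched roughly as follows. This is a minimal POSIX illustration, not code from php-cgi, spawn-fcgi or php-fpm; the function names are hypothetical, the PID file path is assumed to be absolute, and setuid/setgid/chroot handling and graceful restart are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the calling process ID to `path`; return 0 on success, -1 on error. */
int write_pid_file(const char *path)
{
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "w");
    if (fp == NULL) {
        return -1;
    }
    if (fprintf(fp, "%ld\n", (long)getpid()) < 0) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/*
 * Detach from the controlling terminal and record the daemon's PID.
 * The parent process exits; the function returns 0 in the detached
 * child, or -1 if any step fails. `pid_path` should be absolute because
 * the working directory is changed to "/".
 */
int daemonize(const char *pid_path)
{
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid > 0) {
        _exit(0);                /* parent: leave the child running */
    }
    if (setsid() < 0) {          /* child: start a new session */
        return -1;
    }
    if (chdir("/") < 0) {
        return -1;
    }
    /* Point the standard streams at /dev/null. */
    if (freopen("/dev/null", "r", stdin) == NULL ||
        freopen("/dev/null", "w", stdout) == NULL ||
        freopen("/dev/null", "w", stderr) == NULL) {
        return -1;
    }
    return write_pid_file(pid_path);
}
```

A FastCGI process manager would typically call something like daemonize() once at startup, before opening its listening socket and spawning workers.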

PhD (PHP based DocBook renderer) improvements

Possible mentors: Hannes Magnusson

PhD is the tool that renders the DocBook-based documentation for both the PHP Manual and the PEAR Manual. It uses XMLReader to read the XML, which brings many obvious drawbacks, but XMLReader is also the reason why it's so blazing fast.

Today PhD is PHP.net centric, meaning it contains several “rules” and “workarounds” that are only applicable to PHP.net. The project goals:

  • Transform PhD into a more generic application, to help make it useful for other (non-php.net) projects.
  • Create “PhD-Setup” as a replacement for the various configure.php files phpdoc and peardoc use today. PhD-Setup should also be generic enough to be useful for other projects.
  • Close PhD-related bugs and add additional formats/themes.

For questions and thoughts, please join the #php.doc IRC channel on EFnet and/or write to the PHP Documentation list at phpdoc@lists.php.net.

Xdebug: Support for Path Coverage

Possible mentors: Derick Rethans, Sebastian Bergmann

Xdebug is a tool for debugging, analyzing and profiling PHP applications. One of its features is analyzing which code has been executed in a function. At the moment this only happens with a line-based resolution. This project is about extending this to analyze which code *paths* are being covered in functions and methods. This project idea requires C skills, and you will get very intimate with the PHP internals. Feel free to drop by in either #php.pecl or #xdebug on EFnet (and look for Derick) if you have any questions.
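
To illustrate why path resolution tells you more than line resolution, here is a small self-contained sketch. It is not Xdebug code; the instrumentation and all names are hypothetical. A function with two independent branches has four possible paths, and the coverage record tracks executed blocks and taken paths separately.

```c
#include <assert.h>
#include <stddef.h>

/* One bit per basic block of sample(). */
enum {
    BLOCK_A_TRUE  = 1 << 0,
    BLOCK_A_FALSE = 1 << 1,
    BLOCK_B_TRUE  = 1 << 2,
    BLOCK_B_FALSE = 1 << 3
};

struct coverage {
    unsigned blocks;   /* bit set when the corresponding block executed */
    unsigned paths;    /* bit p set when path number p (0..3) was taken */
};

/* Function under test, instrumented by hand for the illustration. */
int sample(int a, int b, struct coverage *cov)
{
    unsigned path = 0;
    int result = 0;

    assert(cov != NULL);
    if (a) {
        cov->blocks |= BLOCK_A_TRUE;
        result += 1;
        path |= 1u;
    } else {
        cov->blocks |= BLOCK_A_FALSE;
    }
    if (b) {
        cov->blocks |= BLOCK_B_TRUE;
        result += 2;
        path |= 2u;
    } else {
        cov->blocks |= BLOCK_B_FALSE;
    }
    cov->paths |= 1u << path;   /* record which of the 4 paths ran */
    return result;
}

/* Count the set bits in v. */
int count_bits(unsigned v)
{
    int n = 0;

    while (v != 0) {
        n += (int)(v & 1u);
        v >>= 1;
    }
    return n;
}
```

Calling sample(1, 1, &cov) and sample(0, 0, &cov) executes every block, so a line-based report shows full coverage, yet only two of the four paths have been exercised; the combinations (1, 0) and (0, 1) remain untested.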

Xdebug: Remote Debugging Support for Watch Expressions

Possible mentors: Derick Rethans

Xdebug is a tool for debugging, analyzing and profiling PHP applications. As part of its remote debugging features, it allows you to see variables' contents change when they're modified in the code. What is currently not supported is setting breakpoints on variables so that the debugging process interrupts when one of the “watched” variables is changed. This project idea is for implementing this specific feature. Implementing this requires C skills, and you will get very intimate with the PHP internals. Feel free to drop by in either #php.pecl or #xdebug on EFnet (and look for Derick) if you have any questions.

PHP-CMake

Possible mentors: Pierre A. Joye, Alejandro Leiva

CMake is a cross-platform make system that generates native makefiles for developers and has a much simpler syntax than that of m4. Kitware, the company behind CMake, is helping us to migrate and to improve CMake to fit our needs.

The main tasks for GSoC would be:

  • Improve and refactor the automatic converter from config.m4|w32 to PHP-CMake CMakeLists, supporting core extensions and PECL extensions.
  • Add phpize support.
  • Connect CTest to the PHP testing system.
  • Implement a CPack solution for PHP binary packaging.

For questions and thoughts, please join the #php.cmake IRC channel on EFnet; you can also check the wiki page of php-cmake.

Bug Tracker Improvements

Possible mentors: Philip Olson

PHP has a bug tracker, and it needs improvements. Possible tasks include:

  • Determine missing features, and prioritize them
  • Go through current bugsweb bugs, discuss and prioritize them
  • Fix bugs
  • Implement features
  • Update code to work without magic quotes, register_globals and other nonsense

Online editor for the PHP Manual

Possible mentors: Yannick Torres, Philip Olson

Work has started on an online editor for the PHP Manual. The student would help get this tool up and running for live use at php.net. The tool performs the following actions:

  • Allows SVN users to make and commit changes
  • Allows anybody to create patches, which are sent to a patch queue for developer review
  • Works with all translations

New Mirror Management System

Possible Mentor: Daniel P. Brown

Our existing management system for the network of official mirrors worldwide has a few issues and areas for improvement. The present system is accessible via a browser-based web interface (using a php.net CVS account, and hosted on master.php.net), and provides the following services:

  • Addition, deletion, and modification of DNS for official mirrors of the php.net site as a whole
  • Automated checking of all mirrors, marking mirrors as “disabled” when issues are detected
  • Listing of all mirrors presently configured with the system (active and otherwise)
  • Manual disabling of mirrors
  • Generic status display of the “health” of the mirror (“OK” or flagged)
  • Ability to edit the mirror's display information within our list on php.net
  • Automated mailing to the mirror's maintainer when an issue is detected
  • Automated mailing of the status of all mirrors on a weekly basis

Over time, some of the php.net mirror admins have discussed the need and/or desire to add new - or improve existing - functionality in this system. This includes, but is not limited to:

  • Improved mirror status checking
  • The ability to keep logs of notes regarding the mirror over time, displaying the most recent on the summary page
  • Simple graphing and statistics for uptime and PING response time
  • A geo-IP lookup to ensure that the mirror IP address matches the physical region it serves
  • An online “waiting list” application form (many potential donors currently contact the mailing list or Dan Brown directly)
  • An online form to collect data from mirror maintainer applicants
  • A form-to-mail interface to email individual maintainers, the mailing list, or all maintainers directly (for announcements)
  • Number of interested donors on the “waiting list” under each country's heading
  • Automated quarterly reminders to folks on the “waiting list” to update their status if they are no longer willing/able (?)
  • System specifications on each individual mirror (?)
  • A scoring system based on the system specs of the mirrors (?)
  • Load-balancing based upon the score of the mirror when redirecting to a local mirror in the region (?)
  • Automated daily reminder emails to mirror maintainer when mirror is out-of-sync > 24 hours (still auto-disabled) (?)
  • .... etc.

Close and evaluate PHP Bugs

Possible mentors: Philip Olson

This idea is simple, as it involves scouring the PHP bugs database and fixing bugs. The procedure:

  • Search for all open PHP bugs here
  • Find an open bug that looks appealing
  • Evaluate the bug
  • Propose a patch for the bug to the maintainer and/or internals list
  • Commit the patch
  • Repeat

There are several people and maintainers who would be involved with this idea, such as Tony, Jani, and Johannes. Priority should be given to PHP 5.3 bugs.

Abstract Extension API and Dependency Interface

Possible mentors: Brian Shire, Andrei Zmievski

Currently, PHP extensions that have dependencies on other extensions use compile-time configure checks to verify availability. This has several problems:

  • Any dependency that is not available will trigger undefined symbol errors, likely at run-time, see: http://marc.info/?l=php-internals&m=123456185115526&w=2.
  • Load order is important, as loading the dependency after the dependent extension also triggers symbol errors.
  • Upgrading an extension can easily break binary compatibility, creating undefined behavior.
  • The errors given to the end-user are not always obvious or easy for them to debug.
  • On systems where symbols are resolved lazily, the error will not manifest until the code path is executed. This makes the failure unexpected and debugging difficult.

Deliverables

  • Develop functionality for PHP that will allow extensions to register a set of functions as a versioned API.
  • Other extensions should be able to fetch this API, or handle failures appropriately for their application such as disabling features or generating an error. This includes dependencies being unavailable, or not having the expected version. The calling extension should be able to differentiate between these conditions.
  • The interface should support extensions being loaded in any order.
  • Extensions should be able to support multiple API versions at the same time for backwards compatibility.

Initial Brainstorming and Design

  • The API functions will be passed via an extension-specific structure containing function pointers. This requires that extensions share a common prototype via a C header file. Getting the correct prototype for the correct API version will of course be critical.
  • During module initialization the extension will provide a set of API structures that it supports that contain the function pointers. This can be called multiple times with different version numbers.
  • The extension can also request an API structure for a specific extension and API version at module initialization. The problem of extension load order needs to be solved here, as returning the API structure will likely need to be delayed until after module initialization, but before request initialization, to avoid incurring this cost on every request.
  • It may be useful if this interface can be used by PHP to expose callback hooks or interfaces to extensions that may change in the future or need to be abstracted for extension use.
  • Having the ability to list dependencies of extensions and exposed APIs could be useful for informational purposes and documentation.

/* expose API to other extensions
 * we'll be exposing a fetch() function and a store() function
 * from our extension.  This could be any calls we want to expose.
 * extension_api_t is a custom structure for this extension with
 * function pointers defining this extension's API.
 */
struct extension_api_t api;
api.version = 1;
api.fetch = my_fetch;
api.store = my_store;
php_register_api(&api);

/* fetch external extension API (in a different extension);
 * `key` is whatever argument the caller wants to pass to fetch()
 */
int rval;
int version = 2;
struct extension_api_t *api;

rval = php_get_api("myextension", version, &api);

if (rval == PHP_EXT_UNAVAIL) {
  /* the extension is not loaded at all */
  zend_error(E_WARNING, "myextension is not loaded or available, disabling feature");
  return NULL;
} else if (rval == PHP_EXT_NOVERSION) {
  /* the extension is loaded but does not offer this API version */
  zend_error(E_WARNING, "myextension version %d is required, disabling feature", version);
  return NULL;
} else {
  return api->fetch(key);
}
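
The lookup semantics described in the deliverables (an extension may register several API versions, and a caller must be able to tell "extension missing" from "version missing") can be prototyped outside PHP. The sketch below is only an illustration under stated assumptions: the sketch_* functions, the SKETCH_* status codes, the fixed-size table and the demo_* types are all hypothetical and are not part of any existing PHP API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SKETCH_OK = 0, SKETCH_UNAVAIL = 1, SKETCH_NOVERSION = 2 };

#define SKETCH_MAX_APIS 16

struct api_entry {
    const char *ext_name;   /* name of the providing extension */
    int version;            /* API version of this structure */
    const void *api;        /* extension-specific struct of function pointers */
};

static struct api_entry registry[SKETCH_MAX_APIS];
static size_t registry_len;

/* Register one API version; may be called once per supported version.
 * Returns 0 on success, -1 if the table is full. */
int sketch_register_api(const char *ext_name, int version, const void *api)
{
    assert(ext_name != NULL && api != NULL);
    if (registry_len == SKETCH_MAX_APIS) {
        return -1;
    }
    registry[registry_len].ext_name = ext_name;
    registry[registry_len].version = version;
    registry[registry_len].api = api;
    registry_len++;
    return 0;
}

/* Look up an API by extension name and version. Distinguishes an
 * extension that registered nothing from one lacking this version. */
int sketch_get_api(const char *ext_name, int version, const void **api_out)
{
    int seen_ext = 0;

    assert(ext_name != NULL && api_out != NULL);
    for (size_t i = 0; i < registry_len; i++) {
        if (strcmp(registry[i].ext_name, ext_name) != 0) {
            continue;
        }
        seen_ext = 1;
        if (registry[i].version == version) {
            *api_out = registry[i].api;
            return SKETCH_OK;
        }
    }
    return seen_ext ? SKETCH_NOVERSION : SKETCH_UNAVAIL;
}

/* Example provider API: two versions of the same struct layout. */
struct demo_api {
    int (*fetch)(int key);
};

int demo_fetch_v1(int key) { return key + 1; }
int demo_fetch_v2(int key) { return key * 2; }
```

Because the lookup happens on demand instead of at link time, the order in which extensions register does not matter as long as lookups are performed after every extension has had its chance to register, which is the load-order point raised in the brainstorming list.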

Prototyping Removal of the Zend API

Possible mentor: Paul Biggar, Other interested mentors please add your names

Currently, the structure of PHP extensions requires a very tight coupling with the Zend Engine. This restricts the reimplementation of the Zend Engine, and prevents PHP from becoming significantly faster in the long term.

The problem statement and the design of a solution are presented in Remove Zend API.

Deliverables

As part of the GSoC, the student should complete the first two goals in Remove Zend API to a high standard, and the third to a reasonably advanced prototype. The goal of converting the entire set of extensions is, of course, not part of the project. To be considered a success, good prototypes of 5 or 6 extensions should be achieved.

As well as the code, a short report should be produced, discussing the viability of the approach for the entire set of standard extensions. It should describe what challenges remain to be solved to make it possible, and whether or not the student finds this realistic. This is a very important part of the project, as it will not otherwise be easy to gauge the potential success of this approach over the entire PHP project. This report is not the same as that which is required by Google as part of the GSOC program.

gsoc/2009/ideas.1260619367.txt.gz · Last modified: 2017/09/22 13:28 (external edit)