rfc:taint

Differences

This shows you the differences between two versions of the page.

rfc:taint [2008/07/01 01:01] wietse | rfc:taint [2017/09/22 13:28] (current) – external edit 127.0.0.1
Line 1: Line 1:
 ====== Taint support for PHP ====== ====== Taint support for PHP ======
  
-  * **Author:** Wietse Venema (wietse@porcupine.org) \\  IBM T.J. Watson Research Center \\  Hawthorne, NY, USA+  * **Author:** [[http://www.porcupine.org/wietse/|Wietse Venema (wietse@porcupine.org)]] \\  IBM T.J. Watson Research Center \\  Hawthorne, NY, USA
   * **Version:** 20080622   * **Version:** 20080622
   * **Source code:** [[ftp://ftp.porcupine.org/pub/php/php-5.2.5-taint-20080622.tar.gz|tar.gz]]  ([[ftp://ftp.porcupine.org/pub/php/php-5.2.5-taint-20080622.tar.gz.sig|pgp signature]])   * **Source code:** [[ftp://ftp.porcupine.org/pub/php/php-5.2.5-taint-20080622.tar.gz|tar.gz]]  ([[ftp://ftp.porcupine.org/pub/php/php-5.2.5-taint-20080622.tar.gz.sig|pgp signature]])
   * **Win32 binaries:** [[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-win32-installer.msi|installer]]   ([[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-win32-installer.msi.sig|pgp signature]]) | [[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-Win32.zip|zip file]] ([[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-Win32.zip.sig|pgp signature]])   * **Win32 binaries:** [[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-win32-installer.msi|installer]]   ([[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-win32-installer.msi.sig|pgp signature]]) | [[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-Win32.zip|zip file]] ([[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-Win32.zip.sig|pgp signature]])
 +  * **Mailing list: ** [[http://marc.info/?l=php-internals|PHP internals]]
   * **Miscellaneous: ** [[ftp://ftp.porcupine.org/pub/php/CHANGELOG|Change log]] [[ftp://ftp.porcupine.org/pub/php/LICENSE|License]] [[ftp://ftp.porcupine.org/pub/php/wietse-public-key.pgp|pgp public key]]   * **Miscellaneous: ** [[ftp://ftp.porcupine.org/pub/php/CHANGELOG|Change log]] [[ftp://ftp.porcupine.org/pub/php/LICENSE|License]] [[ftp://ftp.porcupine.org/pub/php/wietse-public-key.pgp|pgp public key]]
-  * **Status:** In the works +  * **Status:** Draft (Inactive) 
-  +  * **Update:** A PECL extension has been implemented: http://pecl.php.net/package/taint
  
 ===== Introduction ===== ===== Introduction =====
Line 16: Line 17:
  
 I need your feedback to make this code complete. I hope to do several quick 1-2 month release cycles in which I collect feedback, fill in missing things, and adjust course until things stabilize. I need your feedback to make this code complete. I hope to do several quick 1-2 month release cycles in which I collect feedback, fill in missing things, and adjust course until things stabilize.
- 
 ===== A quick example ===== ===== A quick example =====
  
Line 41: Line 41:
 </code> </code>
  
-At this point I can either leave taint support turned on as a safety net in case someone introduces new mistakes, or I can disable taint support altogether. The run-time performance will not differ measurably, as long as the application does not trigger any alarms. +At this point I can either leave taint support turned on as a safety net in case someone introduces new mistakes into the PHP script, or I can disable taint support altogether. The run-time performance will not differ measurably, as long as the application does not trigger any alarms.
 ===== Introducing multiple flavors of taint ===== ===== Introducing multiple flavors of taint =====
  
Line 51: Line 50:
 To encourage programmers to use the RIGHT conversion function, I have implemented multiple flavors of taint. Each time data enters a PHP application from the web, from a database or from elsewhere, it may be "tainted" with zero or more taint flavors, so that the PHP engine can warn the programmer and suggest an appropriate conversion function. To encourage programmers to use the RIGHT conversion function, I have implemented multiple flavors of taint. Each time data enters a PHP application from the web, from a database or from elsewhere, it may be "tainted" with zero or more taint flavors, so that the PHP engine can warn the programmer and suggest an appropriate conversion function.
  
-In the case of the buggy example program, data is marked as "dangerous for use in HTML" (and other contexts :-) when it is received from the web. The "echo()" primitive detects the presence of this taint flavor in one of its arguments, issues a warning, and suggests using "htmlspecialchars()" or "htmlentities()".+In the case of the buggy example program, data is marked as "dangerous for use in HTML" (and other contexts :-)) when it is received from the web. The "echo()" primitive detects the presence of this taint flavor in one of its arguments, issues a warning, and suggests using "htmlspecialchars()" or "htmlentities()".
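The flavor mechanism described above can be modeled in plain PHP as a rough illustration. This is only a sketch under stated assumptions, not the engine-level implementation: the ''Tainted'' class, the ''FLAVOR_*'' constants, and the ''html_safe()'' and ''check_echo()'' helpers are hypothetical names invented for this example, and the bit values are not the patch's real TC_XXX values. It assumes a modern PHP (7.1 or later) without taint support.

<code php>
<?php
// Hypothetical flavor bits, for illustration only.
const FLAVOR_HTML = 1;   // dangerous when used in HTML output
const FLAVOR_SQL  = 2;   // dangerous when used in an SQL query

// A string value together with the set of flavors it is tainted with.
final class Tainted
{
    public $value;
    public $flavors;

    public function __construct(string $value, int $flavors)
    {
        $this->value = $value;
        $this->flavors = $flavors;
    }
}

// Conversion function: escapes for HTML and clears only the HTML flavor,
// so that any other flavors remain set.
function html_safe(Tainted $t): Tainted
{
    return new Tainted(htmlspecialchars($t->value), $t->flavors & ~FLAVOR_HTML);
}

// Sink check: returns a warning message if the HTML flavor is still set,
// or null if the value may be echoed without complaint.
function check_echo(Tainted $t): ?string
{
    if ($t->flavors & FLAVOR_HTML) {
        return 'argument is tainted for HTML; consider htmlspecialchars() or htmlentities()';
    }
    return null;
}

// Data arriving "from the web" starts out with both flavors set.
$input = new Tainted('<b>hi</b>', FLAVOR_HTML | FLAVOR_SQL);
$safe  = html_safe($input);
</code>

In this model ''check_echo($input)'' yields a warning while ''check_echo($safe)'' does not, and ''$safe'' still carries the SQL flavor, which is the point of having more than one flavor: the wrong conversion function does not silence the warning for a different context.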
  
 The table below summarizes a number of taint flavors: it shows where a specific flavor may be added to data, where its presence may raise warnings, and how you get rid of the taint flavor. Please ignore the ugly TC_XXX names for now. That's low-level stuff that still needs to be hidden behind a user interface. The table below summarizes a number of taint flavors: it shows where a specific flavor may be added to data, where its presence may raise warnings, and how you get rid of the taint flavor. Please ignore the ugly TC_XXX names for now. That's low-level stuff that still needs to be hidden behind a user interface.
Line 64: Line 63:
  
 The TC_SELF flavor is different from the other flavors. Instead of code injection, its purpose is to detect opportunities to hijack control over the PHP application itself. Currently, there is no conversion function that makes all data safe as input for "eval()", "include()" etc. Instead, the application itself is supposed to verify that data is "good" and mark it as such. Until a better user interface exists, this means calling the low-level "untaint()" function directly. The TC_SELF flavor is different from the other flavors. Instead of code injection, its purpose is to detect opportunities to hijack control over the PHP application itself. Currently, there is no conversion function that makes all data safe as input for "eval()", "include()" etc. Instead, the application itself is supposed to verify that data is "good" and mark it as such. Until a better user interface exists, this means calling the low-level "untaint()" function directly.
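One way an application might "verify that data is good" before handing it to "include()" is an allowlist check, sketched below. This is an assumption about how such verification could look, not something prescribed by the RFC: the ''ALLOWED_PAGES'' list, the ''resolve_page()'' helper and the ''pages/'' directory are hypothetical, and the sketch assumes a modern PHP (7.1 or later). The call to the low-level "untaint()" function is deliberately left as a comment, because its exact signature is not shown here.

<code php>
<?php
// Hypothetical allowlist of page names the application is willing to include.
const ALLOWED_PAGES = ['home', 'about', 'contact'];

// Return the script path for a requested page, or null when the request
// is not on the allowlist. Only names that match an allowlist entry
// exactly (strict comparison) are accepted.
function resolve_page(string $requested): ?string
{
    if (!in_array($requested, ALLOWED_PAGES, true)) {
        return null;
    }
    // With a taint-enabled PHP build, this is the point where the verified
    // value would be marked as good with the low-level untaint() function.
    return __DIR__ . '/pages/' . $requested . '.php';
}
</code>

A caller would pass the raw request parameter to ''resolve_page()'' and only call "include()" when the result is not null; anything else, such as ''../etc/passwd'', is rejected before it can influence which file is loaded.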
 +
 ===== What has been implemented so far ===== ===== What has been implemented so far =====
  
-I have built taint support with the following server APIs: cli, cgi; apache1, apache2 and apache2filter DSO (loadable module); and with the following extensions: mysqli, mysql and mbstring. Other server APIs and extensions will follow as time permits.+I have implemented taint support with the following server APIs: cli, cgi; apache1, apache2 and apache2filter plug-in; and with the following extensions: mysqli, mysql and mbstring. Other server APIs and extensions will follow as time permits.
  
 What about the other extensions? The other extensions will work just fine as long as you leave "taint_error_level" at its default setting. They may trigger false warnings when you raise the taint error level, because they don't know how to properly initialize certain bits that taint support relies on. This problem should not exist, but unfortunately there is a lot of PHP source code that does not use standard macros when initializing PHP data structures. What about the other extensions? The other extensions will work just fine as long as you leave "taint_error_level" at its default setting. They may trigger false warnings when you raise the taint error level, because they don't know how to properly initialize certain bits that taint support relies on. This problem should not exist, but unfortunately there is a lot of PHP source code that does not use standard macros when initializing PHP data structures.
Line 74: Line 74:
 ===== Using taint support with real PHP applications ===== ===== Using taint support with real PHP applications =====
  
-To use PHP with taint support, either install the Win32 binaries (linked from the top of this document) or build PHP with taint support from source code (again, linked from the top of this document). +To use PHP with taint support, either install the Win32 binaries or build PHP with taint support from source code (source and binary distributions are linked from the top of this document). For UNIX build instructions see the README.taint or README.taint.html file in the source bundle. Sorry, there are currently no Windows build instructions.
  
-To build from source on a UNIX-like environment: +To experiment with taint support, copy the file "[[http://ftp.porcupine.org/pub/php/taint_ini.php.txt|taint_ini.php]]" (also available in the top-level PHP+taint source directory) to your PHP script directory, edit the file per the instructions below, and "include" it into the PHP script. The file begins like this:
- +
-<code> +
-# Stand-alone CLI and CGI programs +
-$ make distclean +
-$ ./configure --enable-taint --with-mysqli --with-mysql ... +
-$ make +
-# Apache2 module +
-$ make distclean +
-$ ./configure --enable-taint --with-apxs2=/path/to/apxs ... +
-$ make +
-# Apache1 module +
-$ make distclean +
-$ ./configure --enable-taint --with-apxs=/path/to/apxs ... +
-$ make +
-</code> +
- +
-After a successful build, proceed with "make install"+
- +
-To experiment with taint support, copy the file "[[ftp://ftp.porcupine.org/pub/php/taint_ini.php.txt|taint_ini.php]]" (also available in the top-level PHP+taint source directory) to your PHP script directory, edit the file per the instructions below, and "include" it into the PHP script. The file begins like this:+
  
 <code php> <code php>
Line 102: Line 83:
 ini_set("log_errors", true); ini_set("log_errors", true);
 ini_set("display_errors", false); ini_set("display_errors", false);
 +
 # Uncomment one of these if you don't want to log to the server's log. # Uncomment one of these if you don't want to log to the server's log.
 # ini_set("error_log", "syslog"); # ini_set("error_log", "syslog");
 # ini_set("error_log", "/path/to/errorlog"); # ini_set("error_log", "/path/to/errorlog");
 +
 # Temporary workaround to avoid false alarms. Unfortunately, $_SERVER[] # Temporary workaround to avoid false alarms. Unfortunately, $_SERVER[]
 # contains a mixed bag of data: some is safe, and some highly dangerous. # contains a mixed bag of data: some is safe, and some highly dangerous.
Line 116: Line 99:
 Notes: Notes:
  
-  * If you use an error level of E_USER_WARNING, you can use "set_error_handler()" and report taint conflicts in more detail, complete with symbol table and stack trace. For an example, see the file "[[ftp://ftp.porcupine.org/pub/php/taint_trace.php.txt|taint_trace.php]]" (also available in the top-level PHP+taint source directory).+  * If you use an error level of E_USER_WARNING, you can use "set_error_handler()" and report taint conflicts in more detail, complete with symbol table and stack trace. For an example, see the file "[[http://ftp.porcupine.org/pub/php/taint_trace.php.txt|taint_trace.php]]" (also available in the top-level PHP+taint source directory).
   * The "untaint($_SERVER...)" workarounds won't be needed in a future release.   * The "untaint($_SERVER...)" workarounds won't be needed in a future release.
   * If you specify your own error logfile, make sure this file is writable by the server process. You may have to do something ugly like this:   * If you specify your own error logfile, make sure this file is writable by the server process. You may have to do something ugly like this:
Line 136: Line 119:
  
 This is admittedly imperfect: it would be better to specify what context the data is safe for. A proper user interface for this will have to be developed in a future version of PHP taint support. This is admittedly imperfect: it would be better to specify what context the data is safe for. A proper user interface for this will have to be developed in a future version of PHP taint support.
- 
 ===== Performance ===== ===== Performance =====
  
-The performance is quite good. The overhead for "make test" is within 0.5-1.5% when comparing the user-mode CPU time of unmodified PHP against a PHP version with taint support (the number depends on the CPU used and on build options, and there are a few preliminary workarounds in the Windows version that take some extra CPU cycles). I know that a fraction of that time is spent in non-PHP processing, but the bulk is spent in PHP and that is what really matters. If a better "macro" benchmark exists then I am of course interested.+The performance is quite good. The overhead for "make test" is within 0.5-1.5% when comparing the user-mode CPU time of unmodified PHP against a PHP version with taint support (the number depends on CPU details and on PHP build options, and there are a few preliminary workarounds in the Windows version that take some extra CPU cycles). I know that a fraction of that time is spent in non-PHP processing, but the bulk is spent in PHP and that is what really matters. If a better "macro" benchmark exists then I am of course interested.
  
-The "bench.php" script that comes with PHP source is even less representative of applications: it is a loop-intensive affair that doesn't do any input or output. Nevertheless, it suffers only a modest overhead of 2%. This is good enough for a start; I can try to squeeze out more CPU cycles later if necessary.+The "bench.php" script that comes with PHP source is even less representative of real applications: it is a loop-intensive affair that doesn't do any input or output. Nevertheless, it suffers only a modest overhead of 2%. This is good enough for a start; I can try to squeeze out more CPU cycles later if necessary.
  
 As long as the application triggers no warnings, it does not make a measurable difference whether taint support is turned on or not. This is due to the way the support is implemented. Without going into detail, the trick is to avoid introducing extra conditional or unconditional jumps in the critical path. As long as the application triggers no warnings, it does not make a measurable difference whether taint support is turned on or not. This is due to the way the support is implemented. Without going into detail, the trick is to avoid introducing extra conditional or unconditional jumps in the critical path.
- 
 ===== Low-level implementation ===== ===== Low-level implementation =====
  
 Taint support is implemented with some of the unused bits in the zval data structure. The zval is the PHP equivalent of a memory cell. Besides a type (string, integer, etc.) and value, each zval has a reference count and a flag that says whether the zval is a reference to yet another zval that contains the actual value. Taint support is implemented with some of the unused bits in the zval data structure. The zval is the PHP equivalent of a memory cell. Besides a type (string, integer, etc.) and value, each zval has a reference count and a flag that says whether the zval is a reference to yet another zval that contains the actual value.
  
-Right now I am using seven bits, but there is room for more: 32-bit UNIX compilers such as GCC add 16 bits of padding to the current zval data structure, and this amount of padding isn't going to be smaller on 64-bit architectures. If I really have to squeeze the taint bits in-between the existing bits, the taint support performance hit goes up. If squeezing is necessary, all PHP code will need to be changed to use official initialization macros, so that expensive shift/mask operations can be avoided as much as possible.+Right now I am using eight bits, but there is room for more: 32-bit UNIX compilers such as GCC add 16 bits of padding to the current zval data structure, and this amount of padding isn't going to be smaller on 64-bit architectures; Microsoft Visual Studio 6 also adds 16 bits of padding when it builds PHP on a Win32 platform. If I really have to squeeze the taint bits in-between the existing bits, the taint support performance hit goes up. If squeezing is necessary, all PHP code will need to be changed to use official initialization macros, so that expensive shift/mask operations can be avoided as much as possible.
  
 The preliminary configuration user interface is rather low-level, somewhat like MS-DOS file permissions :-( This is good enough for testing and debugging the taint support itself, but I would not want to have wires hanging out of the machine like this forever. The raw bits will need to be encapsulated so that applications can work with meaningful names and abstractions. The preliminary configuration user interface is rather low-level, somewhat like MS-DOS file permissions :-( This is good enough for testing and debugging the taint support itself, but I would not want to have wires hanging out of the machine like this forever. The raw bits will need to be encapsulated so that applications can work with meaningful names and abstractions.
rfc/taint.1214874119.txt.gz · Last modified: 2017/09/22 13:28 (external edit)