rfc:taint

Differences

This shows you the differences between two versions of the page.

rfc:taint [2008/07/06 22:53] wietserfc:taint [2017/09/22 13:28] (current) – external edit 127.0.0.1
Line 1: Line 1:
 ====== Taint support for PHP ====== ====== Taint support for PHP ======
  
-  * **Author:** Wietse Venema (wietse@porcupine.org) \\  IBM T.J. Watson Research Center \\  Hawthorne, NY, USA+  * **Author:** [[http://www.porcupine.org/wietse/|Wietse Venema (wietse@porcupine.org)]] \\  IBM T.J. Watson Research Center \\  Hawthorne, NY, USA
   * **Version:** 20080622   * **Version:** 20080622
   * **Source code:** [[ftp://ftp.porcupine.org/pub/php/php-5.2.5-taint-20080622.tar.gz|tar.gz]]  ([[ftp://ftp.porcupine.org/pub/php/php-5.2.5-taint-20080622.tar.gz.sig|pgp signature]])   * **Source code:** [[ftp://ftp.porcupine.org/pub/php/php-5.2.5-taint-20080622.tar.gz|tar.gz]]  ([[ftp://ftp.porcupine.org/pub/php/php-5.2.5-taint-20080622.tar.gz.sig|pgp signature]])
   * **Win32 binaries:** [[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-win32-installer.msi|installer]]   ([[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-win32-installer.msi.sig|pgp signature]]) | [[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-Win32.zip|zip file]] ([[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-Win32.zip.sig|pgp signature]])   * **Win32 binaries:** [[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-win32-installer.msi|installer]]   ([[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-win32-installer.msi.sig|pgp signature]]) | [[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-Win32.zip|zip file]] ([[ftp://ftp.porcupine.org/pub/php/.win32/php-5.2.5-taint-20080622-Win32.zip.sig|pgp signature]])
 +  * **Mailing list: ** [[http://marc.info/?l=php-internals|PHP internals]]
   * **Miscellaneous: ** [[ftp://ftp.porcupine.org/pub/php/CHANGELOG|Change log]] [[ftp://ftp.porcupine.org/pub/php/LICENSE|License]] [[ftp://ftp.porcupine.org/pub/php/wietse-public-key.pgp|pgp public key]]   * **Miscellaneous: ** [[ftp://ftp.porcupine.org/pub/php/CHANGELOG|Change log]] [[ftp://ftp.porcupine.org/pub/php/LICENSE|License]] [[ftp://ftp.porcupine.org/pub/php/wietse-public-key.pgp|pgp public key]]
-  * **Status:** In the works +  * **Status:** Draft(Inactive) 
-  +  * **Update:** A pecl extension implemented: http://pecl.php.net/package/taint
  
 ===== Introduction ===== ===== Introduction =====
Line 41: Line 42:
  
 At this point I can either leave taint support turned on as a safety net in case someone introduces new mistakes into the PHP script, or I can disable taint support altogether. The run-time performance will not differ measurably, as long as the application does not trigger any alarms. At this point I can either leave taint support turned on as a safety net in case someone introduces new mistakes into the PHP script, or I can disable taint support altogether. The run-time performance will not differ measurably, as long as the application does not trigger any alarms.
- 
 ===== Introducing multiple flavors of taint ===== ===== Introducing multiple flavors of taint =====
  
Line 50: Line 50:
 To encourage programmers to use the RIGHT conversion function, I have implemented multiple flavors of taint. Each time data enters a PHP application from the web, from a database or from elsewhere, it may be "tainted" with zero or more taint flavors, so that the PHP engine can warn the programmer and suggest an appropriate conversion function. To encourage programmers to use the RIGHT conversion function, I have implemented multiple flavors of taint. Each time data enters a PHP application from the web, from a database or from elsewhere, it may be "tainted" with zero or more taint flavors, so that the PHP engine can warn the programmer and suggest an appropriate conversion function.
  
-In the case of the buggy example program, data is marked as "dangerous for use in HTML" (and other contexts :-) when it is received from the web. The "echo()" primitive detects the presence of this taint flavor in one of its arguments, issues a warning, and suggests using "htmlspecialchars()" or "htmlentities()".+In the case of the buggy example program, data is marked as "dangerous for use in HTML" (and other contexts :-)) when it is received from the web. The "echo()" primitive detects the presence of this taint flavor in one of its arguments, issues a warning, and suggests using "htmlspecialchars()" or "htmlentities()".
  
 The table below summarizes a number of taint flavors: it shows where a specific flavor may be added to data, where its presence may raise warnings, and how you get rid of the taint flavor. Please ignore the ugly TC_XXX names for now. That's low-level stuff that still needs to be hidden behind a user interface. The table below summarizes a number of taint flavors: it shows where a specific flavor may be added to data, where its presence may raise warnings, and how you get rid of the taint flavor. Please ignore the ugly TC_XXX names for now. That's low-level stuff that still needs to be hidden behind a user interface.
Line 63: Line 63:
  
 The TC_SELF flavor is different from the other flavors. Instead of code injection, its purpose is to detect opportunities to hijack control over the PHP application itself. Currently, there is no conversion function that makes all data safe as input for "eval()", "include()" etc. Instead, the application itself is supposed to verify that data is "good" and mark it as such. Until a better user interface exists, this means calling the low-level "untaint()" function directly. The TC_SELF flavor is different from the other flavors. Instead of code injection, its purpose is to detect opportunities to hijack control over the PHP application itself. Currently, there is no conversion function that makes all data safe as input for "eval()", "include()" etc. Instead, the application itself is supposed to verify that data is "good" and mark it as such. Until a better user interface exists, this means calling the low-level "untaint()" function directly.
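
The flavor mechanism described in this section can be pictured as a small bit mask attached to each value. The following is only a minimal sketch in C, not the actual implementation: apart from the name TC_SELF, every identifier and every numeric value below is a hypothetical stand-in chosen for illustration, and the assumption that web input receives every flavor is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Flavor bits. Only the name TC_SELF comes from the text; the other
 * names and all numeric values are hypothetical. */
enum {
    TC_HTML_DEMO  = 1 << 0,  /* hypothetical: dangerous in HTML output */
    TC_SHELL_DEMO = 1 << 1,  /* hypothetical: dangerous in shell commands */
    TC_SELF       = 1 << 2   /* dangerous as input for eval(), include() etc. */
};

/* Toy stand-in for a PHP value: a string plus its taint flavors. */
typedef struct {
    const char   *text;
    unsigned char taint;
} demo_value;

/* Source: in this sketch, data from the web is marked with every flavor. */
demo_value demo_from_web(const char *text)
{
    demo_value v = { text, TC_HTML_DEMO | TC_SHELL_DEMO | TC_SELF };
    return v;
}

/* Sink: returns the offending flavors, or 0 when the value is acceptable. */
unsigned demo_sink_check(demo_value v, unsigned forbidden)
{
    return v.taint & forbidden;
}

/* Conversion: clears only the flavors that the conversion deals with. */
demo_value demo_untaint(demo_value v, unsigned flavors)
{
    assert((flavors & ~0xFFu) == 0);  /* flavors must fit in one byte */
    v.taint = (unsigned char)(v.taint & ~flavors);
    return v;
}

/* Propagation: a result built from two inputs carries both sets of flavors. */
unsigned char demo_propagate(demo_value a, demo_value b)
{
    return (unsigned char)(a.taint | b.taint);
}
```

In this sketch, a sink that plays the role of "echo()" would call demo_sink_check() with the HTML flavor, and a conversion that plays the role of "htmlspecialchars()" would call demo_untaint() with that same flavor. The TC_SELF bit stays set until the application clears it explicitly, which mirrors the explicit "untaint()" call described above.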
 +
 ===== What has been implemented so far ===== ===== What has been implemented so far =====
  
-I have built taint support with the following server APIs: cli, cgi; apache1, apache2 and apache2filter plug-in; and with the following extensions: mysqli, mysql and mbstring. Other server APIs and extensions will follow as time permits.+I have implemented taint support with the following server APIs: cli, cgi; apache1, apache2 and apache2filter plug-in; and with the following extensions: mysqli, mysql and mbstring. Other server APIs and extensions will follow as time permits.
  
 What about the other extensions? The other extensions will work just fine as long as you leave "taint_error_level" at its default setting. They may trigger false warnings when you raise the taint error level, because they don't know how to properly initialize certain bits that taint support relies on. This problem should not exist, but unfortunately there is a lot of PHP source code that does not use standard macros when initializing PHP data structures. What about the other extensions? The other extensions will work just fine as long as you leave "taint_error_level" at its default setting. They may trigger false warnings when you raise the taint error level, because they don't know how to properly initialize certain bits that taint support relies on. This problem should not exist, but unfortunately there is a lot of PHP source code that does not use standard macros when initializing PHP data structures.
  
 Extensions that haven't been updated with taint support will ignore taint information in their inputs, and will therefore not propagate taint information from their inputs to their outputs. Extensions that haven't been updated with taint support will ignore taint information in their inputs, and will therefore not propagate taint information from their inputs to their outputs.
 +
 ===== Using taint support with real PHP applications ===== ===== Using taint support with real PHP applications =====
  
Line 117: Line 119:
  
 This is admittedly imperfect: it would be better to specify what context the data is safe for. A proper user interface for this will have to be developed in a future version of PHP taint support. This is admittedly imperfect: it would be better to specify what context the data is safe for. A proper user interface for this will have to be developed in a future version of PHP taint support.
- 
 ===== Performance ===== ===== Performance =====
  
-The performance is quite good. The overhead for "make test" is within 0.5-1.5% when comparing the user-mode CPU time of unmodified PHP against a PHP version with taint support (the number depends on the CPU used and on build options, and there are a few preliminary workarounds in the Windows version that take some extra CPU cycles). I know that a fraction of that time is spent in non-PHP processing, but the bulk is spent in PHP and that is what really matters. If a better "macro" benchmark exists then I am of course interested.+The performance is quite good. The overhead for "make test" is within 0.5-1.5% when comparing the user-mode CPU time of unmodified PHP against a PHP version with taint support (the number depends on CPU details and on PHP build options, and there are a few preliminary workarounds in the Windows version that take some extra CPU cycles). I know that a fraction of that time is spent in non-PHP processing, but the bulk is spent in PHP and that is what really matters. If a better "macro" benchmark exists then I am of course interested.
  
-The "bench.php" script that comes with PHP source is even less representative of applications: it is a loop-intensive affair that doesn't do any input or output. Nevertheless, it suffers only a modest overhead of 2%. This is good enough for a start; I can try to squeeze out more CPU cycles later if necessary.+The "bench.php" script that comes with PHP source is even less representative of real applications: it is a loop-intensive affair that doesn't do any input or output. Nevertheless, it suffers only a modest overhead of 2%. This is good enough for a start; I can try to squeeze out more CPU cycles later if necessary.
  
 As long as the application triggers no warnings, it does not make a measurable difference whether taint support is turned on or not. This is due to the way the support is implemented. Without going into detail, the trick is to avoid introducing extra conditional or unconditional jumps in the critical path. As long as the application triggers no warnings, it does not make a measurable difference whether taint support is turned on or not. This is due to the way the support is implemented. Without going into detail, the trick is to avoid introducing extra conditional or unconditional jumps in the critical path.
- 
 ===== Low-level implementation ===== ===== Low-level implementation =====
  
 Taint support is implemented with some of the unused bits in the zval data structure. The zval is the PHP equivalent of a memory cell. Besides a type (string, integer, etc.) and value, each zval has a reference count and a flag that says whether the zval is a reference to yet another zval that contains the actual value. Taint support is implemented with some of the unused bits in the zval data structure. The zval is the PHP equivalent of a memory cell. Besides a type (string, integer, etc.) and value, each zval has a reference count and a flag that says whether the zval is a reference to yet another zval that contains the actual value.
  
-Right now I am using seven bits, but there is room for more: 32-bit UNIX compilers such as GCC add 16 bits of padding to the current zval data structure, and this amount of padding isn't going to be smaller on 64-bit architectures. If I really have to squeeze the taint bits in-between the existing bits, the taint support performance hit goes up. If squeezing is necessary, all PHP code will need to be changed to use official initialization macros, so that expensive shift/mask operations can be avoided as much as possible.+Right now I am using eight bits, but there is room for more: 32-bit UNIX compilers such as GCC add 16 bits of padding to the current zval data structure, and this amount of padding isn't going to be smaller on 64-bit architectures; Microsoft Visual Studio 6 also adds 16 bits of padding when it builds PHP on a Win32 platform. If I really have to squeeze the taint bits in-between the existing bits, the taint support performance hit goes up. If squeezing is necessary, all PHP code will need to be changed to use official initialization macros, so that expensive shift/mask operations can be avoided as much as possible.
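
The padding argument can be checked with a small C sketch. The structures below are simplified stand-ins, not the real Zend definitions: the field layout is an assumption based on the description above (value, reference count, type, reference flag), and the "taint" field name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the value part of a zval (assumed layout). */
typedef union {
    long   lval;
    double dval;
    struct { char *val; int len; } str;
} demo_zvalue;

/* A zval-like cell without taint bits. */
typedef struct {
    demo_zvalue   value;
    unsigned int  refcount;
    unsigned char type;
    unsigned char is_ref;
} demo_zval_plain;

/* The same cell with one extra byte for taint flavors (hypothetical field). */
typedef struct {
    demo_zvalue   value;
    unsigned int  refcount;
    unsigned char type;
    unsigned char is_ref;
    unsigned char taint;
} demo_zval_taint;

/* Number of trailing padding bytes the compiler adds to the plain cell. */
size_t demo_padding_bytes(void)
{
    return sizeof(demo_zval_plain) - (offsetof(demo_zval_plain, is_ref) + 1);
}

/* Non-zero when the taint byte fits into padding and costs no memory. */
int demo_taint_is_free(void)
{
    return sizeof(demo_zval_taint) == sizeof(demo_zval_plain);
}

/* Initializing every field in one place keeps the taint bits well defined;
 * code that fills in the fields by hand could leave them as garbage. */
void demo_zval_init(demo_zval_taint *z)
{
    assert(z != NULL);
    z->value.lval = 0;
    z->refcount = 1;
    z->type = 0;
    z->is_ref = 0;
    z->taint = 0;
}
```

With common 32-bit and 64-bit compilers, demo_padding_bytes() would be expected to report two bytes, matching the 16 bits of padding mentioned above, but the exact figure depends on the compiler and ABI. The single initialization routine illustrates why code that bypasses the official initialization macros can leave the taint bits undefined.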
  
 The preliminary configuration user interface is rather low-level, somewhat like MS-DOS file permissions :-( This is good enough for testing and debugging the taint support itself, but I would not want to have wires hanging out of the machine like this forever. The raw bits will need to be encapsulated so that applications can work with meaningful names and abstractions. The preliminary configuration user interface is rather low-level, somewhat like MS-DOS file permissions :-( This is good enough for testing and debugging the taint support itself, but I would not want to have wires hanging out of the machine like this forever. The raw bits will need to be encapsulated so that applications can work with meaningful names and abstractions.
rfc/taint.1215384824.txt.gz · Last modified: 2017/09/22 13:28 (external edit)