rfc:taint
Differences
This shows you the differences between two versions of the page.
rfc:taint [2008/07/01 21:56] – wietse | rfc:taint [2012/08/03 07:12] – laruence | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Taint support for PHP ====== | ====== Taint support for PHP ====== | ||
- | * **Author:** Wietse Venema (wietse@porcupine.org) \\ IBM T.J. Watson Research Center \\ Hawthorne, NY, USA | + | * **Author: |
* **Version: | * **Version: | ||
* **Source code:** [[ftp:// | * **Source code:** [[ftp:// | ||
* **Win32 binaries:** [[ftp:// | * **Win32 binaries:** [[ftp:// | ||
+ | * **Mailing list: ** [[http:// | ||
* **Miscellaneous: | * **Miscellaneous: | ||
* **Status:** In the works | * **Status:** In the works | ||
- | | + | |
===== Introduction ===== | ===== Introduction ===== | ||
Line 16: | Line 17: | ||
I need your feedback to make this code complete. I hope to do several quick 1-2 month release cycles in which I collect feedback, fill in missing things, and adjust course until things stabilize. | I need your feedback to make this code complete. I hope to do several quick 1-2 month release cycles in which I collect feedback, fill in missing things, and adjust course until things stabilize. | ||
- | |||
===== A quick example ===== | ===== A quick example ===== | ||
Line 41: | Line 41: | ||
</ | </ | ||
- | At this point I can either leave taint support turned on as a safety net in case someone introduces new mistakes, or I can disable taint support altogether. The run-time performance will not differ measurably, as long as the application does not trigger any alarms. | + | At this point I can either leave taint support turned on as a safety net in case someone introduces new mistakes |
===== Introducing multiple flavors of taint ===== | ===== Introducing multiple flavors of taint ===== | ||
Line 51: | Line 50: | ||
To encourage programmers to use the RIGHT conversion function, I have implemented multiple flavors of taint. Each time data enters a PHP application from the web, from a database or from elsewhere, it may be " | To encourage programmers to use the RIGHT conversion function, I have implemented multiple flavors of taint. Each time data enters a PHP application from the web, from a database or from elsewhere, it may be " | ||
- | In the case of the buggy example program, data is marked as " | + | In the case of the buggy example program, data is marked as " |
The table below summarizes a number of taint flavors: it shows where a specific flavor may be added to data, where its presence may raise warnings, and how you get rid of the taint flavor. Please ignore the ugly TC_XXX names for now. That's low-level stuff that still needs to be hidden behind a user interface. | The table below summarizes a number of taint flavors: it shows where a specific flavor may be added to data, where its presence may raise warnings, and how you get rid of the taint flavor. Please ignore the ugly TC_XXX names for now. That's low-level stuff that still needs to be hidden behind a user interface. | ||
Line 64: | Line 63: | ||
The TC_SELF flavor is different from the other flavors. Instead of code injection, its purpose is to detect opportunities to hijack control over the PHP application itself. Currently, there is no conversion function that makes all data safe as input for " | The TC_SELF flavor is different from the other flavors. Instead of code injection, its purpose is to detect opportunities to hijack control over the PHP application itself. Currently, there is no conversion function that makes all data safe as input for " | ||
+ | |||
===== What has been implemented so far ===== | ===== What has been implemented so far ===== | ||
- | I have built taint support with the following server APIs: cli, cgi; apache1, apache2 and apache2filter plug-in; and with the following extensions: mysqli, mysql and mbstring. Other server APIs and extensions will follow as time permits. | + | I have implemented | ||
What about the other extensions? The other extensions will work just fine as long as you leave " | What about the other extensions? The other extensions will work just fine as long as you leave " | ||
Extensions that haven' | Extensions that haven' | ||
+ | |||
===== Using taint support with real PHP applications ===== | ===== Using taint support with real PHP applications ===== | ||
Line 82: | Line 83: | ||
ini_set(" | ini_set(" | ||
ini_set(" | ini_set(" | ||
+ | |||
# Uncomment one of these if you don't want to log to the server' | # Uncomment one of these if you don't want to log to the server' | ||
# ini_set(" | # ini_set(" | ||
# ini_set(" | # ini_set(" | ||
+ | |||
# Temporary workaround to avoid false alarms. Unfortunately, | # Temporary workaround to avoid false alarms. Unfortunately, | ||
# contains a mixed bag of data: some is safe, and some highly dangerous. | # contains a mixed bag of data: some is safe, and some highly dangerous. | ||
Line 116: | Line 119: | ||
This is admittedly imperfect: it would be better to specify what context the data is safe for. A proper user interface for this will have to be developed in a future version of PHP taint support. | This is admittedly imperfect: it would be better to specify what context the data is safe for. A proper user interface for this will have to be developed in a future version of PHP taint support. | ||
- | |||
===== Performance ===== | ===== Performance ===== | ||
- | The performance is quite good. The overhead for "make test" is within 0.5-1.5% when comparing the user-mode CPU time of unmodified PHP against a PHP version with taint support (the number depends on the CPU used and on build options, and there are a few preliminary workarounds in the Windows version that take some extra CPU cycles). I know that a fraction of that time is spent in non-PHP processing, but the bulk is spent in PHP and that is what really matters. If a better " | + | The performance is quite good. The overhead for "make test" is within 0.5-1.5% when comparing the user-mode CPU time of unmodified PHP against a PHP version with taint support (the number depends on CPU details |
- | The " | + | The " |
As long as the application triggers no warnings, it does not make a measurable difference whether taint support is turned on or not. This is due to the way the support is implemented. Without going into detail, the trick is to avoid introducing extra conditional or unconditional jumps in the critical path. | As long as the application triggers no warnings, it does not make a measurable difference whether taint support is turned on or not. This is due to the way the support is implemented. Without going into detail, the trick is to avoid introducing extra conditional or unconditional jumps in the critical path. | ||
- | |||
===== Low-level implementation ===== | ===== Low-level implementation ===== | ||
Taint support is implemented with some of the unused bits in the zval data structure. The zval is the PHP equivalent of a memory cell. Besides a type (string, integer, etc.) and value, each zval has a reference count and a flag that says whether the zval is a reference to yet another zval that contains the actual value. | Taint support is implemented with some of the unused bits in the zval data structure. The zval is the PHP equivalent of a memory cell. Besides a type (string, integer, etc.) and value, each zval has a reference count and a flag that says whether the zval is a reference to yet another zval that contains the actual value. | ||
- | Right now I am using seven bits, but there is room for more: 32-bit UNIX compilers such as GCC add 16 bits of padding to the current zval data structure, and this amount of padding isn't going to be smaller on 64-bit architectures. If I really have to squeeze the taint bits in-between the existing bits, the taint support performance hit goes up. If squeezing is necessary, all PHP code will need to be changed to use official initialization macros, so that expensive shift/mask operations can be avoided as much as possible. | + | Right now I am using eight bits, but there is room for more: 32-bit UNIX compilers such as GCC add 16 bits of padding to the current zval data structure, and this amount of padding isn't going to be smaller on 64-bit architectures; Microsoft Visual Studio 6 also adds 16 bits of padding when it builds PHP on a Win32 platform. If I really have to squeeze the taint bits in-between the existing bits, the taint support performance hit goes up. If squeezing is necessary, all PHP code will need to be changed to use official initialization macros, so that expensive shift/mask operations can be avoided as much as possible. |
The preliminary configuration user interface is rather low-level, somewhat like MS-DOS file permissions :-( This is good enough for testing and debugging the taint support itself, but I would not want to have wires hanging out of the machine like this forever. The raw bits will need to be encapsulated so that applications can work with meaningful names and abstractions. | The preliminary configuration user interface is rather low-level, somewhat like MS-DOS file permissions :-( This is good enough for testing and debugging the taint support itself, but I would not want to have wires hanging out of the machine like this forever. The raw bits will need to be encapsulated so that applications can work with meaningful names and abstractions. |
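
The bit-level idea described under "Low-level implementation" can be sketched in C as follows. This is only an illustration under stated assumptions, not the RFC's actual code: `demo_zval` is a simplified stand-in for the real zval, and the `DEMO_TC_*` names and values and the `demo_*` helpers are hypothetical. The sketch shows one flag byte placed after the existing fields, with flavors added, removed and tested by plain mask operations.

```c
#include <assert.h>

/* Hypothetical flavor bits for illustration only; the RFC's real
 * TC_XXX names and values are not reproduced here. */
enum {
    DEMO_TC_HTML  = 1u << 0,
    DEMO_TC_MYSQL = 1u << 1,
    DEMO_TC_SHELL = 1u << 2,
    DEMO_TC_SELF  = 1u << 3
};

/* Simplified stand-in for a zval, NOT the real Zend definition.
 * Assumption: on common ABIs the extra byte lands in what would
 * otherwise be tail padding; verify this per compiler and platform. */
typedef struct demo_zval {
    union { long lval; double dval; void *ptr; } value;
    unsigned int  refcount;
    unsigned char type;
    unsigned char is_ref;
    unsigned char taint;   /* one bit per taint flavor */
} demo_zval;

/* Add one or more flavors, e.g. when data enters the application. */
static void demo_mark(demo_zval *z, unsigned flavors)
{
    z->taint |= (unsigned char)flavors;
}

/* Remove flavors, e.g. after a matching conversion function ran. */
static void demo_clear(demo_zval *z, unsigned flavors)
{
    z->taint &= (unsigned char)~flavors;
}

/* Non-zero if any of the given flavors is still present, which is
 * the condition under which a sensitive operation would warn. */
static int demo_is_tainted(const demo_zval *z, unsigned flavors)
{
    return (z->taint & flavors) != 0;
}

/* Small walk-through: mark two flavors, clear one, check both. */
static void demo_example(void)
{
    demo_zval z = {0};
    demo_mark(&z, DEMO_TC_HTML | DEMO_TC_MYSQL);
    demo_clear(&z, DEMO_TC_HTML);
    assert(!demo_is_tainted(&z, DEMO_TC_HTML));
    assert(demo_is_tainted(&z, DEMO_TC_MYSQL));
}
```

Keeping the flags in their own byte means that setting and testing them needs no shifts, which is in line with the concern above about expensive shift/mask operations if the bits had to be squeezed in between existing fields.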
rfc/taint.txt · Last modified: 2017/09/22 13:28 by 127.0.0.1