This is an old revision of the document!


PHP RFC: Add validation functions to filter module

Introduction

Input data validation is the most important security measure in software security.

Secure coding practices recommend whitelist input validation, that is, accepting only input known to be valid.

PHP has the filter module for this purpose, but it has problems:

  • The functions are designed to “filter/convert a value and accept it”, even when a validation filter is used. (Invalid input is accepted as well)
    • They allow undefined input (an empty element in the input array) by default.
    • They convert invalid values to NULL/FALSE, and the URL filter converts inputs to lower case. This behavior makes input validation errors difficult to identify. It is especially a problem with array value validation, e.g. a check such as “$orig_array == $check_array” cannot be used.
    • Input validation must pass under normal conditions, so a validation error should raise an exception by default, but these functions do not.
  • A string validation filter is missing, even though strings are the most dangerous input.
  • Multiple validation filters are not allowed for one input. There are cases where multiple filters are wanted, especially for strings, e.g. check string length and encoding, then perform a regex check.
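
The behavior listed above can be observed with the existing functions. The following is a minimal sketch that uses only filter_var() and filter_var_array() as they exist today. The input values are made up for illustration.

```php
<?php
// Whitespace around an integer is trimmed and the value is converted,
// so the result is no longer identical to the original string.
$int = filter_var(' 42 ', FILTER_VALIDATE_INT);

// Invalid input does not raise an error; it is reported as FALSE.
$bad = filter_var('abc', FILTER_VALIDATE_INT);

// For booleans, FALSE is both a valid result and the failure value.
$no    = filter_var('no', FILTER_VALIDATE_BOOLEAN);
$maybe = filter_var('maybe', FILTER_VALIDATE_BOOLEAN);

// An element that is missing from the input is added as NULL by default.
$arr = filter_var_array(
    array('id' => '7'),
    array('id' => FILTER_VALIDATE_INT, 'page' => FILTER_VALIDATE_INT)
);

var_dump($int, $bad, $no, $maybe, $arr);
```

Note that 'no' and 'maybe' produce the same result unless FILTER_NULL_ON_FAILURE is passed, so the caller cannot tell a valid FALSE from a rejected value.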

Although the filter module is intended to improve security, some users misuse it, with bad consequences.

This RFC encourages proper secure coding by adding functions and a filter that are better suited to input validation.

Secure coding basics

A fundamental idea of secure coding is input and output control. The proposed new functions are intended for input data validation, not for input error checks in business logic.

Input data validation is best thought of as an input data assertion that should never fail under normal circumstances. It differs in nature from handling wrong input from users, which happens under normal conditions. User input mistakes and logically inconsistent data, e.g. a past date for a reservation, should generally be handled in business logic, not in the input data validation part.

NOTE: Input data validation is a runtime assertion that should never fail under normal circumstances. It should not handle user input mistakes or logical inconsistencies.

WARNING: Input handling and output handling are independent. Output code is responsible for making sure the output is safe for the external computer/software. Output data should be safe regardless of input validation, i.e. programmers must always escape, use a secure API for, or validate all output data (or otherwise make 120% sure that the data is safe).

What programmers should do for input validation (assertion) is accept only valid inputs. The list below is a blacklist of inputs to reject, but programmers must think in terms of a whitelist (what to accept, rejecting anything else), NOT a blacklist, when they write input validation code.

  • Broken char encoding (Accept only valid encoding)
  • NUL, etc control chars in string. (Accept only chars allowed)
  • Too long or too short string. e.g. JS validated values and values set by server programs like <select>/<input type=radio>/etc, 100 chars for username, 1000 chars for password, empty ID for a database record, etc. (Accept only strings within range)
  • Too large or too small numeric values, i.e. int/float/bool values. (Accept only numeric values within range)
  • Too many or too few inputs. (Accept only input sets that have the expected number of elements)
  • Broken number string for a database record ID. (Accept only valid string format)
  • Broken flags. i.e. Bool value (Accept only valid value for bool)
  • Newline chars in <input>, hash value, etc. (Accept only valid string format/chars)
  • Broken string/date format. e.g. JS validated phone number, list items such as country names, date string, etc. (Accept only valid string format)
  • and so on.

Not all of them can be checked at input validation. How and what can be validated depends on the specification of the input source. For example, if your system does client side validation, you can validate strings strictly, e.g. a date string. If you do no client side validation at all and use a plain <input> for a date, your validation code cannot do much. However, a date string that is over 100 chars, contains control chars, or has broken char encoding is good enough reason to reject the input as invalid.

Separating input data validation from the handling of user input mistakes in business logic makes software simpler and easier to maintain. Input data format is more stable than business logic by nature, just as an object's interface is more stable than its implementation. Simplicity and maintainability are important for security as well.
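
The separation between input validation and business logic can be sketched with existing PHP features. This is only an illustration under assumptions: InputAssertionException, assert_date_input() and check_reservation_date() are hypothetical names, and the 100 byte limit is taken from the plain <input> example above.

```php
<?php
// Hypothetical exception type for this sketch; not part of PHP.
class InputAssertionException extends RuntimeException {}

// Input validation: an assertion about the data format only.
// A failure means the client is broken or hostile, so it throws.
function assert_date_input($value)
{
    if (!is_string($value)
        || strlen($value) > 100                      // far too long for a date
        || preg_match('//u', $value) !== 1           // broken UTF-8 encoding
        || preg_match('/[\x00-\x1F\x7F]/', $value)   // control characters
    ) {
        throw new InputAssertionException('Invalid date input');
    }
    return $value;
}

// Business logic: handles ordinary user mistakes and returns a message.
function check_reservation_date($value, DateTimeImmutable $today)
{
    $date = DateTimeImmutable::createFromFormat('!Y-m-d', $value);
    if ($date === false || $date->format('Y-m-d') !== $value) {
        return 'Please enter the date as YYYY-MM-DD.';
    }
    if ($date < $today) {
        return 'The reservation date must not be in the past.';
    }
    return null; // no error
}
```

The first function would typically end the request when it throws, while the second one returns a message that can be shown to the user, who is then asked to correct the input.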

SUMMARY: Input data validation should accept only valid and possible inputs. Anything else should be rejected and the program terminated. There is no point in continuing to run a program with invalid input data, because it cannot work correctly. Business logic should take care of whatever input validation cannot check.

The most important input validation is application level validation, but input validation is not limited to it.

Please refer to secure coding practices and Design by Contract (DbC) for more details. DbC requires proper runtime input validation, and the proposed validation functions can be used for this purpose.

Proposal

The following are proposed improvements to the filter module.

Add validation functions

  • Add filter_require_var_array()/filter_require_input_array()/filter_require_var()/filter_require_input()
array filter_require_var_array ( array $data , mixed $definition [, bool $add_empty = false ] )
mixed filter_require_var ( mixed $variable , int $filter [, mixed $options ] )
array filter_require_input_array ( int $type , mixed $definition [, bool $add_empty = false ] )
mixed filter_require_input ( int $type , string $variable_name , int $filter [, mixed $options ] )

They are almost the same as the filter_var/input*() functions. The key differences are:

  • They raise FilterValidateException when they detect invalid input.
  • A filter must be specified. (Any default must be set by the user)
  • Conservative defaults. Empty elements are not added by default. Spaces in int/float/bool-like input data are not trimmed.
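
The differences above can be approximated in userland with the existing filter_var(). The following is a rough sketch only, not the proposed implementation: SketchValidateException and sketch_require_var() are hypothetical names standing in for FilterValidateException and filter_require_var().

```php
<?php
// Hypothetical stand-in for the proposed FilterValidateException.
class SketchValidateException extends InvalidArgumentException {}

// Hypothetical userland approximation of the proposed filter_require_var().
function sketch_require_var($value, $filter, array $options = array())
{
    // Ask for NULL on failure so that a valid boolean FALSE is not
    // mistaken for a validation error.
    $flags = isset($options['flags']) ? $options['flags'] : 0;
    $options['flags'] = $flags | FILTER_NULL_ON_FAILURE;

    // filter_var() trims int/float/bool input; reject such input up front.
    $trimming = array(FILTER_VALIDATE_INT, FILTER_VALIDATE_FLOAT, FILTER_VALIDATE_BOOLEAN);
    if (in_array($filter, $trimming, true) && is_string($value) && trim($value) !== $value) {
        throw new SketchValidateException('Surrounding whitespace is not allowed');
    }

    $result = filter_var($value, $filter, $options);
    if ($result === null) {
        throw new SketchValidateException('Input failed validation');
    }
    return $result;
}
```

With this sketch, sketch_require_var('42', FILTER_VALIDATE_INT) returns the integer 42, while ' 42 ', 'abc', or 'maybe' with FILTER_VALIDATE_BOOLEAN raise the exception instead of returning a value.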

NOTE: The main motivation for adding these functions is that filter_var_array()/filter_input_array() are not suitable for strict input validation. See the Discussions section.

  • Add filter_check_definition() - Check definition array for filter_require_*_array()/filter_*_array()
bool filter_check_definition (array $definition_of_array_value_filter_and_validation)

Filter definition errors are silently ignored for performance reasons, yet a definition error could be a fatal bug. This function provides a check that finds typos and format errors.
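
A format check of this kind can be sketched in userland with the existing filter_list() and filter_id() functions. This is an illustration only: sketch_check_definition() is a hypothetical name, it returns a list of problems rather than a bool, and it assumes that a spec is either a filter ID or an array with the keys 'filter', 'flags' and 'options' used by filter_var_array().

```php
<?php
// Hypothetical userland sketch of a definition format check.
// Returns a list of problems; an empty list means the format looks sane.
function sketch_check_definition(array $definition)
{
    $known_ids    = array_map('filter_id', filter_list());
    $allowed_keys = array('filter', 'flags', 'options');
    $problems     = array();

    foreach ($definition as $name => $spec) {
        if (is_int($spec)) {
            // Short form: 'name' => FILTER_*
            if (!in_array($spec, $known_ids, true)) {
                $problems[] = "$name: unknown filter ID $spec";
            }
            continue;
        }
        if (!is_array($spec)) {
            $problems[] = "$name: spec must be a filter ID or an array";
            continue;
        }
        foreach (array_keys($spec) as $key) {
            if (!in_array($key, $allowed_keys, true)) {
                $problems[] = "$name: unexpected key '$key'";
            }
        }
        if (isset($spec['filter']) && !in_array($spec['filter'], $known_ids, true)) {
            $problems[] = "$name: unknown filter ID";
        }
    }
    return $problems;
}
```

A typo such as array('id' => array('fliter' => FILTER_VALIDATE_INT)) is reported as an unexpected key, whereas filter_var_array() would accept the same definition without complaint.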

Limitations and Notes:

  • filter_check_definition() only checks format, not semantics, i.e. it does not check whether options/flags are suitable for the filter.
  • The callback filter can be used for validation, but it is the user's responsibility to raise FilterValidateException when there is a validation error.
  • filter_require_*() functions share the validation filters of the filter_var/input*() functions. Therefore,
    • filter_require_*() functions do not keep the input data type. The data type changes according to the filter used, i.e. INT/FLOAT/BOOL filters convert the data type.
    • FILTER_VALIDATE_INT/FILTER_VALIDATE_FLOAT/FILTER_VALIDATE_BOOLEAN validation does NOT trim spaces before converting to int/float/bool type. Spaces raise an exception.
    • FILTER_VALIDATE_INT validation converts base 10, base 8 (FILTER_FLAG_ALLOW_OCTAL), and base 16 (FILTER_FLAG_ALLOW_HEX) integer values to int type. In addition, it detects overflow, so be careful when your program must run correctly on both 32 and 64 bit architectures. NOTE: One must not use FILTER_VALIDATE_INT for database record ID validation. Use the string validation filter with FILTER_FLAG_STRING_NUM instead.
    • FILTER_VALIDATE_BOOLEAN validation converts 1/true/yes/on (case insensitive) to TRUE and 0/false/no/off (case insensitive) to FALSE.
    • FILTER_VALIDATE_BOOLEAN does NOT convert empty input to FALSE. Use FILTER_FLAG_BOOL_ALLOW_EMPTY to get filter_var/input*() like behavior.
    • Data type conversion works well with the 'declare(strict_types=1)' Zend Engine switch, so it is retained.
  • Since an exception terminates execution where it is raised, the return value of a filter_require_*() function is not usable when a validation exception is raised. See the example code in the “Allow multiple filters for an input” section.

Allow multiple filters for an input

An example is easier to understand. The new filter module allows multiple filters for both validation and sanitize filters.

    <?php
    // Following initialization is to illustrate exception handling.
    $myinputs = array(
        'some' => 'initialization',
        'or' => 'could be return value from previous validation',
    );
 
    $date_spec = 
       array(
                // New filter module allows multiple filters and options as follows.
                // Array elements are evaluated in order. Non array spec is evaluated last.
                // Older implementation ignores this kind of spec silently.
                array( // This is evaluated first.
                        'filter'    => FILTER_VALIDATE_STRING,
                        'options'   => array('min_bytes' => 10, 'max_bytes' => 10, 'encoding' => FILTER_STRING_ENCODING_PASS)
                ),
                array(
                        'filter' => FILTER_VALIDATE_REGEXP,
                        'options' => array('regexp' => '/^[0-9]{4}-[0-9]{2}-[0-9]{2}$/')
                ),
                array(
                        'filter' => FILTER_CALLBACK,
                        'options' => 'check_date_and_raise_exception_for_invalid', // callable name, without "()"
                ),
                'filter' => FILTER_UNSAFE_RAW, // Evaluated last. Does nothing. It's here for an example.
        );
 
 
    $definitions = array(
        'date'    => $date_spec,
        'time'    => $time_spec, // Element spec definition is omitted in this example
        'id'      => $id_spec,
        // and so on
    );
 
    // Throws FilterValidateException for invalid inputs.
    try {
        $myinputs = filter_require_var_array($data, $definitions);
        // NOTE: If you need returned array value, it MUST be inside try block
        //       or catch block MUST terminate execution. Otherwise, returned value
        //       may contain irrelevant values.
        var_dump($myinputs); 
    } catch (FilterValidateException $e) {
        var_dump($e->getMessage());
        die('Invalid input detected!'); // Should terminate execution when input validation fails
    }
    // If validation exception is raised and catch block didn't terminate script,
    // $myinputs will have irrelevant value from previous initialization.
    // WARNING: When validation exception is raised, program MUST NOT reach here.
    // If you properly handle validation exceptions, i.e. terminate execution,
    // then you can use $myinputs safely outside of try block.
    var_dump($myinputs); 
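
The evaluation order described in the comments above (each spec is applied in turn, and the first failure raises an exception) can be imitated with the existing filter_var(). The following is a sketch under assumptions, not the proposed implementation: ChainValidateException, sketch_filter_chain() and sketch_check_date() are hypothetical names, and only filters that exist today are used.

```php
<?php
// Hypothetical exception for this sketch; the RFC proposes FilterValidateException.
class ChainValidateException extends InvalidArgumentException {}

// Apply each spec in order; every step receives the previous step's result.
function sketch_filter_chain($value, array $specs)
{
    foreach ($specs as $i => $spec) {
        $args = $spec;
        unset($args['filter']);
        $value = filter_var($value, $spec['filter'], $args);
        if ($value === false) {
            throw new ChainValidateException("Filter #$i rejected the input");
        }
    }
    return $value;
}

// Callback used as the last step: checks that the date exists in the calendar.
function sketch_check_date($value)
{
    list($y, $m, $d) = array_map('intval', explode('-', $value));
    return checkdate($m, $d, $y) ? $value : false;
}

$date_chain = array(
    // The D modifier stops "$" from matching before a trailing newline.
    array('filter'  => FILTER_VALIDATE_REGEXP,
          'options' => array('regexp' => '/^[0-9]{4}-[0-9]{2}-[0-9]{2}$/D')),
    array('filter' => FILTER_CALLBACK, 'options' => 'sketch_check_date'),
);

$date = sketch_filter_chain('2016-08-10', $date_chain);
var_dump($date);
```

Because this sketch treats FALSE as the failure value, it is not suitable for boolean filters as written.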

Add string validation filter

Add the missing string validation filter (FILTER_VALIDATE_STRING). This filter has conservative defaults, i.e. validation is strict by default.

Features:

  • Validate string as UTF-8 by default. (Only UTF-8 is supported)
  • FILTER_STRING_ENCODING_PASS 'encoding' option to disable encoding check.
  • 'min_bytes'/'max_bytes' options for string length. min_bytes default is 2, max_bytes default is 20.
  • 'allowed_chars' option specifies the allowed chars. (Only works for code values less than 127)
  • Only a single line is allowed by default.
  • FILTER_FLAG_STRING_ALLOW_NEWLINE flag to allow multi line (\r, \n) inputs.
  • FILTER_FLAG_STRING_ALLOW_TAB to allow TAB.
  • FILTER_FLAG_STRING_ALLOW_CNTRL to allow control chars.
  • FILTER_FLAG_STRING_ALPHA to allow only alphabet
  • FILTER_FLAG_STRING_NUM to allow only number(digit)
  • FILTER_FLAG_STRING_ALNUM to allow only alphanumeric

Limitations:

  • UTF-8 only.
  • Character control is limited to code values less than 127. (Only ASCII chars)
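
The checks listed above can be approximated in userland today. The following is a minimal sketch, not the proposed filter: sketch_validate_string() is a hypothetical function, 'min_bytes'/'max_bytes' follow the option names and defaults above, and the 'allow_newline'/'allow_tab' options are assumptions standing in for the proposed flags.

```php
<?php
// Hypothetical userland approximation of the proposed string filter.
// Returns the string unchanged when it is acceptable, FALSE otherwise.
function sketch_validate_string($value, array $opt = array())
{
    $opt += array(
        'min_bytes'     => 2,      // defaults as listed above
        'max_bytes'     => 20,
        'allow_newline' => false,
        'allow_tab'     => false,
    );

    if (!is_string($value)) {
        return false;
    }
    $len = strlen($value);         // strlen() counts bytes, not characters
    if ($len < $opt['min_bytes'] || $len > $opt['max_bytes']) {
        return false;
    }
    if (preg_match('//u', $value) !== 1) {
        return false;              // not valid UTF-8
    }

    // Remove the explicitly allowed control chars, then reject any others.
    $check = $value;
    if ($opt['allow_newline']) {
        $check = str_replace(array("\r", "\n"), '', $check);
    }
    if ($opt['allow_tab']) {
        $check = str_replace("\t", '', $check);
    }
    if (preg_match('/[\x00-\x1F\x7F]/', $check) === 1) {
        return false;
    }
    return $value;
}
```

For example, sketch_validate_string("ab\ncd") returns FALSE, while the same input passes when array('allow_newline' => true) is given.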

Other changes in validation filter

NOTE: These changes apply only when the new filter_require*() functions are used.

  • FILTER_VALIDATE_INT/FILTER_VALIDATE_FLOAT/FILTER_VALIDATE_BOOLEAN filters do NOT trim spaces.
  • All validation filters raise FilterValidateException for validation errors.

Discussions

Why it should be in core?

There are users who misuse the current filter module for “secure coding” input validation.

Input validation is the most important security feature. PHP should provide an input validation feature that is easy to use, reliable, and fast. We should encourage strict input validation that rejects invalid (attacker) inputs by providing stricter input validation features, rather than filtering (converting) and accepting.

This proposal reduces misuse of the filter module, which is always built by default.

Why not compare filter_var_array() result?

The following code may seem to work, but it does not.

    $ret = filter_var_array($arr, $validation_spec);
    if ($ret != $arr) {
      die('Input does not validate');
    }

  • One should never compare floats for equality. (A float string is converted to float type. Consider comparing a very long numeric string with its converted float value.)
  • They are filter (conversion) functions, e.g. URLs are converted to lowercase.
  • They allow empty input by default and add a NULL element.
  • int/float/bool validation filters trim and convert type. (The result cannot be matched reliably against the original by “==” comparison)

For these reasons, comparing original and return(filtered) value is not suitable for strict input validation.
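
The unreliability of the comparison can be shown with the existing filter_var_array(). In this small sketch the spec and input values are made up for illustration. The comparison accepts an input that failed validation and rejects an input whose present elements are all valid.

```php
<?php
$spec = array('id' => FILTER_VALIDATE_INT, 'page' => FILTER_VALIDATE_INT);

// Case 1: 'id' is empty, so its validation fails and yields FALSE,
// but FALSE == '' is true and the arrays compare as equal.
$invalid     = array('id' => '', 'page' => '1');
$ret1        = filter_var_array($invalid, $spec);
$looks_valid = ($ret1 == $invalid);

// Case 2: input without the 'page' element. A NULL element is added,
// so the arrays differ although 'id' itself validated.
$valid         = array('id' => '7');
$ret2          = filter_var_array($valid, $spec);
$looks_invalid = ($ret2 != $valid);

var_dump($ret1, $looks_valid, $ret2, $looks_invalid);
```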

Framework should do this task

There are several reasons to do this in PHP itself.

  1. Current validation filters and filter functions are not suitable for input data validation. (They are even misleading)
  2. Having a proper feature encourages users to code securely, i.e. to validate and accept only valid inputs.
  3. It should be possible to write simple apps with PHP's basic features, and input data validation is mandatory for secure coding.
  4. This RFC makes it easy to introduce input data validation into any PHP app. There is a lot of frameworkless code, and there are many apps built with micro/light frameworks that have no input validation feature.
  5. It's fast, because a simple array is used for the validation spec definition.
  6. This kind of feature is required for DbC. See https://wiki.php.net/rfc/introduce_design_by_contract

Frameworks may implement their own validators with more features, but PHP should have its own usable validator because this feature is mandatory.

PHP is a first choice for Web development because simple web apps can be written in PHP with simple code. PHP should try to keep this aspect as much as possible and try to provide mandatory and/or best practice features. Otherwise, PHP would not differ much from other languages that require a Web application framework even for simple web apps.

Backward Incompatible Changes

None. filter_var/input*() functions are not changed at all.

Proposed PHP Version(s)

7.1.0 or 7.2.0

RFC Impact

To SAPIs

None

To Existing Extensions

None

To Opcache

None

New Constants

String validation filter flags

  • FILTER_STRING_ENCODING_PASS - string validation filter encoding option that disables the encoding check.
  • FILTER_STRING_ENCODING_UTF8 - string validation filter encoding (Default)
  • FILTER_FLAG_STRING_RAW - string validation filter flag for binary.
  • FILTER_FLAG_STRING_ALLOW_CNTRL - string validation filter flag allows all CNTRL chars
  • FILTER_FLAG_STRING_ALLOW_TAB - string validation filter flag allows TAB
  • FILTER_FLAG_STRING_ALLOW_NEWLINE - string validation filter flag allows newlines (\n,\r)
  • FILTER_FLAG_STRING_ALPHA - string validation filter flag allows alphabet only.
  • FILTER_FLAG_STRING_NUM - string validation filter flag allows digit only
  • FILTER_FLAG_STRING_ALNUM - string validation filter flag allows alphabet and digit only.

Bool validation filter flags (filter_var/input*() functions are not affected. They always allow empty input)

  • FILTER_FLAG_BOOL_ALLOW_EMPTY - bool validation flag allows empty string to FALSE

php.ini Defaults

No changes

Open Issues

None

Unaffected PHP Functionality

Existing filter features are not changed at all.

Future Scope

  • Refactor code. The code has not been refactored, in order to minimize changes.
  • Add an “optional filter” that allows optional input. With this RFC, an optional filter can be written as a “callback filter”.

Proposed Voting Choices

This project requires a 2/3 majority

Add validation functions to filter module
Real name Yes No
bwoebi (bwoebi)  
colinodell (colinodell)  
danack (danack)  
derick (derick)  
diegopires (diegopires)  
guilhermeblanco (guilhermeblanco)  
kguest (kguest)  
levim (levim)  
lstrojny (lstrojny)  
marcio (marcio)  
nikic (nikic)  
ocramius (ocramius)  
peehaa (peehaa)  
santiagolizardo (santiagolizardo)  
yohgaki (yohgaki)  
Final result: Yes 1, No 14
This poll has been closed.

Please choose targeted version for this RFC

Target version
Real name 7.1.0 7.2.0
colinodell (colinodell)  
derick (derick)  
kguest (kguest)  
levim (levim)  
ocramius (ocramius)  
yohgaki (yohgaki)  
Final result: 7.1.0: 1, 7.2.0: 5
This poll has been closed.

Vote start 2016/08/10, ends 2016/08/17 23:59:59 UTC.

Patches and Tests

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature

References

Links to external references, discussions or RFCs

Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/add_validate_functions_to_filter.1471218740.txt.gz · Last modified: 2017/09/22 13:28 (external edit)