Request for Comments: Use default_charset As Default Character Encoding
- Version: 0.2
- Date: 2013-06-29
- Author: Yasuo Ohgaki yohgaki@ohgaki.net
- Status: Under Discussion
- First Published at: http://wiki.php.net/rfc/default_encoding
Introduction
This RFC proposes using default_charset as the default character encoding.
PHP currently has no single default encoding setting. This makes adopting PHP 5.4 difficult, because htmlentities/htmlspecialchars now default to UTF-8 in PHP 5.4. Some applications must set the encoding for htmlentities/htmlspecialchars explicitly to process characters correctly. If users mix ISO-8859-1 and UTF-8 (or any of the many other multibyte character encodings), it can cause security problems.
There are many encoding settings in php.ini and in functions that users simply ignore and leave at their defaults. However, handling character encoding properly is required for secure programs.
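As a small illustration of the mismatch described above, the following sketch passes an ISO-8859-1 byte string to htmlspecialchars with two different encoding arguments. The variable names are made up for this example. When the declared encoding does not match the data, the function returns an empty string (unless ENT_IGNORE or ENT_SUBSTITUTE is given), which is the kind of failure applications can hit when the default changes underneath them.

```php
<?php
// "café <b>" encoded as ISO-8859-1: the byte 0xE9 is not valid UTF-8 here.
$latin1 = "caf\xE9 <b>";

// Declared encoding does not match the data: invalid UTF-8 input yields "".
$asUtf8 = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');

// Declared encoding matches the data: only the markup characters are escaped.
$asLatin1 = htmlspecialchars($latin1, ENT_QUOTES, 'ISO-8859-1');

var_dump($asUtf8 === '');
var_dump($asLatin1 === "caf\xE9 &lt;b&gt;");
```

Passing the encoding explicitly, as done here, works around the problem per call; the RFC aims to make a single setting cover such calls instead.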
Proposal
Set default_charset="UTF-8" as the PHP default, both as the compiled-in value and in the php.ini-* files.
Add php.input_encoding, php.internal_encoding and php.output_encoding for encoding-related modules/functions.
- php.input_encoding (Default: empty)
- php.internal_encoding (Default: default_charset php.ini setting)
- php.output_encoding (Default: empty)
Use default_charset as the default for encoding-related php.ini settings and modules/functions.
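A php.ini fragment matching the proposal might look roughly like the sketch below. The php.* names are the ones proposed in this RFC and are not settings in a released PHP; treating an empty value as "not set, fall back to the next default" is an assumption of this sketch.

```ini
; Proposed default for both the compiled-in value and php.ini-*.
default_charset = "UTF-8"

; Proposed new settings (names taken from this RFC).
; Empty is assumed to mean "not set".
php.input_encoding =
; Left empty so that it falls back to default_charset.
php.internal_encoding =
php.output_encoding =
```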
Not touched
- zend.script_encoding
PHP 5.5 and master: introduce new php.ini settings. The old iconv.*/mbstring.* php.ini parameters will be removed in master.
- php.input_encoding
- php.internal_encoding
- php.output_encoding
PHP 5.5
- iconv.input_encoding (Default: php.input_encoding)
- iconv.internal_encoding (Default: php.internal_encoding)
- iconv.output_encoding (Default: php.output_encoding)
- mbstring.http_input (Default: php.input_encoding)
- mbstring.internal_encoding (Default: php.internal_encoding)
- mbstring.http_output (Default: php.output_encoding)
- all functions that take an encoding option use php.internal_encoding as the default (e.g. htmlentities/mb_strlen/mb_regex/etc)
PHP 5.4
- left as it is now
Precedence of settings
default_charset < php.* < mbstring.*/iconv.* < encoding specified by functions
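The ordering above reads from lowest to highest precedence. A minimal sketch of such a lookup follows, assuming that an empty value means "not set". The function resolve_encoding and the $ini array are hypothetical and only model the rule; they are not how PHP itself would implement it.

```php
<?php
// Hypothetical helper, not part of PHP: returns the first non-empty
// candidate, checked from highest to lowest precedence.
function resolve_encoding(array $ini, $moduleKey, $phpKey, $explicit = null)
{
    $candidates = array(
        $explicit,                                                      // encoding passed to the function
        isset($ini[$moduleKey]) ? $ini[$moduleKey] : '',                // mbstring.* or iconv.* setting
        isset($ini[$phpKey]) ? $ini[$phpKey] : '',                      // php.* setting
        isset($ini['default_charset']) ? $ini['default_charset'] : '', // last resort
    );

    foreach ($candidates as $encoding) {
        if ($encoding !== null && $encoding !== '') {
            return $encoding;
        }
    }

    return '';
}

// Example settings: only default_charset and one mbstring setting are filled in.
$ini = array(
    'default_charset'            => 'UTF-8',
    'php.internal_encoding'      => '',
    'mbstring.internal_encoding' => 'EUC-JP',
);

// The module setting wins over php.* and default_charset.
$fromModule = resolve_encoding($ini, 'mbstring.internal_encoding', 'php.internal_encoding');

// iconv.internal_encoding is absent, php.* is empty, so default_charset is used.
$fromDefault = resolve_encoding($ini, 'iconv.internal_encoding', 'php.internal_encoding');

// An encoding passed to the function wins over everything else.
$fromArgument = resolve_encoding($ini, 'mbstring.internal_encoding', 'php.internal_encoding', 'SJIS');
```

With these example settings the three calls resolve to EUC-JP, UTF-8 and SJIS respectively.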
Encoding name handling
mbstring and iconv have different levels of support for encoding names.
Notes:
- iconv does not have an API for listing supported encodings, and iconv is built against the system's iconv library, so the supported names depend on the system.
- mbstring has an API to check whether an encoding is supported.
- users are responsible for setting a proper encoding name, e.g. mbstring has SJIS-win, but iconv only has SJIS.
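The difference noted above can be sketched as follows. The two helper functions are hypothetical names introduced for this example. The mbstring check uses mb_list_encodings(), which lists encoding names but not their aliases. Since iconv offers no listing API, the iconv check simply attempts a conversion and treats failure as "not supported". Results for a given name depend on the system's iconv library.

```php
<?php
// Hypothetical helper: true if mbstring lists the name (aliases are not checked).
function mbstring_supports_encoding($name)
{
    if (!extension_loaded('mbstring')) {
        return false;
    }
    foreach (mb_list_encodings() as $known) {
        if (strcasecmp($known, $name) === 0) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: iconv has no listing API, so probe with a tiny
// conversion; iconv() returns false when the encoding name is rejected.
function iconv_supports_encoding($name)
{
    if (!extension_loaded('iconv')) {
        return false;
    }
    return @iconv($name, 'UTF-8', 'a') !== false;
}

foreach (array('SJIS', 'SJIS-win') as $name) {
    printf(
        "%s: mbstring=%s iconv=%s\n",
        $name,
        var_export(mbstring_supports_encoding($name), true),
        var_export(iconv_supports_encoding($name), true)
    );
}
```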
Patch
Vote
Not yet
References
- Internals discussion - http://www.serverphorums.com/read.php?7,552099,552110
Changelog
- 2013-06-29 Add PoC patch and update RFC, since PHP 5.5 has been released.
- 2012-08-31 Initial version. (yohgaki)