====== Request for Comments: Use default_charset As Default Character Encoding ====== * Version: 1.0.1 * Date: 2014-01-01 * Author: Yasuo Ohgaki * Status: Implemented * First Published at: http://wiki.php.net/rfc/default_encoding ===== Introduction ===== This RFC proposes that use default_charset as default character encoding. Current PHP does not have default encoding setting. This makes adoption of PHP 5.4 difficult, since PHP 5.4's htmlentities/htmlspecialchars is now default to UTF-8. Some applications are required to set proper encoding for htmlentities/htmlspecialchars for proper character processing. If users mixed ISO-8859-1 and UTF-8 (AND many other multibyte character encodings), it could cause security problem. There are many encoding setting in php.ini and functions that users simply ignore and leave it alone. However, it is required to handle character encoding properly for secure programs. Objectives of this proposal are: - Setting charset in HTTP header is recommended since the first XSS advisory in 2000 Feb. by CERT and Microsoft. (Better security) - There are too many encoding settings and it is better to consolidated. - If we have yet another multibyte string module in the future, the new common ini settings can be used. (No more module specific INIs) ===== Proposal ===== Set **default_charset="UTF-8"** as PHP default for both compiled and php.ini-* option. Add **php.input_encoding, php.internal_encoding and php.output_encoding** for encoding related module/functions. * php.input_encoding (Default: empty) * php.internal_encoding (Default: default_charset php.ini setting) * php.output_encoding (Default: empty) Use **default_charset as default** for encoding related php.ini settings and module/functions. Not touched * zend.script_encoding PHP 5.6 and master, introduce new php.ini setting. Old iconv.*/mbstring.* php.ini parameters will be removed for master PHP6. Use of iconv.*/mbstring.* php.ini parameters raise E_DEPRECATED for 5.6 and up. * php.input_encoding * php.internal_encoding * php.output_encoding * iconv.input_encoding (Default: php.input_encoding) * iconv.internal_encoding (Default: php.internal_encoding) * iconv.output_encoding (Default: php.output_encoding) * mbstring.http_input (Default: php.input_encoding) * mbstring.internal_encoding (Default: php.internal_encoding) * mbstring.http_output (Default: php.output_encoding) * all functions that take encoding option use php.internal_encoding as default (e.g. htmlentities/mb_strlen/mb_regex/etc) PHP 5.5 * leave as it is now ==== Precedence of settings ==== default_charset < php.* < mbstring.*/iconv.* < encoding specified by functions ==== Encoding name handling ==== mbstring and iconv have different level of support. * mbstring: http://www.php.net/manual/en/mbstring.supported-encodings.php * iconv: http://www.gnu.org/savannah-checkouts/gnu/libiconv/documentation/libiconv-1.13/iconv_open.3.html Notes: * iconv does not have API for getting supported encoding and iconv is built with system's iconv library. * mbstring has API to check encoding is supported or not. * users are responsible to set proper encoding name. e.g. mbstring has SJIS-win, but iconv only has SJIS * if encoding names conflicts, users should set module specific ini to valid(non empty) value ==== Use cases ==== It simplify i18n applications. Unifies *.output_encoding/*.internal_encoding/*.input_encoding setting. Users may check default_charset see if encoding conversion is needed or not. For example, pcre/sqlite only suports UTF-8 and users may check & convert encoding as follows. if (ini_get('default_charset') !== 'UTF-8') { $str = mb_convert_encoding($str, 'UTF-8'); } preg, sqlite function calls here. ==== Other related issues ==== escapeshellcmd/escapeshellarg/fgetcsv or like, are using locale based MBCS support via php_mblen(). These functions are out side of this RFC scope. Database character encoding is also out side of this RFC scope. ==== BC issues ==== None when users already using UTF-8 as their encoding. Other users may have to change "default_encoding" php.ini setting (leave it empty or set it to desired encoding) ===== Patch ===== PoC Patch https://github.com/yohgaki/php-src/compare/master-unified-encoding-setting ===== Vote Options ===== * Yes * No ===== Vote ===== Vote start: 2013/12/20 01:00 UTC Vote end: 2014/01/10 01:00 UTC * Yes * No ===== References ===== * Internals discussion - http://www.serverphorums.com/read.php?7,552099,552110 ===== Implementation ===== * http://git.php.net/?p=php-src.git;a=commit;h=cbd108abf19d9fb9ae1d4ccd153215f56a2763e8 * http://svn.php.net/viewvc?view=revision&revision=332850 * http://svn.php.net/viewvc?view=revision&revision=332851 * http://svn.php.net/viewvc?view=revision&revision=332852 * http://svn.php.net/viewvc?view=revision&revision=332857 ===== Changelog ===== * 2014-01-01 Revised unneeded php.ini removal process. * 2013-12-17 Added use case and related issues. * 2013-10-31 Added objectives. * 2013-10-29 Update target PHP version to 5.6.0. * 2013-06-29 Add PoC patch and update RFC, since PHP 5.5 has been released. * 2012-08-31 Initial version. (yohgaki)