Differences

This shows you the differences between two versions of the page.

--- rfc:unicode_escape [2014/11/24 22:02] – Fixed syntax ajf
+++ rfc:unicode_escape [2017/09/22 13:28] (current) – external edit 127.0.0.1
@@ Line 1: / Line 1: @@
 ====== PHP RFC: Unicode Codepoint Escape Syntax ======
-  * Version: 0.1
+  * Version: 0.1.3
-  * Date: 2014-01-24
+  * Date: 2014-11-24, Last Updated 2014-12-08
   * Author: Andrea Faulds, ajf@ajf.me
-  * Status: Under Discussion
+  * Status: Implemented (PHP 7.0)
   * First Published at: http://wiki.php.net/rfc/unicode_escape
@@ Line 27: / Line 27: @@
 <code php>
 echo "ma\u{00F1}ana"; // pre-composed character
-echo "man\u{006E}ana"; // "n" with combining ~ character U+006E
+echo "man\u{0303}ana"; // "n" with combining ~ character (U+0303)
 </code>
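
The corrected example is easier to verify at the byte level. A minimal sketch, assuming PHP 7.0 or later (where ''\u{...}'' is recognised in double-quoted strings); the variable names are illustrative only:

```php
<?php
// Assumes PHP >= 7.0; earlier versions leave "\u{...}" uninterpreted.
$precomposed = "ma\u{00F1}ana";  // U+00F1: "n with tilde" as one codepoint
$combining   = "man\u{0303}ana"; // "n" followed by U+0303 COMBINING TILDE

// Both strings render alike, but their UTF-8 bytes differ.
echo bin2hex($precomposed), "\n"; // 6d61c3b1616e61   (7 bytes)
echo bin2hex($combining), "\n";   // 6d616ecc83616e61 (8 bytes)

// A plain comparison therefore treats them as different strings.
var_dump($precomposed === $combining); // bool(false)
```

The old line used U+006E, which is a plain "n" rather than a combining mark, so it would have produced "mannana".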
@@ Line 47: / Line 47: @@
      hexadecimal-digit   codepoint-digits
-It produces the UTF-8 encoding of a Unicode codepoint, specified with hexadecimal digits.
+It produces the UTF-8 encoding of a Unicode codepoint, specified with hexadecimal digits. If the codepoint is outside the maximum range permissible (beyond U+10FFFF), an error is thrown.
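
To illustrate what "the UTF-8 encoding of a Unicode codepoint" means here, the following is a userland sketch only. The function name ''codepoint_to_utf8'' and the choice of ''RangeException'' are assumptions made for illustration; the RFC implements this in the lexer and the diff only says that "an error is thrown". The sketch also does not treat surrogate codepoints specially.

```php
<?php
// Hypothetical helper, not part of PHP or of this RFC:
// encode a single Unicode codepoint as a UTF-8 byte string.
function codepoint_to_utf8(int $cp): string
{
    // Reject values beyond the Unicode maximum (U+10FFFF).
    // The exception type is an assumption of this sketch.
    if ($cp < 0 || $cp > 0x10FFFF) {
        throw new RangeException('Codepoint too large');
    }
    if ($cp < 0x80) {        // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {       // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {     // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(codepoint_to_utf8(0xF1)), "\n";    // c3b1
echo bin2hex(codepoint_to_utf8(0x20AC)), "\n";  // e282ac
echo bin2hex(codepoint_to_utf8(0x1F602)), "\n"; // f09f9882
```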
 ==== Syntax Rationale ====
@@ Line 61: / Line 61: @@
 For all these reasons, the ''\u{xxxxxx}'' syntax is proposed instead. It can easily represent any valid Unicode character, e.g. ''"\u{20}"'', ''"\u{FF}"'', ''"\u{202e}"'' or ''"\u{10F602}"''. It has a clearly delimited start and end, which avoids ambiguity (compare ''"\u001000"'' and ''"\u{10}00"'') and accidental misinterpretation. Finally, it doesn't require leading zeros; they are entirely optional, so the programmer can write ''"\u{00FF}"'' or ''"\u{FF}"'' as they see fit.
-As it happens, ECMAScript 6 will also have this syntax (see References below), in order to allow specifying non-BMP codepoints. This is actually mere coincidence, I came up with this syntax before learning ES 6 would support this.
+=== Prior Art ===
+ECMAScript 6 will have a ''\u{xxxxxx}'' syntax identical to the one proposed here.
+Ruby also supports this syntax; however, it allows multiple codepoints in one escape, e.g. ''\u{20AC A3 A5}'', which is not proposed in this RFC.
+(See References below)
 ==== Encoding Rationale ====
@@ Line 72: / Line 78: @@
 This change would take place in a major version, so some level of backwards-compatibility breakage would be justified. In cases where it caused problems with existing code, fixing it could be done quite trivially by either switching to single-quoted strings, or escaping the backslash.
+In order to reduce backwards-compatibility issues, particularly with JSON in string literals, a ''\u'' that is not followed by an opening ''{'' will pass through verbatim (instead of being interpreted as an escape sequence) and will not raise an error. This means that existing code like ''json_decode("\"\u202e\"");'' will continue to work properly. On the other hand, ''"\u{foobar"'' will raise an error.
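
The pass-through rule can be checked with a short sketch, assuming PHP 7.0 or later with the JSON extension available; the variable names are illustrative only:

```php
<?php
// "\u" without a following "{" is not an escape: the six bytes stay as written.
$literal = "\u202e";
// With braces, the sequence becomes the UTF-8 encoding of U+202E.
$escaped = "\u{202e}";

var_dump(strlen($literal));  // int(6)
echo bin2hex($escaped), "\n"; // e280ae

// JSON embedded in a PHP string literal keeps working, because PHP leaves
// \u202e alone and json_decode() interprets it as a JSON escape instead.
$decoded = json_decode("\"\u202e\"");
var_dump($decoded === $escaped); // bool(true)
```

The malformed ''"\u{foobar"'' case is left out of the sketch because it fails at compile time and would stop the whole script from running.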
 ===== Proposed PHP Version(s) =====
@@ Line 83: / Line 91: @@
 ===== Future Scope =====
-None foreseeable.
+Alain Williams suggested on the mailing list that we could add a named literal syntax (i.e. something like ''\U{arabic letter alef}''), like [[http://perldoc.perl.org/perlreref.html#ESCAPE-SEQUENCES|Perl's \N]].
-===== Proposed Voting Choices =====
+===== Vote =====
 As this is a language change, a 2/3 majority would be required.
+Voting started on 2014-12-08 and ended on 2014-12-18.
+<doodle title="Accept the Unicode Codepoint Escape Syntax RFC and merge into master?" auth="ajf" voteType="single" closed="true">
+   * Yes
+   * No
+</doodle>
 ===== Patches and Tests =====
@@ Line 94: / Line 109: @@
 A language specification pull request with a patch and tests can be found here: https://github.com/php/php-langspec/pull/92
+Provisional HHVM implementation: https://reviews.facebook.net/D30153
 ===== Implementation =====
-After the project is implemented, this section should contain
-  - the version(s) it was merged to
+  * php-src merge: https://github.com/php/php-src/commit/bae46f307c2d0cdef9b8f5426adcc46920776700 (will go into PHP 7)
-  - a link to the git commit(s)
+  * HHVM merge: https://github.com/facebook/hhvm/commit/b2df7016e63ddcf328dc5bcfdf18760bba8549ec
-  - a link to the PHP manual entry for the feature
+No manual entry yet.
 ===== References =====
-  * ECMAScript 6 will have the same ''\u{xxxxxx}'' syntax: https://mathiasbynens.be/notes/javascript-unicode
+  * Ruby supports the same ''\u{xxxxxx}'' syntax: http://leejava.wordpress.com/2009/03/11/unicode-escape-in-ruby/
+  * ECMAScript 6 will also have this syntax: https://mathiasbynens.be/notes/javascript-unicode
 ===== Rejected Features =====
 Keep this updated with features that were discussed on the mail lists.
+===== Errata =====
+The name of this RFC [[https://blog.ajf.me/2015-12-07-poorly-named-rfcs|ought to have been "unicode codepoint escape sequence", not "unicode codepoint escape syntax"]].
+===== Changelog =====
+  * (2016-03-13) Added Errata
+  * v0.1.3 - ''\u'' without a following opening ''{'' passes through verbatim
+  * v0.1.2 - Ruby support
+  * v0.1.1 - Added Future Scope note on named literals
+  * v0.1 - Initial version
