rfc:uconverter

Differences

This shows you the differences between two versions of the page.

rfc:uconverter [2012/10/30 17:26]
pollita [Patch] Add link to ICU4C ucnv.h documentation
rfc:uconverter [2017/09/22 13:28] (current)
Line 3: Line 3:
   * Date: 2012-10-29   * Date: 2012-10-29
   * Author: Sara Golemon <pollita@php.net>   * Author: Sara Golemon <pollita@php.net>
-  * Status: Under Discussion+  * Status: Implemented for 5.5 http://git.php.net/?p=php-src.git;a=commit;h=5ac35770f45e295cab1ed3c166131d11c27655f6
   * First Published at: http://wiki.php.net/rfc/uconverter   * First Published at: http://wiki.php.net/rfc/uconverter
  
 Exposes ICU's UConverter functions by adding a class to the ext/intl extension Exposes ICU's UConverter functions by adding a class to the ext/intl extension
 +===== Vote =====
 +
 +<doodle 
 +title="Should the current UConverter implementation be merged" auth="cataphract" voteType="single" closed="True">
 +   * Yes
 +   * No
 +</doodle>
  
 ===== Introduction ===== ===== Introduction =====
Line 16: Line 23:
   class UConverter {   class UConverter {
     /* UConverterCallbackReason */     /* UConverterCallbackReason */
-    const UCNV_UNASSIGNED+    const REASON_UNASSIGNED
-    const UCNV_ILLEGAL+    const REASON_ILLEGAL
-    const UCNV_IRREGULAR+    const REASON_IRREGULAR
-    const UCNV_RESET+    const REASON_RESET
-    const UCNV_CLOSE+    const REASON_CLOSE
-    const UCNV_CLONE;+    const REASON_CLONE;
          
     /* UConverterType */     /* UConverterType */
-    const UCNV_UNSUPPORTED_CONVERTER); +    const UNSUPPORTED_CONVERTER
-    const UCNV_SBCS+    const SBCS
-    const UCNV_DBCS+    const DBCS
-    const UCNV_MBCS+    const MBCS
-    const UCNV_LATIN_1+    const LATIN_1
-    const UCNV_UTF8+    const UTF8
-    const UCNV_UTF16_BigEndian+    const UTF16_BigEndian
-    const UCNV_UTF16_LittleEndian+    const UTF16_LittleEndian
-    const UCNV_UTF32_BigEndian+    const UTF32_BigEndian
-    const UCNV_UTF32_LittleEndian+    const UTF32_LittleEndian
-    const UCNV_EBCDIC_STATEFUL+    const EBCDIC_STATEFUL
-    const UCNV_ISO_2022+    const ISO_2022
-    const UCNV_LMBCS_1+    const LMBCS_1
-    const UCNV_LMBCS_2+    const LMBCS_2
-    const UCNV_LMBCS_3+    const LMBCS_3
-    const UCNV_LMBCS_4+    const LMBCS_4
-    const UCNV_LMBCS_5+    const LMBCS_5
-    const UCNV_LMBCS_6+    const LMBCS_6
-    const UCNV_LMBCS_8+    const LMBCS_8
-    const UCNV_LMBCS_11+    const LMBCS_11
-    const UCNV_LMBCS_16+    const LMBCS_16
-    const UCNV_LMBCS_17+    const LMBCS_17
-    const UCNV_LMBCS_18+    const LMBCS_18
-    const UCNV_LMBCS_19+    const LMBCS_19
-    const UCNV_LMBCS_LAST+    const LMBCS_LAST
-    const UCNV_HZ+    const HZ
-    const UCNV_SCSU+    const SCSU
-    const UCNV_ISCII+    const ISCII
-    const UCNV_US_ASCII+    const US_ASCII
-    const UCNV_UTF7+    const UTF7
-    const UCNV_BOCU1+    const BOCU1
-    const UCNV_UTF16+    const UTF16
-    const UCNV_UTF32+    const UTF32
-    const UCNV_CESU8+    const CESU8
-    const UCNV_IMAP_MAILBOX;+    const IMAP_MAILBOX;
          
     __construct(string $toEncoding, string $fromEncoding);     __construct(string $toEncoding, string $fromEncoding);
Line 77: Line 84:
          
     /* Default callback functions */     /* Default callback functions */
-    string toUCallback  (UConverterCallbackReason $reason, string $source, string $codeUnits, UErrorCode &$error); +    mixed toUCallback  (UConverterCallbackReason $reason, string $source, string $codeUnits, UErrorCode &$error); 
-    string fromUCallback(UConverterCallbackReason $reason, Array  $source, long   $codePoint, UErrorCode &$error);+    mixed fromUCallback(UConverterCallbackReason $reason, Array  $source, long   $codePoint, UErrorCode &$error);
          
     /* Primary conversion workhorses */     /* Primary conversion workhorses */
     string convert(string $str[, bool $reverse = false]);     string convert(string $str[, bool $reverse = false]);
     static string transcode(string $str, string $toEncoding, string $fromEncoding[, Array $options]);     static string transcode(string $str, string $toEncoding, string $fromEncoding[, Array $options]);
 +    
 +    /* Errors */
 +    int getErrorCode();
 +    string getErrorMessage();
          
     /* Enumeration and lookup */     /* Enumeration and lookup */
Line 133: Line 144:
 ===== Advanced Use ===== ===== Advanced Use =====
  
-The UConverter class may be extended and its default methods toUCallback() and fromUCallback() overridden to provide advanced handling of error cases:+The UConverter class actually performs two conversion cycles: one from the source encoding to its internal UChar (Unicode) representation, and another from that representation to the destination encoding.  During each cycle, errors are handled by the built-in toUCallback() and fromUCallback() methods, which may be overridden in a child class:
  
   class MyConverter extends UConverter {   class MyConverter extends UConverter {
     public function fromUCallback($reason, $source, $codepoint, &$error) {     public function fromUCallback($reason, $source, $codepoint, &$error) {
-      if (($reason == UConverter::UCNV_UNASSIGNED) && ($codepoint == 0x00F1)) {+      if (($reason == UConverter::REASON_UNASSIGNED) && ($codepoint == 0x00F1)) {
         // Basic transliteration 'ñ' to 'n'         // Basic transliteration 'ñ' to 'n'
         $error = U_ZERO_ERROR;         $error = U_ZERO_ERROR;
Line 148: Line 159:
   // Yields "Espanol"   // Yields "Espanol"
  
-===== Error Handling =====+$reason will be one of the UConverterCallbackReason constants defined in the class definition above.  REASON_RESET, REASON_CLOSE, and REASON_CLONE are informational events and do not require any direct action.  The remaining events describe some form of exception case which must be handled; see the return values described below.
  
-Any errors encountered while calling UConverter::transcode() are raised as standard E_WARNING notices and NULL is returned (to conform with non-OOP error handling styles).  Errors encountered in OOP usage are raised as a thrown instance of UConverterException.+$source is the context from the original or intermediate string, starting at the codeunits or codepoint where the exception occurred and continuing onward.  For toUCallback(), this will be a string of codeunits; for fromUCallback(), this will be an array of codepoints (integers). 
 + 
 +$codeUnits is one or more code units from the original string, in its source encoding, which could not be translated to Unicode. 
 + 
 +$codepoint is the Unicode character from the intermediate string which could not be converted to the output encoding. 
 + 
 +$error is a by-reference value which will contain the specific ICU error encountered on input, and should be modified to U_ZERO_ERROR (or some other appropriate value) before returning the replacement codepoint/codeunits. 
 + 
 +Return values for these methods may be: NULL, Long, String, or Array.  A value of NULL indicates that the codepoint/codeunit should be ignored and left out of the destination/intermediate string.  A Long return value will be treated as either a Unicode codepoint for toUCallback(), or a single-byte character in the target encoding for fromUCallback().  A String return value will be treated as one (or more) UTF-8 encoded codepoints for toUCallback(), or as a multi-byte character (or characters) in the target encoding for fromUCallback(). 
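The return-value rules above can be illustrated with a small subclass. This is a minimal sketch, not part of the RFC: the class name ReplacingConverter and its replacement table are hypothetical, and the typed method signature assumes PHP 8.0 or later with ext/intl loaded. It overrides only fromUCallback() and returns a String, a Long, or NULL depending on the code point.

```php
<?php
// Sketch only: requires ext/intl; the union return type assumes PHP 8.0+.
// ReplacingConverter and its table are hypothetical names for illustration.
class ReplacingConverter extends UConverter
{
    /** Replacement table keyed by Unicode code point. */
    private array $replacements = [
        0x00F1 => 'n',   // String: byte(s) in the target encoding
        0x00E9 => 0x65,  // Long: a single byte in the target encoding ('e')
        0x00A1 => null,  // NULL: leave the character out of the output
    ];

    public function fromUCallback(int $reason, array $source, int $codePoint, int &$error): string|int|array|null
    {
        // array_key_exists() is used because isset() is false for NULL values.
        if ($reason === self::REASON_UNASSIGNED
            && array_key_exists($codePoint, $this->replacements)) {
            $error = U_ZERO_ERROR; // report the case as handled
            return $this->replacements[$codePoint];
        }

        // Everything else (including RESET/CLOSE/CLONE) uses the built-in handling.
        return parent::fromUCallback($reason, $source, $codePoint, $error);
    }
}

// Destination encoding first, then source encoding.
$converter = new ReplacingConverter('ascii', 'utf-8');
$result = $converter->convert("\u{00A1}Ol\u{00E9}, Espa\u{00F1}a!");
echo $result, PHP_EOL;
```

Delegating to parent::fromUCallback() for unhandled cases keeps the default substitution behaviour, so the subclass only has to describe the code points it actually cares about.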
 + 
 +===== Error Handling =====
  
 +Follows the ext/intl convention: errors are stored for later inspection via getErrorCode()/getErrorMessage(), and are optionally thrown as exceptions (based on INI configuration).
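A hedged sketch of the stored-error style described above: convertChecked() is a hypothetical helper, not part of the RFC, and it assumes ext/intl is loaded with exceptions left at their default (disabled) setting. It checks the error state kept on the object after convert() rather than relying on an exception being thrown.

```php
<?php
// Sketch only: requires ext/intl. convertChecked() is a hypothetical helper.
function convertChecked(UConverter $converter, string $input): string
{
    $output = $converter->convert($input);

    // The error state is stored on the object and can be inspected afterwards.
    if ($output === false || intl_is_failure($converter->getErrorCode())) {
        throw new RuntimeException(
            sprintf('Conversion failed: %s', (string) $converter->getErrorMessage())
        );
    }

    return $output;
}

// Destination encoding first, then source encoding.
$converter = new UConverter('iso-8859-1', 'utf-8');
$latin1 = convertChecked($converter, "Espa\u{00F1}ol");
echo bin2hex($latin1), PHP_EOL;
```

Wrapping the check in one place means calling code can treat a failed conversion uniformly, whichever INI settings are in effect.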
 ===== Enumerators ===== ===== Enumerators =====
  
Line 160: Line 182:
 ICU4C ucnv.h documentation: http://icu-project.org/apiref/icu4c/ucnv_8h.html ICU4C ucnv.h documentation: http://icu-project.org/apiref/icu4c/ucnv_8h.html
  
-Path: An implementation of the above can be found at https://github.com/sgolemon/php-src/commit/6a93158dab454476edcb6444f782822c0b1cc18d+Patch: An implementation of the above can be found at https://github.com/sgolemon/php-src/compare/master...uconverter
rfc/uconverter.1351617977.txt.gz · Last modified: 2017/09/22 13:28 (external edit)