rfc:uconverter
rfc:uconverter [2012/12/06 00:03] – [Request for Comments: ext/intl::UConverter] Implemented pollita
====== Request for Comments: ext/intl::UConverter ======

  * Version: 1.0
  * Date: 2012-10-29
  * Author: Sara Golemon <
  * Status: Implemented for 5.5 http://
  * First Published at: http://

Exposes ICU's UConverter functions by adding a class to the ext/intl extension.
===== Vote =====

  * Yes
  * No

===== Introduction =====

The ext/intl extension exposes only some of ICU's powerful i18n functionality.

==== Specification of the Class ====

  class UConverter {
    /* UConverterCallbackReason */
    const REASON_UNASSIGNED;
    const REASON_ILLEGAL;
    const REASON_IRREGULAR;
    const REASON_RESET;
    const REASON_CLOSE;
    const REASON_CLONE;

    /* UConverterType */
    const UNSUPPORTED_CONVERTER;
    const SBCS;
    const DBCS;
    const MBCS;
    const LATIN_1;
    const UTF8;
    const UTF16_BigEndian;
    const UTF16_LittleEndian;
    const UTF32_BigEndian;
    const UTF32_LittleEndian;
    const EBCDIC_STATEFUL;
    const ISO_2022;
    const LMBCS_1;
    const LMBCS_2;
    const LMBCS_3;
    const LMBCS_4;
    const LMBCS_5;
    const LMBCS_6;
    const LMBCS_8;
    const LMBCS_11;
    const LMBCS_16;
    const LMBCS_17;
    const LMBCS_18;
    const LMBCS_19;
    const LMBCS_LAST;
    const HZ;
    const SCSU;
    const ISCII;
    const US_ASCII;
    const UTF7;
    const BOCU1;
    const UTF16;
    const UTF32;
    const CESU8;
    const IMAP_MAILBOX;

    __construct(string $toEncoding,

    /* Setting/
    string getSourceEncoding();
    void setSourceEncoding(string $encoding);
    string getDestinationEncoding();
    void setDestinationEncoding(string $encoding);

    /* Introspection for algorithmic conversions */
    UConverterType getSourceType();
    UConverterType getDestinationType();

    /* Basic error handling */
    string getSubstChars();
    void setSubstChars(string $chars);

    /* Default callback functions */
    mixed toUCallback
    mixed fromUCallback(UConverterCallbackReason $reason, Array $source, long

    /* Primary conversion workhorses */
    string convert(string $str[, bool $reverse = false]);
    static string transcode(string $str, string $toEncoding,

    /* Errors */
    int getErrorCode();
    string getErrorMessage();

    /* Enumeration and lookup */
    static string reasonText(UConverterCallbackReason $reason);
    static Array getAvailable();
    static Array getAliases(string $encoding);
    static Array getStandards();
  }
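The sketch below exercises the encoding and type introspection methods from the specification. It assumes ext/intl is loaded and follows the shipped constructor order, which takes the destination encoding first and the source encoding second. The canonical names that get printed depend on the ICU build, so the sketch compares only the type constants.

```php
<?php
// Requires ext/intl. Constructor order: destination encoding, then source.
$c = new UConverter('utf-8', 'latin1');

// Canonical ICU names; the exact strings depend on the ICU build.
printf("source: %s, destination: %s\n",
    $c->getSourceEncoding(), $c->getDestinationEncoding());

// The type getters return one of the UConverterType class constants.
$sourceIsLatin1 = $c->getSourceType() === UConverter::LATIN_1;
$destIsUtf8     = $c->getDestinationType() === UConverter::UTF8;
var_dump($sourceIsLatin1, $destIsUtf8);

// The destination can be changed on an existing object.
$c->setDestinationEncoding('utf-16be');
var_dump($c->getDestinationType() === UConverter::UTF16_BigEndian);
```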

===== Simple uses =====

The usage and purpose of UConverter::

  $utf8string = UConverter::
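Here is a minimal round trip through the static interface, assuming ext/intl is loaded. The argument order (string, destination encoding, source encoding) matches the shipped UConverter::transcode(). The sample text is arbitrary.

```php
<?php
// Latin-1 bytes for "café": 0xE9 is é in ISO-8859-1.
$latin1 = "caf\xE9";

// transcode(string, toEncoding, fromEncoding)
$utf8 = UConverter::transcode($latin1, 'utf-8', 'latin1');
echo bin2hex($utf8), "\n"; // 636166c3a9

// Converting back recovers the original bytes.
$roundTrip = UConverter::transcode($utf8, 'latin1', 'utf-8');
var_dump($roundTrip === $latin1); // bool(true)
```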

By default, ICU will substitute a ^Z character (U+001A) in place of any code point that cannot be converted from the original encoding to Unicode, or from Unicode to the target encoding.

  $asciiString = UConverter::
  // Yields Espa^Zol
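Because of this default, you can detect a lossy conversion after the fact by searching the output for the 0x1A byte. This sketch assumes a US-ASCII destination, whose ICU substitution byte is 0x1A. It also reports a false positive if the input already contained that byte, so a custom callback (see Advanced Use) is the more robust option.

```php
<?php
// "España" in Latin-1; ñ (0xF1) has no US-ASCII equivalent.
$out = UConverter::transcode("Espa\xF1a", 'ascii', 'latin1');

// The unconvertible character was replaced by ICU's default byte, 0x1A (^Z).
echo bin2hex($out), "\n"; // 457370611a61

// Heuristic: any 0x1A byte in the output signals a substitution.
$lossy = strpos($out, "\x1A") !== false;
var_dump($lossy); // bool(true)
```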

To override the default substitution,

  $opts = array('
  $asciiString = UConverter::
  // Yields Espa?ol

Note that substitution characters must represent a single code point in the encoding being converted from or to.
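This is a complete version of the option-array form. It assumes the option keys accepted by the shipped implementation: 'to_subst' for the destination side and 'from_subst' for the source side. The replacement is a single byte, so it satisfies the single-codepoint rule for ASCII.

```php
<?php
// Latin-1 input: 0xE3 is ã and 0xF1 is ñ; neither exists in US-ASCII.
$input = "S\xE3o Paulo, Espa\xF1a";

// 'to_subst' replaces the destination (ASCII) substitution character.
$opts  = array('to_subst' => '?');
$ascii = UConverter::transcode($input, 'ascii', 'latin1', $opts);
echo $ascii, "\n"; // S?o Paulo, Espa?a
```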

===== Object Oriented Use =====

The OOP use case allows the caller to reuse the same converter across multiple calls:

  $c = new UConverter('
  echo $c->
  echo $c->
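As a sketch of the reuse pattern, one converter instance can process a whole batch. The record contents are made up for illustration, and ext/intl is assumed to be loaded.

```php
<?php
// One converter object, reused for every record (Latin-1 in, UTF-8 out).
$c = new UConverter('utf-8', 'latin1');

$records   = array("caf\xE9", "na\xEFve", "plain ascii");
$converted = array();
foreach ($records as $record) {
    $converted[] = $c->convert($record);
}

echo implode("\n", $converted), "\n";
```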

Similar to the functional interface above, basic error handling may be employed using substitution characters:

  $c = new UConverter('
  $c->
  echo $c->
  echo $c->
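This sketch shows the same substitution behavior through the object interface, assuming the shipped setSubstChars()/getSubstChars() methods. In that implementation, the substitution string is applied to both the source and the destination converter. That is why the sketch uses a single byte that is valid in both Latin-1 and ASCII.

```php
<?php
$c = new UConverter('ascii', 'latin1');

// One byte: a valid single character for both the Latin-1 and ASCII sides.
$ok = $c->setSubstChars('?');

$ascii = $c->convert("Espa\xF1a");
echo $ascii, "\n";              // Espa?a
echo $c->getSubstChars(), "\n"; // ?
```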

The converter may also run the conversion backwards with an optional second parameter to UConverter::

  $c = new UConverter('
  echo $c->
  echo $c->
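Here is a round trip with a single object. It assumes that the optional second argument of convert() is the boolean reverse flag, as in the shipped implementation.

```php
<?php
// Forward direction: Latin-1 -> UTF-8.
$c    = new UConverter('utf-8', 'latin1');
$utf8 = $c->convert("caf\xE9");

// Passing true swaps the direction: UTF-8 -> Latin-1.
$latin1 = $c->convert($utf8, true);

echo bin2hex($utf8), ' ', bin2hex($latin1), "\n"; // 636166c3a9 636166e9
```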

===== Advanced Use =====

The UConverter class actually does two conversion cycles.

  class MyConverter extends UConverter {
    public function fromUCallback($reason,
      if (($reason == UConverter::
        // Basic transliteration '
        $error = U_ZERO_ERROR;
        return '
      }
    }
  }
  $c = new MyConverter('
  echo "
  // Yields "
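The following is a fuller transliteration sketch than MyConverter. TransliteratingConverter and its lookup table are hypothetical. The fromUCallback() parameter order ($reason, $source, $codePoint, &$error) follows the shipped ext/intl method, and the union return type assumes PHP 8.0 or later. Parameters are left untyped so the override stays compatible with the parent declaration across PHP versions. Unmapped characters fall back to '?', and $error is reset so the conversion continues.

```php
<?php
// Hypothetical subclass; PHP 8.0+ for the union return type.
class TransliteratingConverter extends UConverter
{
    // Code point => ASCII replacement (illustrative, not exhaustive).
    private const MAP = array(
        0x00DF => 'ss', // ß
        0x00E9 => 'e',  // é
        0x00F1 => 'n',  // ñ
    );

    public function fromUCallback($reason, $source, $codePoint, &$error): string|int|array|null
    {
        if ($reason !== UConverter::REASON_UNASSIGNED
            && $reason !== UConverter::REASON_ILLEGAL
            && $reason !== UConverter::REASON_IRREGULAR) {
            return null; // RESET/CLOSE/CLONE notifications: emit nothing
        }
        $error = U_ZERO_ERROR; // mark the failure as handled
        // A string result is written as raw bytes in the target encoding.
        return self::MAP[$codePoint] ?? '?';
    }
}

$c = new TransliteratingConverter('ascii', 'utf-8');
echo $c->convert("Stra\xC3\x9Fe"), "\n"; // Strasse
echo $c->convert("Espa\xC3\xB1a"), "\n"; // Espana
echo $c->convert("\xE2\x82\xAC5"), "\n"; // ?5 (no entry for U+20AC)
```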

$reason will be one of the UConverterCallbackReason constants defined in the class definition above.

$source contains the rest of the original (or intermediate) string, starting at the code units or code point where the exception occurred.

In toUCallback(), $codeUnits holds the one or more code units from the original string, in its source encoding, that could not be converted to Unicode.

In fromUCallback(), $codepoint is the Unicode character from the intermediate string that could not be converted to the output encoding.

$error is passed by reference and holds the specific ICU error encountered on input; set it to U_ZERO_ERROR (or another appropriate value) before returning the replacement codepoint/

Return values for this method may be: NULL, Long, String, or Array.
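The toUCallback() direction can be handled the same way. In this sketch, undecodable input bytes are replaced with a bracketed hex dump. HexEscapingConverter is a hypothetical name, and PHP 8.0+ is assumed for the union return type. In the shipped implementation, a string return value is read as UTF-8 and appended to the intermediate Unicode text. Resetting $error to U_ZERO_ERROR lets the conversion carry on.

```php
<?php
// Hypothetical subclass; PHP 8.0+ for the union return type.
class HexEscapingConverter extends UConverter
{
    public function toUCallback($reason, $source, $codeUnits, &$error): string|int|array|null
    {
        if ($reason !== UConverter::REASON_UNASSIGNED
            && $reason !== UConverter::REASON_ILLEGAL
            && $reason !== UConverter::REASON_IRREGULAR) {
            return null; // lifecycle notification: nothing to emit
        }
        $error = U_ZERO_ERROR; // mark the failure as handled
        // Emit the offending source bytes as "[HEX]".
        return '[' . strtoupper(bin2hex($codeUnits)) . ']';
    }
}

$c = new HexEscapingConverter('utf-8', 'utf-8');
echo $c->convert("ab\xFFcd"), "\n"; // ab[FF]cd
```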

===== Error Handling =====

Follows the ext/intl convention of storing the last error for later inspection by getErrorCode()/
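Here is a sketch of the error convention, assuming default ini settings (intl.use_exceptions off). Under those settings, failures are recorded on the object rather than thrown. A two-byte substitution string violates the single-codepoint rule for single-byte encodings, so ICU rejects it.

```php
<?php
$c = new UConverter('ascii', 'latin1');

// Two bytes cannot encode one character in a single-byte encoding,
// so ICU rejects this substitution string.
$ok = $c->setSubstChars('??');

if (!$ok) {
    // The failure is recorded on the object for later inspection.
    printf("error %d: %s\n", $c->getErrorCode(), $c->getErrorMessage());
}
```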

===== Enumerators =====

A few enumeration methods are exposed as a convenience.
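The enumeration helpers can be combined as below. The lists they return depend on the ICU version linked into PHP, so this sketch relies only on their shape and on the name returned by reasonText().

```php
<?php
// All converter names known to the linked ICU build.
$available = UConverter::getAvailable();
printf("%d converters available\n", count($available));

// Aliases registered for one converter, and the alias standards ICU knows.
$aliases   = UConverter::getAliases('ISO-8859-1');
$standards = UConverter::getStandards();
echo implode(', ', $aliases), "\n";
echo implode(', ', $standards), "\n";

// reasonText() maps a callback reason constant to its name.
echo UConverter::reasonText(UConverter::REASON_ILLEGAL), "\n"; // REASON_ILLEGAL
```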

===== References =====

ICU4C ucnv.h documentation:

Patch: An implementation of the above can be found at https://
rfc/uconverter.txt · Last modified: 2017/09/22 13:28 by 127.0.0.1