PHP RFC: IntlChar class

Introduction

ICU exposes a great deal of i18n/l10n functionality beyond what is currently exposed by PHP. This RFC seeks to expose just a little bit more...

Proposal

Expose additional ICU functionality from uchar.h as IntlChar::*() following the ICU API as much as possible.

Proposed PHP Version(s)

PHP 7 (or 5.next if there is one)

New Constants

Enumerations of UProperty, UCharNameChoice, UPropertyNameChoice, UCharDirection, UBlockCode, etc... For example:

class IntlChar {
  const PROPERTY_ALPHABETIC = _UCHAR_ALPHABETIC_;
  const PROPERTY_ASCII_HEX_DIGIT = _UCHAR_ASCII_HEX_DIGIT_;
  /* etc... */
}

New Static Methods

Mapping of ICU API to PHP. For example:

class IntlChar {
  public static function hasBinaryProperty(int $codepoint, int $property): bool;
  public static function isAlphabetic(int $codepoint): bool;
  /* etc... */
}

Note that methods taking a codepoint will accept either an integer codepoint value (e.g. 0x2603 for U+2603 SNOWMAN), or the character encoded as UTF-8 (e.g. "\xE2\x98\x83"). Methods which return a codepoint return an int, unless the codepoint was passed in as a UTF-8 string, in which case the result is returned as a UTF-8 string as well.
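As a rough illustration of the two argument forms, the sketch below assumes the intl extension is loaded and that the methods behave as described above. hasBinaryProperty() and PROPERTY_ALPHABETIC come from this proposal; toupper() is not named in the text and is assumed here to be the mapping of ICU's u_toupper(). The function name codepoint_forms_demo() is hypothetical and exists only for this example.

```php
<?php
// Sketch only: needs the intl extension with IntlChar available.
// codepoint_forms_demo() is a made-up name for illustration.
function codepoint_forms_demo(): array
{
    return [
        // Integer codepoint in, bool out.
        'int_in'    => IntlChar::hasBinaryProperty(0x41, IntlChar::PROPERTY_ALPHABETIC),
        // Same character as a UTF-8 string, same answer.
        'utf8_in'   => IntlChar::hasBinaryProperty("A", IntlChar::PROPERTY_ALPHABETIC),
        // U+2603 SNOWMAN is a symbol, not an alphabetic character.
        'snowman'   => IntlChar::hasBinaryProperty("\xE2\x98\x83", IntlChar::PROPERTY_ALPHABETIC),
        // Assumed mapping of u_toupper(): int in, int out.
        'upper_int' => IntlChar::toupper(0x61),
        // UTF-8 string in, UTF-8 string out.
        'upper_str' => IntlChar::toupper("a"),
    ];
}

if (class_exists('IntlChar')) {
    var_dump(codepoint_forms_demo());
}
```

Under these assumptions the first two entries are true, the snowman entry is false, and the last two are int 0x41 and the string "A" respectively, so the return type mirrors the argument type.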

Notes

I also added IntlChar::chr() and IntlChar::ord(), which aren't directly part of the ICU API, but they made sense as wrappers for the U8_*() family of macros.
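To give a feel for what such wrappers have to do, here is a minimal pure-PHP sketch of the codepoint/UTF-8 conversion. It is not the extension's implementation (which would use ICU's U8_*() macros in C); sketch_chr() and sketch_ord() are hypothetical names, and returning null on invalid input is an assumption made for this example.

```php
<?php
// Hypothetical stand-in for a chr()-style wrapper: codepoint -> UTF-8 bytes.
function sketch_chr(int $cp): ?string
{
    // Reject values outside Unicode and the UTF-16 surrogate range.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return null;
    }
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical stand-in for an ord()-style wrapper:
// decodes the leading UTF-8 character of $s into a codepoint.
function sketch_ord(string $s): ?int
{
    if ($s === '') {
        return null;
    }
    $lead = ord($s[0]);
    if ($lead < 0x80) {
        return $lead; // single byte (ASCII)
    }
    if (($lead & 0xE0) === 0xC0) {
        [$len, $cp, $min] = [2, $lead & 0x1F, 0x80];
    } elseif (($lead & 0xF0) === 0xE0) {
        [$len, $cp, $min] = [3, $lead & 0x0F, 0x800];
    } elseif (($lead & 0xF8) === 0xF0) {
        [$len, $cp, $min] = [4, $lead & 0x07, 0x10000];
    } else {
        return null; // stray continuation byte or invalid lead byte
    }
    if (strlen($s) < $len) {
        return null; // truncated sequence
    }
    for ($i = 1; $i < $len; $i++) {
        $byte = ord($s[$i]);
        if (($byte & 0xC0) !== 0x80) {
            return null; // not a continuation byte
        }
        $cp = ($cp << 6) | ($byte & 0x3F);
    }
    // Reject overlong forms, surrogates, and values past U+10FFFF.
    if ($cp < $min || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return null;
    }
    return $cp;
}

var_dump(sketch_chr(0x2603) === "\xE2\x98\x83"); // bool(true)
var_dump(sketch_ord("\xE2\x98\x83") === 0x2603); // bool(true)
```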

Some methods take a range in the form ($start, $limit), where the range is INclusive of $start and EXclusive of $limit, i.e. (0x20, 0x30) => 0x20..0x2F. I kept this meaning for $limit to stay consistent with the ICU API, but changing $limit to have the semantics of $end would probably make more sense in PHP.
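The half-open convention can be sketched in plain PHP as follows. codepoints_in_range() is a hypothetical helper written for this example, not one of the proposed methods; it is compared against PHP's built-in range(), which includes both endpoints.

```php
<?php
// Hypothetical helper: every codepoint in [$start, $limit),
// i.e. $start is included and $limit is not.
function codepoints_in_range(int $start, int $limit): array
{
    $out = [];
    for ($cp = $start; $cp < $limit; $cp++) {
        $out[] = $cp;
    }
    return $out;
}

$halfOpen = codepoints_in_range(0x20, 0x30);

printf("%d codepoints, U+%04X..U+%04X\n",
    count($halfOpen), $halfOpen[0], $halfOpen[count($halfOpen) - 1]);
// prints "16 codepoints, U+0020..U+002F"

// PHP's range() is inclusive on both ends, so an inclusive $end
// corresponds to $limit - 1.
var_dump($halfOpen === range(0x20, 0x30 - 1)); // bool(true)
```

An inclusive $end therefore maps onto the ICU-style argument as $limit = $end + 1.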

Proposed Voting Choices

50% + 1: “Merge IntlChar implementation as is?”

Implementation

rfc/intl.char.1418714227.txt.gz · Last modified: 2017/09/22 13:28 (external edit)