PHP RFC: IntlBidi class
- Version: 1.0
- Date: 2018-12-18
- Author: Jan Slabon jan.slabon@setasign.com, Timo Scholz timo.scholz@setasign.com, with assistance from Sara Golemon pollita@php.net
- Status: Draft
- First Published at: https://wiki.php.net/rfc/intl_ubidi
Introduction
ICU exposes a great deal of i18n/l10n functionality beyond what is currently exposed by PHP. This RFC seeks to expose the BiDi API too.
A short introduction quoted from Wikipedia
Bidirectional text consists of mainly right-to-left text with some left-to-right nested segments (such as an Arabic text with some information in English), or vice versa (such as an English letter with a Hebrew address nested within it.)So in short and easy words: Some languages are written from left-to-right (e.g. English) and some are written from right-to-left (e.g. Hebrew). The logical order (or storage order) is left-to-right throughout. The BiDi algorithm helps us to get the visual order from a text which may have different languages in it.
Quoted from ICU:
Consider the following example, where Arabic or Hebrew letters are represented by uppercase English letters and English text is represented by lowercase letters:
english CIBARA text
The English letter h is visually followed by the Arabic letter C, but logically h is followed by the rightmost letter A. The next letter, in logical order, will be R. In other words, the logical and storage order of the same text would be:
english ARABIC text
The BiDi algorithm is implemented in all current browsers which is an argument to deprecate the “lightweight” version hebrev()
/hebrevs()
in PHP 7.4 (see https://wiki.php.net/rfc/deprecations_php_7_4#the_hebrev_and_hebrevc_functions).
While this is true for browsers the BiDi functionalities are still usefull in the PHP userland for e.g. PDF or image creation with RTL text or a mix of RTL an LTR scripts.
Proposal
Expose the functinality from ubidi.h as IntlBidi
class following the ICU API as much as possible.
Proposed PHP Version(s)
PHP 7.4
Constants
Standard constants an enumerations of UBiDiDirection, UBiDiReorderingMode, UBiDiReorderingOption. For example:
class IntlBidi { const DEFAULT_LTR = UBIDI_DEFAULT_LTR; const DEFAULT_RTL = UBIDI_DEFAULT_RTL; /* ... */ const LTR = UBIDI_LTR; /* etc... */ }
Methods
Nearly all methods (except the methods listed below or which were used only internally) of http://icu-project.org/apiref/icu4c/ubidi_8h.html are wrapped and bundled in a single class. The signatures of all methods are equal to the original implementation (without the UBiDi
argument) but the arguments were replaced by PHP equivalent types. For example:
class IntlBidi { public function setPara(string $paragraph, int $paraLevel = IntlBidi::DEFAULT_LTR, string $embeddingLevels): IntlBidi; public function setLine(int $start, int $limit): IntlBidi; public function setReorderingMode(int $mode): IntlBidi; /* etc... */ }
Not implemented
Following methods are currently not wrapped/implemented:
getClassCallback()
andsetClassCallback()
: Would allow us overriding default Bidi class values of characters with custom ones. A very low level functionality.getCustomizedClass()
: This method only makes sense ifgetClassCallback()
andsetClassCallback()
are implemented. Until that IntlChar::charDirection has the same functionality.getText()
would return a pointer. Useless in PHP userland. But could return the text. What's the correct behavior with a sub-instance created by setLine()? Will it return the whole text or the text of the sub-instance?reorderLogical()
andreorderVisual()
: Have no object context and are equal togetVisualMap()
called in the object context.invertMap()
: no object context.writeReverse()
: no object context. But maybe usefull for converting the visual order back to logical? Or can this be done withgetReordered()
and theOUTPUT_REVERSE
flag. NEEDS TESTS.
Renamed
writeReordered()
was renamed togetReordered()
as it does not write to a buffer but returns the string.
Notes
Error messages/handling
Currently simply U_ILLEGAL_ARGUMENT_ERROR errors are thrown. This should be changed to more meaningful error messages.
Tests
Most tests are ported/inspired from the Java implementation: https://github.com/unicode-org/icu/blob/master/icu4j/main/tests/core/src/com/ibm/icu/dev/test/bidi/
Vote
As a non-syntax addition, this RFC requires a single 50%+1 majority.
Implementation
- Initial draft by Sara: github/sgolemon/php-src/bidi
- Final revision by Jan: ...
References
- BiDi Algorithm at ICU: http://userguide.icu-project.org/transforms/bidi
- ubidi.h File Reference: http://icu-project.org/apiref/icu4c/ubidi_8h.html
- BiDI Algorithm at Unicode: http://unicode.org/reports/tr9/