====== PHP RFC: IntlBidi class ====== * Version: 1.0 * Date: 2018-12-18 * Author: Jan Slabon , Timo Scholz , with assistance from Sara Golemon * Status: Draft * First Published at: https://wiki.php.net/rfc/intl_ubidi ===== Introduction ===== ICU exposes a great deal of i18n/l10n functionality beyond what is currently exposed by PHP. This RFC seeks to expose the BiDi API too. A short introduction quoted from [[https://en.wikipedia.org/wiki/Bi-directional_text|Wikipedia]] > Bidirectional text consists of mainly right-to-left text with some left-to-right nested segments (such as an Arabic text with some information in English), or vice versa (such as an English letter with a Hebrew address nested within it.) So in short and easy words: Some languages are written from left-to-right (e.g. English) and some are written from right-to-left (e.g. Hebrew). The logical order (or storage order) is left-to-right throughout. The BiDi algorithm helps us to get the visual order from a text which may have different languages in it. Quoted from [[http://userguide.icu-project.org/transforms/bidi#TOC-Logical-Order-versus-Visual-Order|ICU]]: > Consider the following example, where Arabic or Hebrew letters are represented by uppercase English letters and English text is represented by lowercase letters: english CIBARA text > The English letter h is visually followed by the Arabic letter C, but logically h is followed by the rightmost letter A. The next letter, in logical order, will be R. In other words, the logical and storage order of the same text would be: english ARABIC text The BiDi algorithm is implemented in all current browsers which is an argument to deprecate the "lightweight" version ''hebrev()''/''hebrevs()'' in PHP 7.4 (see https://wiki.php.net/rfc/deprecations_php_7_4#the_hebrev_and_hebrevc_functions). While this is true for browsers the BiDi functionalities are still usefull in the PHP userland for e.g. PDF or image creation with RTL text or a mix of RTL an LTR scripts. ===== Proposal ===== Expose the functinality from [[http://icu-project.org/apiref/icu4c/ubidi_8h.html|ubidi.h]] as ''IntlBidi'' class following the ICU API as much as possible. ===== Proposed PHP Version(s) ===== PHP 7.4 ==== Constants ==== Standard constants an enumerations of UBiDiDirection, UBiDiReorderingMode, UBiDiReorderingOption. For example: class IntlBidi { const DEFAULT_LTR = UBIDI_DEFAULT_LTR; const DEFAULT_RTL = UBIDI_DEFAULT_RTL; /* ... */ const LTR = UBIDI_LTR; /* etc... */ } ==== Methods ==== Nearly all methods (except the methods listed below or which were used only internally) of http://icu-project.org/apiref/icu4c/ubidi_8h.html are wrapped and bundled in a single class. The signatures of all methods are equal to the original implementation (without the ''UBiDi'' argument) but the arguments were replaced by PHP equivalent types. For example: class IntlBidi { public function setPara(string $paragraph, int $paraLevel = IntlBidi::DEFAULT_LTR, string $embeddingLevels): IntlBidi; public function setLine(int $start, int $limit): IntlBidi; public function setReorderingMode(int $mode): IntlBidi; /* etc... */ } === Not implemented === Following methods are currently not wrapped/implemented: * ''getClassCallback()'' and ''setClassCallback()'': Would allow us overriding default Bidi class values of characters with custom ones. A very low level functionality. * ''getCustomizedClass()'': This method only makes sense if ''getClassCallback()'' and ''setClassCallback()'' are implemented. Until that [[http://php.net/manual/de/intlchar.chardirection.php|IntlChar::charDirection]] has the same functionality. * ''getText()'' would return a pointer. Useless in PHP userland. But could return the text. What's the correct behavior with a sub-instance created by setLine()? Will it return the whole text or the text of the sub-instance? * ''reorderLogical()'' and ''reorderVisual()'': Have no object context and are equal to ''getVisualMap()'' called in the object context. * ''invertMap()'': no object context. * ''writeReverse()'': no object context. But maybe usefull for converting the visual order back to logical? Or can this be done with ''getReordered()'' and the ''OUTPUT_REVERSE'' flag. NEEDS TESTS. === Renamed === * ''writeReordered()'' was renamed to ''getReordered()'' as it does not write to a buffer but returns the string. ===== Notes ===== ==== Error messages/handling ==== Currently simply U_ILLEGAL_ARGUMENT_ERROR errors are thrown. This should be changed to more meaningful error messages. ==== Tests ==== Most tests are ported/inspired from the Java implementation: https://github.com/unicode-org/icu/blob/master/icu4j/main/tests/core/src/com/ibm/icu/dev/test/bidi/ ===== Vote ===== As a non-syntax addition, this RFC requires a single 50%+1 majority. ===== Implementation ===== * Initial draft by Sara: [[https://github.com/sgolemon/php-src/compare/bidi|github/sgolemon/php-src/bidi]] * Final revision by Jan: ... ===== References ===== * BiDi Algorithm at ICU: http://userguide.icu-project.org/transforms/bidi * ubidi.h File Reference: http://icu-project.org/apiref/icu4c/ubidi_8h.html * BiDI Algorithm at Unicode: http://unicode.org/reports/tr9/