
PHP RFC: IntlBidi class

Introduction

ICU exposes a great deal of i18n/l10n functionality beyond what is currently exposed by PHP. This RFC seeks to expose the BiDi API too.

A short introduction, quoted from Wikipedia:

Bidirectional text consists of mainly right-to-left text with some left-to-right nested segments (such as an Arabic text with some information in English), or vice versa (such as an English letter with a Hebrew address nested within it.)

In short: some languages are written from left to right (e.g. English) and some from right to left (e.g. Hebrew). Text is always stored in logical order, that is, the order in which it is typed and read, whatever its direction. The BiDi algorithm derives the visual order from text that may mix scripts of both directions.

Quoted from ICU:

Consider the following example, where Arabic or Hebrew letters are represented by uppercase English letters and English text is represented by lowercase letters:
english CIBARA text
The English letter h is visually followed by the Arabic letter C, but logically h is followed by the rightmost letter A. The next letter, in logical order, will be R. In other words, the logical and storage order of the same text would be:
english ARABIC text
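
The ICU example above can be reproduced with a deliberately naive sketch. It assumes, as the quoted example does, that uppercase ASCII letters stand in for right-to-left characters; the function name toyLogicalToVisual() is hypothetical. This is not the Unicode Bidirectional Algorithm, which also deals with digits, punctuation, embedding levels and mirroring.

```php
<?php
declare(strict_types=1);

/**
 * Toy illustration only: treats runs of uppercase ASCII words as
 * right-to-left text and reverses each run for display in an otherwise
 * left-to-right line. Nothing else from the real algorithm is modelled.
 */
function toyLogicalToVisual(string $logical): string
{
    // A run is one or more uppercase words, including the spaces between them.
    $visual = preg_replace_callback(
        '/[A-Z]+(?:\s+[A-Z]+)*/',
        static function (array $match): string {
            // Reversing the run turns storage order into display order.
            return strrev($match[0]);
        },
        $logical
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $visual ?? $logical;
}

echo toyLogicalToVisual('english ARABIC text'), PHP_EOL; // english CIBARA text
```

The input is the logical (storage) order from the quote and the output is the visual order. Real text needs the full algorithm, which is what the proposed class would provide.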

The BiDi algorithm is implemented in all current browsers, which is one argument for deprecating the “lightweight” hebrev()/hebrevc() functions in PHP 7.4 (see https://wiki.php.net/rfc/deprecations_php_7_4#the_hebrev_and_hebrevc_functions). While this holds for browsers, BiDi functionality is still useful in PHP userland, for example when creating PDFs or images that contain RTL text or a mix of RTL and LTR scripts.

Proposal

Expose the functionality from ubidi.h as an IntlBidi class, following the ICU API as closely as possible.

Proposed PHP Version(s)

PHP 7.4

Constants

Standard constants and the enumerations UBiDiDirection, UBiDiReorderingMode and UBiDiReorderingOption. For example:

class IntlBidi {
  const DEFAULT_LTR = UBIDI_DEFAULT_LTR;
  const DEFAULT_RTL = UBIDI_DEFAULT_RTL;
  /* ... */
  const LTR = UBIDI_LTR;
  /* etc... */
}

Methods

Nearly all functions declared in http://icu-project.org/apiref/icu4c/ubidi_8h.html are wrapped as methods of a single class; the exceptions are the functions listed below and those that are only used internally. The method signatures follow the ICU originals, minus the UBiDi argument, with the arguments replaced by their PHP equivalent types. For example:

class IntlBidi {
  public function setPara(string $paragraph, int $paraLevel = IntlBidi::DEFAULT_LTR, string $embeddingLevels): IntlBidi;
  public function setLine(int $start, int $limit): IntlBidi;
  public function setReorderingMode(int $mode): IntlBidi;
  /* etc... */
}

Not implemented

The following methods are currently not wrapped/implemented:

  • getClassCallback() and setClassCallback(): These would allow overriding the default Bidi class values of characters with custom ones. This is very low-level functionality.
  • getCustomizedClass(): This method only makes sense if getClassCallback() and setClassCallback() are implemented. Until then, IntlChar::charDirection() provides the same functionality.
  • getText(): In ICU this returns a pointer, which is useless in PHP userland. It could return the text instead, but the correct behavior for a sub-instance created by setLine() is open: should it return the whole text or only the text of the sub-instance?
  • reorderLogical() and reorderVisual(): These have no object context and are equivalent to getVisualMap() called in the object context.
  • invertMap(): No object context.
  • writeReverse(): No object context. It may still be useful for converting visual order back to logical order, or this may be achievable with getReordered() and the OUTPUT_REVERSE flag. NEEDS TESTS.
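
To make concrete what is left out with invertMap(): ICU's ubidi_invertMap() takes an index map and writes its inverse, marking destination slots that nothing maps to with UBIDI_MAP_NOWHERE (-1). A minimal userland sketch of that behaviour follows. The function name invertIndexMap() and the constant MAP_NOWHERE are hypothetical and not part of this proposal.

```php
<?php
declare(strict_types=1);

// Mirrors the value of ICU's UBIDI_MAP_NOWHERE; the name is illustrative only.
const MAP_NOWHERE = -1;

/**
 * Inverts an index map: if $srcMap[$i] === $k, the result has $i at index $k.
 * Source entries equal to MAP_NOWHERE are skipped, and destination slots
 * that no source entry maps to are filled with MAP_NOWHERE.
 *
 * @param int[] $srcMap
 * @return int[]
 */
function invertIndexMap(array $srcMap): array
{
    $srcMap = array_values($srcMap);
    if ($srcMap === []) {
        return [];
    }

    $highest = max($srcMap);
    if ($highest < 0) {
        // Every entry is MAP_NOWHERE, so there is nothing to invert.
        return [];
    }

    // The destination needs one slot per index up to the highest target.
    $destMap = array_fill(0, $highest + 1, MAP_NOWHERE);
    foreach ($srcMap as $i => $k) {
        if ($k !== MAP_NOWHERE) {
            $destMap[$k] = $i;
        }
    }

    return $destMap;
}

print_r(invertIndexMap([2, 0, 1]));  // inverse of a three-element map
print_r(invertIndexMap([1, -1, 0])); // the middle element has no target
```

Applied to a logical-to-visual map this yields the visual-to-logical map and vice versa, which is how the ICU documentation describes the purpose of ubidi_invertMap().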

Renamed

  • writeReordered() was renamed to getReordered() as it does not write to a buffer but returns the string.

Notes

Error messages/handling

Currently, only generic U_ILLEGAL_ARGUMENT_ERROR errors are thrown. This should be changed to more meaningful error messages.

Tests

Vote

As a non-syntax addition, this RFC requires a simple 50%+1 majority.

Implementation

References

rfc/intl_ubidi.txt · Last modified: 2018/12/18 20:15 by pollita