
PHP RFC: IntlBidi class

Introduction

ICU exposes a great deal of i18n/l10n functionality beyond what is currently exposed by PHP. This RFC seeks to expose the BiDi API too.

A short introduction, quoted from Wikipedia:

Bidirectional text consists of mainly right-to-left text with some left-to-right nested segments (such as an Arabic text with some information in English), or vice versa (such as an English letter with a Hebrew address nested within it.)

In short: some languages are written left-to-right (e.g. English) and others right-to-left (e.g. Hebrew). Text is always stored in logical order, i.e. the order in which it is read, regardless of the direction in which it is displayed. The BiDi algorithm derives the visual (display) order from the logical order of text that may mix both directions.

Quoted from ICU:

Consider the following example, where Arabic or Hebrew letters are represented by uppercase English letters and English text is represented by lowercase letters:
english CIBARA text
The English letter h is visually followed by the Arabic letter C, but logically h is followed by the rightmost letter A. The next letter, in logical order, will be R. In other words, the logical and storage order of the same text would be:
english ARABIC text
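Once every character of a line has a resolved embedding level (even = LTR, odd = RTL), the step from logical to visual order is rule L2 of the Unicode Bidirectional Algorithm (UAX #9): from the highest level down to the lowest odd level, reverse every contiguous run of characters at that level or higher. The PHP function below is a minimal, illustrative sketch of that rule only, not ICU's implementation. `visualMapFromLevels()` is a hypothetical name, and the levels for the example are written by hand rather than resolved by the full algorithm. It returns a map from visual index to logical index, similar in spirit to what ICU's `ubidi_reorderVisual()` computes in C.

```php
<?php
/**
 * Sketch of UAX #9 rule L2 (hypothetical helper, not part of this RFC).
 * $levels holds one resolved embedding level per logical character.
 * Returns a list where $map[$visualIndex] === $logicalIndex.
 */
function visualMapFromLevels(array $levels): array
{
    $map = array_keys($levels); // start in logical order
    if ($levels === []) {
        return $map;
    }
    $highest = max($levels);
    $lowestOdd = min($levels) | 1; // lowest odd level >= the minimum level
    $n = count($map);

    for ($level = $highest; $level >= $lowestOdd; $level--) {
        $i = 0;
        while ($i < $n) {
            if ($levels[$map[$i]] < $level) {
                $i++;
                continue;
            }
            // Find the maximal run of positions at this level or higher ...
            $start = $i;
            while ($i < $n && $levels[$map[$i]] >= $level) {
                $i++;
            }
            // ... and reverse it in place.
            $run = array_reverse(array_slice($map, $start, $i - $start));
            array_splice($map, $start, $i - $start, $run);
        }
    }
    return $map;
}

// ICU's example: uppercase letters stand for RTL characters (level 1),
// lowercase letters and the surrounding spaces stay at level 0.
$text = 'english ARABIC text';
$levels = array_merge(array_fill(0, 8, 0), array_fill(0, 6, 1), array_fill(0, 5, 0));
$visual = implode('', array_map(fn (int $i): string => $text[$i], visualMapFromLevels($levels)));
echo $visual, PHP_EOL; // english CIBARA text
```

Resolving the levels themselves (weak and neutral characters, explicit embeddings, paragraph direction) is the hard part of the algorithm and is exactly what ICU provides.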

The BiDi algorithm is implemented in all current browsers, which was one argument for deprecating the "lightweight" functions hebrev()/hebrevc() in PHP 7.4 (see https://wiki.php.net/rfc/deprecations_php_7_4#the_hebrev_and_hebrevc_functions). While this holds for browser output, BiDi functionality is still useful in PHP userland, e.g. for generating PDFs or images that contain RTL text or a mix of RTL and LTR scripts.

Proposal

Expose the functionality of ICU's ubidi.h as an IntlBidi class, following the ICU API as closely as possible.

Proposed PHP Version(s)

PHP 7.4

Constants

The standard constants and the enumerations UBiDiDirection, UBiDiReorderingMode and UBiDiReorderingOption are exposed as class constants. For example:

class IntlBidi {
  const DEFAULT_LTR = UBIDI_DEFAULT_LTR;
  const DEFAULT_RTL = UBIDI_DEFAULT_RTL;
  /* ... */
  const LTR = UBIDI_LTR;
  /* etc... */
}
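As a quick reference for what the constants mean, the sketch below mirrors a few values from ubidi.h as plain global PHP constants, because the IntlBidi class is only proposed. The numeric values are assumed from the ICU C header and should be checked against the ICU version in use; `directionOfLevel()` is a hypothetical helper. It illustrates the convention behind levels: an even embedding level means LTR and an odd one RTL, while DEFAULT_LTR/DEFAULT_RTL are not real levels but ask ICU to derive the paragraph level from the text, falling back to LTR or RTL respectively.

```php
<?php
// Hypothetical userland mirror of some ubidi.h values (assumed, verify
// against your ICU version); the RFC would expose them as IntlBidi::LTR etc.
const UBIDI_DEFAULT_LTR = 0xfe; // derive paragraph level from text, else LTR
const UBIDI_DEFAULT_RTL = 0xff; // derive paragraph level from text, else RTL
const UBIDI_LTR = 0;            // UBiDiDirection values
const UBIDI_RTL = 1;
const UBIDI_MIXED = 2;
const UBIDI_NEUTRAL = 3;

// Direction of a concrete embedding level: even => LTR, odd => RTL.
function directionOfLevel(int $level): int
{
    // The DEFAULT_* values are requests, not levels, so reject them.
    if ($level < 0 || $level >= UBIDI_DEFAULT_LTR) {
        throw new InvalidArgumentException("Not an embedding level: $level");
    }
    return ($level & 1) === 0 ? UBIDI_LTR : UBIDI_RTL;
}
```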

Methods

Nearly all functions of ubidi.h (http://icu-project.org/apiref/icu4c/ubidi_8h.html) are wrapped as methods of a single class; the exceptions are the functions listed below and those used only internally. The signatures mirror the original C functions, minus the UBiDi argument, with the C argument types replaced by their PHP equivalents. For example:

class IntlBidi {
  public function setPara(string $paragraph, int $paraLevel = IntlBidi::DEFAULT_LTR, ?string $embeddingLevels = null): IntlBidi;
  public function setLine(int $start, int $limit): IntlBidi;
  public function setReorderingMode(int $mode): IntlBidi;
  /* etc... */
}
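In ICU, ubidi_setLine() takes a half-open range: start is the first index of the line and limit is one past its last index, and each line is reordered separately after the paragraph has been broken into lines in logical order. The sketch below assumes the proposed setLine(int $start, int $limit) keeps that convention. `lineRanges()` is a hypothetical helper that greedily wraps a paragraph at spaces and returns the [start, limit) pairs one would pass to setLine() for each line, e.g. when laying out text for a PDF. It counts bytes, so it only suits single-byte text; how IntlBidi would index multibyte UTF-8 input (ICU itself uses UTF-16 code units) is not specified here.

```php
<?php
/**
 * Hypothetical helper: split a paragraph into lines of at most $maxWidth
 * bytes, preferring to break after a space. Returns [start, limit) pairs.
 */
function lineRanges(string $paragraph, int $maxWidth): array
{
    $ranges = [];
    $length = strlen($paragraph);
    $start = 0;
    while ($start < $length) {
        $limit = min($start + $maxWidth, $length);
        if ($limit < $length) {
            // Break after the last space inside the window, if there is one.
            $space = strrpos(substr($paragraph, $start, $limit - $start), ' ');
            if ($space !== false && $space > 0) {
                $limit = $start + $space + 1; // keep the space on this line
            }
        }
        $ranges[] = [$start, $limit]; // limit is exclusive, as in ICU
        $start = $limit;
    }
    return $ranges;
}

print_r(lineRanges('english ARABIC text', 10));
```

Breaking lines before reordering matters: reordering the whole paragraph first and then cutting it would put RTL fragments on the wrong line.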

Not implemented

The following functions are currently not wrapped:

Renamed

Notes

Error messages/handling

Currently, errors are simply thrown as U_ILLEGAL_ARGUMENT_ERROR. These should be replaced with more meaningful error messages.

Tests

Most tests are ported from, or inspired by, the Java implementation (ICU4J): https://github.com/unicode-org/icu/blob/master/icu4j/main/tests/core/src/com/ibm/icu/dev/test/bidi/

Vote

As a non-syntax addition, this RFC requires a single 50%+1 majority.

Implementation

References