
PHP RFC: UString

Introduction

UString aims to tackle the issues of working with Unicode strings in PHP, via a core, default-enabled extension.

It does this by adding a new class called UString. The class provides methods for working with charsets such as UTF-8 (the default), UTF-16 and several others.
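To make the underlying problem concrete, the following minimal sketch shows how PHP's byte-oriented string functions differ from charset-aware ones. It uses only existing functions (strlen and the mbstring functions, so it assumes the mbstring extension is loaded) and does not use UString itself; the sample text is an illustrative assumption.

```php
<?php
// "héllo" written with an escape so the file encoding does not matter.
// In UTF-8 the "é" (U+00E9) occupies two bytes.
$text = "h\u{00E9}llo";

// strlen() counts bytes, not characters.
$bytes = strlen($text);

// mb_strlen() counts code points once it is told the encoding.
$chars = mb_strlen($text, 'UTF-8');

// mb_strtoupper() is charset-aware and upper-cases the "é" as well.
$upper = mb_strtoupper($text, 'UTF-8');

printf("bytes=%d chars=%d upper=%s\n", $bytes, $chars, $upper);
// prints: bytes=6 chars=5 upper=HÉLLO
```

Every call has to repeat the encoding argument (or rely on a global default), which is the kind of repetition an object that knows its own charset can avoid.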

UString provides an extension API, so that other extensions may easily work with and return UString instances.

UString sits on top of the ICU library, which covers most use cases, is powerful, and has been battle-tested for years.

Currently this sort of functionality is often handled by “mbstring”. UString is much quicker than mbstring thanks to the use of ICU, and in turn ICU's use of objects.

This speed boost comes into play when chaining methods on the object. Instance properties such as length are available, which reduces repetition of basic logic.
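The shape of such method chaining can be illustrated with a small userland sketch. Everything here is hypothetical: the class name SketchString and its methods (length, toUpper, substring) are invented for illustration, are built on mbstring rather than ICU, and are not the UString API. The sketch only shows how an object can compute its length once and hand back new objects for chaining.

```php
<?php
// Hypothetical illustration only: this is NOT the UString API.
final class SketchString
{
    private $value;
    private $length;

    public function __construct(string $value)
    {
        $this->value = $value;
        // Computed once here and reused by every later length() call.
        $this->length = mb_strlen($value, 'UTF-8');
    }

    public function length(): int
    {
        return $this->length;
    }

    // Each operation returns a new instance, which allows chaining.
    public function toUpper(): self
    {
        return new self(mb_strtoupper($this->value, 'UTF-8'));
    }

    public function substring(int $start, int $length): self
    {
        return new self(mb_substr($this->value, $start, $length, 'UTF-8'));
    }

    public function __toString(): string
    {
        return $this->value;
    }
}

$result = (new SketchString("h\u{00E9}llo w\u{00F6}rld"))
    ->substring(0, 5)
    ->toUpper();

echo $result, ' (', $result->length(), ")\n";
// prints: HÉLLO (5)
```

A real implementation backed by ICU would keep the string in ICU's internal representation between calls instead of re-scanning a PHP byte string, which this userland sketch cannot demonstrate.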

Also, by sitting on top of ICU we get a huge amount of functionality for free. The code required to expose the underlying ICU functionality in PHP is minimal, with UString acting as a wrapper.

Global Space

Cluttering the global space with classes and functions can be an issue for some, but this is only one class. In comparison, mbstring places 50+ functions in the global space.

The global space approach UString uses is on par with DateTime and other similar functionality. In the interest of consistency, this RFC does not propose putting it in a php\ namespace. If at some point moving core classes to a namespace is on the cards, UString could move with everything else.

Scalar Objects

This RFC avoids the controversial issue of scalar objects completely. Some people want them, others don't.

As it stands, UString can slot into PHP without any sort of controversy. On the flip side, implementing UString as a scalar object would be inconsistent: at the time of writing, array, int, float, bool, etc. have no scalar object implementation available. Furthermore, support for the idea is questionable.

Needs More Methods

Right now there are user-space libraries out there that cover a lot more functionality than UString. Systems like Phred (https://github.com/nazariyg/Phred) can do a lot more, but do so with their own userland code.

As UString currently aims to be only a wrapper for ICU UnicodeString functionality (https://ssl.icu-project.org/apiref/icu4c/classicu_1_1UnicodeString.html), implementing extra functionality now would be unrealistic. It would increase bikeshed potential, and would mean some methods are more battle-tested than others.

Frameworks and components, many of which already ship their own string classes, will be able to use UString as a base for those classes.

Not a full String API Replacement

Many people want to see the string library replaced with a whole new API. Many people don't. This RFC does not try to replace all string usage in PHP; it only aims to replace the main uses of mbstring with a more performant alternative.

Backward Incompatible Changes

None.

Proposed PHP Version(s)

7

Open Issues

Should this be deeply integrated into intl, or remain standalone?

Proposed Voting Choices

By the time voting happens, we will have decided whether to merge into intl.

The vote should be a straight yes/no vote.

Requires a 50% majority.

Patches and Tests

References

rfc/ustring.txt · Last modified: 2017/09/22 13:28 by 127.0.0.1