


PHP RFC: Locale-independent float to string cast

Introduction

The string representation of floats in PHP depends on the current locale, because the decimal separator varies between locales. This can lead to subtle bugs in locales that do not use the dot as the decimal separator (https://en.wikipedia.org/wiki/Decimal_separator#Arabic_numerals), since the float to string cast then won't yield a representation of the original value that can be parsed back.

Proposal

Make PHP's float to string conversion locale-independent in PHP 8.0, meaning it will always use the dot (.) as the decimal separator. The change would also affect the functions in ext/standard that currently convert floats to strings in a locale-dependent way:

setlocale(LC_ALL, "de_DE");
$f = 3.14;
 
(string) $f;		// 3,14 would become 3.14
strval($f);		// 3,14 would become 3.14
print_r($f);		// 3,14 would become 3.14
var_dump($f);		// float(3,14) would become float(3.14)
debug_zval_dump($f);	// float(3,14) would become float(3.14)
settype($f, "string");	// Currently modifies $f to "3,14", which would become "3.14"
implode([$f]);		// 3,14 would become 3.14
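The proposed behaviour can already be emulated in PHP 7 by casting while LC_NUMERIC is temporarily set to the always-available "C" locale. This is a minimal sketch, assuming only the standard setlocale() function; float_to_string_c() is a hypothetical name, not part of PHP. Because setlocale() changes process-wide state, this workaround is not safe in threaded SAPIs.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): casts a float to string with the
 * dot decimal separator, regardless of the current LC_NUMERIC locale.
 */
function float_to_string_c(float $f): string
{
    // Passing "0" only queries the current setting without changing it.
    $previous = setlocale(LC_NUMERIC, '0');
    setlocale(LC_NUMERIC, 'C');
    try {
        return (string) $f;
    } finally {
        // Restore the locale that was active before the call.
        if ($previous !== false) {
            setlocale(LC_NUMERIC, $previous);
        }
    }
}

setlocale(LC_ALL, 'de_DE');          // may fail if the locale is not installed
echo float_to_string_c(3.14), "\n";  // 3.14
```

Under this proposal the helper would become unnecessary, since a plain (string) cast would give the same result.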

It should be noted that some functions and extensions already use a locale-independent string representation for floats. One such extension is PDO, which has dedicated code to ensure a consistent string representation for floats. [1] A couple of other functions where the locale does not affect the string representation:

echo var_export($f, true);	// 3.14
echo serialize($f);		// d:3.14;
echo json_encode($f);		// 3.14

Moreover, the *printf family of functions would not be modified, as it already provides the %F conversion specifier for non-locale-aware output, while %f stays locale-aware:

printf("%.2f\n", $f);		// 3,14
printf("%.2F\n", $f);		// 3.14
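For fixed-precision output that must not depend on the locale, both %F and number_format() with explicit separators already give a dot-separated result. A minimal sketch; to_wire_format() is a hypothetical name used only for illustration.

```php
<?php
/**
 * Hypothetical helper: formats a float with a fixed number of decimals and
 * a dot separator, independent of the current locale.
 */
function to_wire_format(float $f, int $decimals = 2): string
{
    // %F is the non-locale-aware counterpart of %f.
    return sprintf('%.' . $decimals . 'F', $f);
}

$f = 3.14159;
echo to_wire_format($f), "\n";             // 3.14
echo number_format($f, 2, '.', ''), "\n";  // 3.14 (separators passed explicitly)
```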

The issue has been raised multiple times over the years on the PHP Internals mailing list [2] [3] and in bug reports [4] [5] [6], but no action has been taken yet. However, there are many reasons why the problem has to be addressed:

Having a consistent string representation for floats is very important. Such values may be stored in database columns with a string type, or sent to an external API using a protocol where everything is represented as a string, such as HTTP. The external API might then reject these values as malformed or, even worse, interpret them as numbers by dropping everything after the decimal separator. Moreover, the behaviour is not always easy to notice and is highly surprising, so it violates the Principle of Least Astonishment. [7]

To make things even worse, locale-sensitive casting causes inconsistencies and hard-to-explain bugs. For example, casting a float to string and then back to float does not restore the original value:

setlocale(LC_ALL, "de_DE");
 
$f = 3.14;		// float(3,14)
$s = (string) $f;	// string(4) "3,14"
$f = (float) $s;	// float(3)
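The value is truncated because the string to float direction only recognises the dot as a decimal separator, so parsing stops at the comma. The sketch below shows this under the default "C" locale; survives_string_round_trip() is a hypothetical helper for illustration. Note that with the default precision INI setting of 14, some values (such as 0.1 + 0.2) do not round-trip regardless of locale.

```php
<?php
// Parsing stops at the first character that is not part of a number.
var_dump((float) "3,14");   // float(3)
var_dump((float) "3.14");   // float(3.14)

/**
 * Hypothetical check: does a float survive a cast to string and back?
 */
function survives_string_round_trip(float $f): bool
{
    return (float) (string) $f === $f;
}

// bool(true) in the "C" locale; bool(false) under de_DE in PHP 7.
var_dump(survives_string_round_trip(3.14));
```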

Another problematic case is when a float is directly concatenated with a string, in which case a compile-time optimization (SCCP) performed by OPcache produces the string before any locale could be set:

setlocale(LC_ALL, "de_DE");
 
$s = 3.14 . "";		// string(4) "3.14"

We propose to change the current behaviour without a deprecation period, as emitting a deprecation notice would come with a large performance penalty for a core feature like casting, which we consider unacceptable.

Easing migration to PHP 8.0

We acknowledge that users may need to know where these locale-dependent conversions are taking place. Therefore, a temporary INI setting, debug_local_sensitive_float_casts, could be introduced to control whether a warning is emitted each time a float to string conversion would have been locale-sensitive in PHP 7 but no longer is. This would allow users to find related issues in a development or testing environment where performance isn't a concern.
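Until such a setting exists, a similar check can be approximated in userland by calling a helper before casts in code under test. This is a hypothetical sketch, not the proposed engine implementation: check_float_cast() is an illustrative name, and it only inspects localeconv() and whether the value has a fractional part, so values that PHP prints in exponent form are not fully covered.

```php
<?php
/**
 * Hypothetical userland approximation of the proposed
 * debug_local_sensitive_float_casts check (PHP provides no such function).
 * Emits a warning and returns true when the active LC_NUMERIC locale would
 * make PHP 7's float to string cast differ from the dot-separated form.
 */
function check_float_cast(float $f): bool
{
    $decimalPoint = localeconv()['decimal_point'];
    // Approximation: only finite values with a fractional part are reported.
    $hasFraction = is_finite($f) && floor($f) !== $f;
    if ($decimalPoint !== '.' && $hasFraction) {
        trigger_error(
            sprintf('Locale-sensitive float to string conversion (decimal point "%s")', $decimalPoint),
            E_USER_WARNING
        );
        return true;
    }
    return false;
}
```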

As this flag is meant to help the migration from PHP 7.x to PHP 8.0, we believe it should be removed from PHP as soon as possible, possibly in PHP 8.1.

Alternative Approaches

A different approach would be to deprecate setlocale() in PHP 8, as this would fix this issue as a by-product. However, the last time this was discussed on the PHP internals list in 2016 [8], the main discussion was about its non-thread-safe behaviour, as it affects global state, and the conclusion was that a thread-safe variant should be introduced based on an HHVM patch.

Backward Incompatible Changes

Converting floats to strings in locales that use a decimal separator other than the dot will produce slightly different output. In our opinion, the backward compatibility break won't be very serious in practice: the workarounds already in place where locale-independent casting is needed will still work (but become unnecessary), while other use-cases where locale-dependent casting is the expected behaviour (e.g. presentation) are likely less sensitive to the change. All in all, the benefits of a consistent float to string conversion outweigh the impact this change may cause.

To retain the old behaviour, users can format the number explicitly with the number_format() function or Intl's NumberFormatter class, or use the *printf family of functions if they wish to keep relying on the locale.
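One way to keep locale-aware output explicit is to pass the separators reported by localeconv() to number_format(). A minimal sketch; format_for_current_locale() is a hypothetical name, and the actual separators depend on the locale definitions installed on the system.

```php
<?php
/**
 * Hypothetical helper: formats a float using the separators of the active
 * LC_NUMERIC locale, as reported by localeconv().
 */
function format_for_current_locale(float $f, int $decimals): string
{
    $conv = localeconv();
    return number_format($f, $decimals, $conv['decimal_point'], $conv['thousands_sep']);
}

setlocale(LC_ALL, 'de_DE');  // may fail if the locale is not installed
echo format_for_current_locale(1234.5, 2), "\n";
printf("%.2f\n", 1234.5);    // %f remains locale-aware
```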

Future Scope

None.

Proposed Voting Choices

The primary vote (“Make float to string casts always locale-independent?”) requires 2/3 majority.

The secondary one (“Should the debug_local_sensitive_float_casts INI setting be added?”) requires a simple majority.

Changelog

0.1: Initial version
0.2: Add a debug INI setting to emit a warning when a locale-aware float to string conversion would have occurred in PHP 7

References

Last modified: 2020/04/16 08:26 by kocsismate