
PHP RFC: Locale-independent float to string cast

Introduction

The string representation of floats in PHP depends on the current locale, because the decimal separator varies between locales. This can lead to subtle bugs in certain locales (notably the German, French, and Spanish ones), as the float to string cast won't yield a string that converts back to the original value.
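As a rough illustration of where the separator comes from, the sketch below queries the active numeric locale with localeconv(). The helper name current_decimal_point() is hypothetical, and it is an assumption that a German locale may or may not be installed under the names "de_DE" or "de_DE.UTF-8", so the return value of setlocale() is checked before relying on it.

```php
<?php
// Hypothetical helper: returns the decimal separator defined by the
// currently active LC_NUMERIC locale, as reported by localeconv().
function current_decimal_point(): string
{
    return localeconv()['decimal_point'];
}

setlocale(LC_NUMERIC, 'C');
$cSeparator = current_decimal_point();        // "."

// Assumption: a German locale is not installed on every system,
// so try a couple of common names and check the result.
if (setlocale(LC_NUMERIC, 'de_DE', 'de_DE.UTF-8') !== false) {
    $deSeparator = current_decimal_point();   // "," on typical installations
    echo "de_DE decimal point: {$deSeparator}\n";
}

// Restore the default numeric locale.
setlocale(LC_NUMERIC, 'C');
echo "C decimal point: {$cSeparator}\n";
```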

Proposal

Make PHP's float to string conversion locale-independent in PHP 8.0, meaning that it will always use the dot (.) as the decimal separator. The change would also affect the functions in ext/standard that currently convert floats to strings in a locale-dependent way:

setlocale(LC_ALL, "de_DE");
$f = 3.14;
 
(string) $f;		// 3,14
strval($f);		// 3,14
var_dump($f);		// float(3,14)
debug_zval_dump($f);	// float(3,14)

It should be noted that some functions and extensions already use a locale-independent string representation for floats. One such extension is PDO, which has dedicated code to ensure a consistent string representation for floats. [1] A few functions whose output is not affected by the locale:

echo var_export($f, true);	// 3.14
echo serialize($f);		// d:3.14;
echo json_encode($f);		// 3.14

Moreover, the *printf family of functions won't be modified, as it already provides the %F conversion specifier for a locale-independent conversion (while %f remains locale-aware):

printf("%.2f\n", $f);		// 3,14
printf("%.2F\n", $f);		// 3.14

The issue has been raised multiple times over the years, both on the PHP Internals mailing list [2] [3] and in bug reports [4] [5] [6], but no action has been taken so far. There are, however, several reasons why the problem needs to be addressed:

Having a consistent string representation for floats is very important. Such values may be stored in string-typed database columns, or sent to an external API over a protocol in which everything is represented as a string, such as HTTP. With a locale-dependent representation, the receiving side might reject these values as malformed or, even worse, interpret them as numbers by dropping everything after the decimal separator. Apart from this, the behaviour is easy to overlook and highly surprising, so it violates the Principle of Least Astonishment. [7]

To make things even worse, locale-sensitive casting causes some odd inconsistencies and hard-to-explain bugs. For example, casting a float to a string and back does not yield the original value:

setlocale(LC_ALL, "de_DE");
 
$f = 3.14;		// float(3,14)
$s = (string) $f;	// string(4) "3,14"
$f = (float) $s;	// float(3)
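The asymmetry comes from the string to float direction: that conversion does not consult the locale and only recognizes the dot, so parsing stops at the comma. A minimal check follows; the output comments assume the default "C" locale.

```php
<?php
// String to float conversion ignores the locale: only "." is accepted
// as the decimal separator, and parsing stops at the first character
// that cannot be part of a number.
$fromComma = (float) "3,14";   // leading "3" is parsed, ",14" is ignored
$fromDot   = (float) "3.14";

var_dump($fromComma);   // float(3)
var_dump($fromDot);     // float(3.14)
```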

A related inconsistency arises when a float is concatenated directly with a string: when using OPcache, a compile-time optimization (SCCP, sparse conditional constant propagation) produces the string before any locale could be set, so the locale is ignored:

setlocale(LC_ALL, "de_DE");
 
$s = 3.14 . "";		// string(4) "3.14"

Alternative Approaches

An alternative to the current proposal would be to deprecate the current behaviour in PHP 8 and remove it in the subsequent major version. Unfortunately, we believe this approach is impractical, as the conversion of floats to strings can only be fixed, not deprecated.

A different approach would be to deprecate setlocale() in PHP 8, which would fix this issue as a by-product. However, the last time this was discussed on the PHP Internals list, in 2016 [8], the discussion centred mainly on its non-thread-safe behaviour, since it affects global state, and the conclusion was that a thread-safe variant should be introduced based on an HHVM patch.

Backward Incompatible Changes

Converting floats to strings in locales that use a different decimal separator will produce slightly different output. In our opinion, the backward compatibility break won't be serious in practice: existing workarounds in places where locale-independent casting is needed will simply become unnecessary, while use cases where locale-dependent casting is the expected behaviour (e.g. presentation) are likely less sensitive to the change. All in all, the benefits of a consistent float to string conversion outweigh the impact this change may cause.
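For illustration, one workaround of the kind mentioned above is to switch LC_NUMERIC to "C" around the cast and restore it afterwards. The function name float_to_string_c_locale() is hypothetical, and like any use of setlocale() it changes process-wide state, so it is not thread-safe.

```php
<?php
// Hypothetical helper showing a typical pre-PHP 8 workaround:
// temporarily switch LC_NUMERIC to "C" so that the cast uses ".".
function float_to_string_c_locale(float $f): string
{
    // Passing "0" as the locale only queries the current setting.
    $previous = setlocale(LC_NUMERIC, '0');
    setlocale(LC_NUMERIC, 'C');

    try {
        return (string) $f;
    } finally {
        // Restore whatever numeric locale was active before the call.
        if ($previous !== false) {
            setlocale(LC_NUMERIC, $previous);
        }
    }
}

$s = float_to_string_c_locale(3.14);
var_dump($s); // string(4) "3.14"
```

Once float to string casts are locale-independent, a plain (string) cast gives the same result and such a helper can be dropped.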

To retain the old behaviour, users can format the number explicitly with the number_format() function or the intl extension's NumberFormatter class, or use the *printf family of functions (with the %f specifier) if they still wish to rely on the locale.
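A sketch of these alternatives follows. The separators passed to number_format() are hard-coded for illustration rather than read from the locale, the NumberFormatter call only runs if the intl extension is loaded, and the German locale names passed to setlocale() are assumptions about the host system.

```php
<?php
$f = 3.14;

// number_format(num, decimals, decimal_separator, thousands_separator):
// the separators are explicit, so the result never depends on the locale.
$german      = number_format($f, 2, ',', '.');       // "3,14"
$germanLarge = number_format(1234.5, 2, ',', '.');   // "1.234,50"

// %F always uses "." regardless of the locale.
$dot = sprintf('%.2F', $f);                          // "3.14"

// intl's NumberFormatter formats according to the given ICU locale.
if (class_exists('NumberFormatter')) {
    $formatter = new NumberFormatter('de_DE', NumberFormatter::DECIMAL);
    echo $formatter->format($f), "\n";
}

// %f follows LC_NUMERIC, so it only yields a comma if a German
// locale is installed and active.
if (setlocale(LC_NUMERIC, 'de_DE', 'de_DE.UTF-8') !== false) {
    echo sprintf('%.2f', $f), "\n";
}
setlocale(LC_NUMERIC, 'C');

echo $german, ' ', $germanLarge, ' ', $dot, "\n";
```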

Future Scope

None.

Proposed Voting Choices

The primary vote (“Make float to string casts always locale-independent?”) requires 2/3 majority.

References

rfc/locale_independent_float_to_string.txt · Last modified: 2020/03/24 11:01 by kocsismate