


PHP RFC: Numeric Literal Separator

Introduction

The human eye is not optimized for quickly parsing long sequences of digits. As a result, the lack of visual separators makes code slower to read and debug, and can lead to unintended mistakes.

1000000000;   // Is this a billion? 100 million? 10 billion?
107925284.88; // What scale or power of 10 is this?

Additionally, without visual separators, numeric literals cannot convey extra information, such as whether a financial quantity is stored in cents:

$discount = 13500; // Is this 13,500? Or 135, because it's in cents?

Proposal

Improve code readability by allowing underscores in numeric literals to visually separate groups of digits.

$threshold = 1_000_000_000;  // a billion!
$testValue = 107_925_284.88; // scale is hundreds of millions
$discount = 135_00;          // $135, stored as cents

Underscore separators can be used in all numeric literal notations supported by PHP:

6.674_083e-11; // float
299_792_458;   // decimal
0xCAFE_F00D;   // hexadecimal
0b0010_1101;   // binary
026_73_43;     // octal
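One quick way to confirm that separators are purely visual is to compare each literal above with the same literal written without underscores. The sketch below assumes PHP 7.4 or later, the version this RFC targets; the `$pairs` array is purely illustrative.

```php
<?php
// Sketch: each entry pairs a literal written with separators and the
// same literal written without them. Assumes PHP 7.4+.
$pairs = [
    'float'       => [6.674_083e-11, 6.674083e-11],
    'decimal'     => [299_792_458, 299792458],
    'hexadecimal' => [0xCAFE_F00D, 0xCAFEF00D],
    'binary'      => [0b0010_1101, 0b00101101],
    'octal'       => [026_73_43, 0267343],
];

foreach ($pairs as $notation => [$withSeparators, $withoutSeparators]) {
    // === compares both type and value, so an int must stay an int
    // and a float must stay a float for the pair to be identical.
    $same = $withSeparators === $withoutSeparators;
    printf("%-11s %s\n", $notation, $same ? 'identical' : 'different');
}
```

Every row should report `identical`, because the underscores are removed before the literal's value is computed.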

Restrictions

The only restriction is that each underscore in a numeric literal must be directly between two digits. This rule means that none of the following usages are valid numeric literals:

_100; // already a valid constant name
 
// these all produce "Parse error: syntax error":
100_;       // trailing
1__1;       // next to underscore
1_.0; 1._0; // next to decimal point
0x_123;     // next to x
0b_101;     // next to b
1_e2; 1e_2; // next to e
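The placement rule can be approximated with regular expressions. The sketch below covers integer literals only (floats and exponents are omitted for brevity), and `isValidIntegerLiteral()` is a hypothetical helper, not part of PHP or of this RFC. It checks only where underscores appear; for instance, it does not reject the digits 8 and 9 in a legacy octal literal.

```php
<?php
// Hypothetical helper: accepts an integer literal only if every
// underscore sits directly between two digits. Not PHP's lexer.
function isValidIntegerLiteral(string $literal): bool
{
    $patterns = [
        '/^[0-9]+(?:_[0-9]+)*\z/',                  // decimal, and legacy octal such as 026_73_43
        '/^0[xX][0-9a-fA-F]+(?:_[0-9a-fA-F]+)*\z/', // hexadecimal: no "_" right after 0x
        '/^0[bB][01]+(?:_[01]+)*\z/',               // binary: no "_" right after 0b
    ];

    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $literal) === 1) {
            return true;
        }
    }

    return false;
}

$candidates = ['1_000', '0xCAFE_F00D', '0b0010_1101', '_100', '100_', '1__1', '0x_123', '0b_101'];
foreach ($candidates as $candidate) {
    printf("%-12s %s\n", $candidate, isValidIntegerLiteral($candidate) ? 'valid' : 'invalid');
}
```

The first three candidates are accepted and the remaining five are rejected, matching the examples above.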

Unaffected PHP Functionality

Underscores in numeric literals will be stripped out during the lexing stage, so the runtime will not be affected.

var_dump(1_000_000); // int(1000000)

This RFC does not change the behavior of string to number conversion. Numeric separators are intended to improve code readability, not alter how input is processed.
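For instance, PHP's existing string conversions stop at the first underscore, and nothing in this RFC changes that. A short sketch, assuming PHP 7.4 or later:

```php
<?php
// Separators are a source-code feature only: strings that contain
// underscores are converted exactly as they were before.
$input = '1_000';

var_dump(is_numeric($input)); // bool(false): "_" is not valid in a numeric string
var_dump((int) $input);       // int(1): conversion stops at the first "_"
var_dump(intval($input));     // int(1)
var_dump((float) '1_000.5');  // float(1)
```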

Backward Incompatible Changes

None.

Discussion

Use cases

Digit separators enable subitizing: accurately and confidently “telling at a glance” the number of digits, rather than having to count them. This measurably reduces the time needed to correctly read numbers longer than four digits.

Large numeric literals are commonly used for business logic constants, unit test values, and data conversions. For example:

Composer's retry delay when removing a file:

usleep(350000); // without separator
 
usleep(350_000); // with separator

Conversion of an Active Directory timestamp (the number of 100-nanosecond intervals since January 1, 1601) to a Unix timestamp:

$time = (int) ($adTime / 10000000 - 11644473600); // without separators
 
$time = (int) ($adTime / 10_000_000 - 11_644_473_600); // with separators
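The same conversion can be wrapped in a small function and checked against known dates. This is a sketch: `adTimeToUnix()` is a hypothetical name, and the code assumes 64-bit integers, since Active Directory timestamps exceed the 32-bit range.

```php
<?php
// Hypothetical helper wrapping the conversion shown above.
// $adTime counts 100-nanosecond intervals since 1601-01-01 (UTC).
function adTimeToUnix(int $adTime): int
{
    // 10_000_000 intervals of 100 ns make one second;
    // 11_644_473_600 seconds separate 1601-01-01 from 1970-01-01.
    return (int) ($adTime / 10_000_000 - 11_644_473_600);
}

echo adTimeToUnix(116_444_736_000_000_000), "\n"; // 0 (the Unix epoch)
echo adTimeToUnix(132_223_104_000_000_000), "\n"; // 1577836800 (2020-01-01 00:00:00 UTC)
```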

Working with scientific constants:

const ASTRONOMICAL_UNIT = 149597870700; // without separator
 
const ASTRONOMICAL_UNIT = 149_597_870_700; // with separator
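As an illustration of such constants in use, the sketch below divides the astronomical unit by the speed of light (the `299_792_458` literal shown earlier) to get the light travel time in seconds. The constant name `SPEED_OF_LIGHT` is hypothetical.

```php
<?php
// Light travel time over one astronomical unit.
// Both values are exact, defined constants.
const ASTRONOMICAL_UNIT = 149_597_870_700; // metres
const SPEED_OF_LIGHT = 299_792_458;        // metres per second (hypothetical name)

$seconds = ASTRONOMICAL_UNIT / SPEED_OF_LIGHT;
printf("%.1f seconds\n", $seconds); // prints "499.0 seconds" (roughly 8.3 minutes)
```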

Use cases to avoid

It may be tempting to use integers to store data such as phone, credit card, and Social Security numbers, since these values appear numeric. However, this is almost always a bad idea, because such values often have significant prefixes and leading digits.

A good rule of thumb is that if it doesn't make sense to use mathematical operators on a value (e.g. adding it, multiplying it, dividing it, etc.), then an integer probably isn't the best way to store it.

// never do this:
$phoneNumber = 345_6789;
$creditCard = 231_6547_9081_2543;
$socialSecurity = 111_11_1111;
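To see why leading digits matter, consider a value with a significant leading zero. The ZIP code below is a hypothetical example: written as an integer literal, the leading `0` turns it into an octal number, while a string keeps it intact.

```php
<?php
// Hypothetical ZIP code "02134", where the leading zero is significant.
$zipAsInt = 02134;      // the leading 0 makes this an octal literal
$zipAsString = '02134'; // a string preserves every digit as written

var_dump($zipAsInt);    // int(1116): not the intended value
var_dump($zipAsString); // string(5) "02134"
```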

Will it be harder to search for numbers?

A concern that has been raised is whether numeric literal separators will make it more difficult to search for numbers, since the same value can be written in more than one way.

This is already possible, however. The same number can be written in binary, octal, decimal, hexadecimal, or exponential notation. In practice, this isn't problematic as long as a codebase is consistent.
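For example, the value 255 can already be written in each of these notations. The sketch below (not part of the RFC) shows that each form compares loosely equal to 255; note that the exponential form is a float rather than an int.

```php
<?php
// The value 255 written in each notation PHP supported before this RFC.
$forms = [
    'binary'      => 0b11111111,
    'octal'       => 0377,
    'decimal'     => 255,
    'hexadecimal' => 0xFF,
    'exponential' => 2.55e2, // exponent notation always produces a float
];

foreach ($forms as $notation => $value) {
    // == compares by value, so the float 255.0 also matches the int 255.
    printf("%-11s %s\n", $notation, $value == 255 ? 'equals 255' : 'differs');
}

var_dump($forms['exponential'] === 255); // bool(false): float vs. int
```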

Why resurrect this proposal?

The previous RFC was originally voted on over three years ago (January 2016). While a majority of voters supported it, it did not reach the required 2/3 threshold for acceptance.

Judging from the discussion at the time, the proposal did not receive enough positive votes because few good use cases were put forward for it. The RFC also had a short voting period of only one week.

Since that time, the ability to use underscores in numeric literals has been implemented in additional popular languages (e.g. Python, JavaScript, and TypeScript), and a stronger case can be made for the feature than was made before.

Should I vote for this feature?

Andrea Faulds summarized the considerations as follows:

This feature offers some benefit in some cases. It doesn't introduce
much new complexity. There's no new syntax or tokens, it just modifies
the form of the existing number tokens. It fits in well [with] what's
already there, consistently applying to all number literals. It follows
established convention in other languages. Its appearance at least hints
that values with these separators are not constants or identifiers, but
numbers, reducing potential for confusion. It limits its own application
to prevent abuse (no leading, trailing, or repeated separators). And
it's relatively intuitive.

Comparison to other languages

Numeric literal separators are widely supported in other programming languages.

  • Ada: single, between digits [1]
  • C# (proposal for 7.0): multiple, between digits [2]
  • C++: single, between digits (single quote used as separator) [3]
  • Java: multiple, between digits [4]
  • JavaScript and TypeScript: single, between digits [5]
  • Julia: single, between digits [6]
  • Kotlin: multiple, between digits [7]
  • Perl: single, between digits [8]
  • Python: single, between digits [9]
  • Ruby: single, between digits [10]
  • Rust: multiple, anywhere [11]
  • Swift: multiple, between digits [12]

Vote

Add numeric literal separators in PHP 7.4? Yes/No.

References

Last modified: 2019/05/19 04:40 by theodorejb