
PHP RFC: Numeric Literal Separator

Introduction

The human eye is not optimized for quickly parsing long sequences of digits. Without visual separators, code takes longer to read and debug, and mistakes become more likely.

1000000000;   // Is this a billion? 100 million? 10 billion?
107925284.88; // What scale or power of 10 is this?

Without a visual separator, numeric literals also cannot convey contextual information, such as whether a financial quantity is stored in cents:

$discount = 13500; // Is this 13,500? Or 135, because it's in cents?

Proposal

Enable improved code readability by supporting an underscore in numeric literals to visually separate groups of digits.

$threshold = 1_000_000_000;  // a billion!
$testValue = 107_925_284.88; // scale is hundreds of millions
$discount = 135_00;          // $135, stored as cents

Underscore separators can be used in all numeric literal notations supported by PHP:

6.674_083e-11; // float
299_792_458;   // decimal
0xCAFE_F00D;   // hexadecimal
0b0101_1111;   // binary
0137_041;      // octal

Restrictions

The only restriction is that each underscore in a numeric literal must be directly between two digits. This rule means that none of the following usages are valid numeric literals:

_100; // already a valid constant name
 
// these all produce "Parse error: syntax error":
100_;       // trailing
1__1;       // next to underscore
1_.0; 1._0; // next to decimal point
0x_123;     // next to x
0b_101;     // next to b
1_e2; 1e_2; // next to e
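
The placement rule can be sketched as a regular expression. The helper below is hypothetical (it is not part of this RFC or of PHP itself) and only covers decimal integer literals: it accepts one or more digits followed by any number of "underscore plus digits" groups, so every underscore is guaranteed to have a digit on each side.

```php
<?php
// Hypothetical helper for illustration only; the name is an assumption.
// Returns true when every underscore in a decimal integer literal sits
// directly between two digits.
function isValidDecimalIntLiteral(string $literal): bool
{
    // Digits first, then zero or more groups of "_" followed by digits.
    // The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^[0-9]+(?:_[0-9]+)*$/D', $literal) === 1;
}

var_dump(isValidDecimalIntLiteral('1_000_000')); // bool(true)
var_dump(isValidDecimalIntLiteral('100_'));      // bool(false): trailing
var_dump(isValidDecimalIntLiteral('1__1'));      // bool(false): next to underscore
var_dump(isValidDecimalIntLiteral('_100'));      // bool(false): leading
```

The other notations would need analogous patterns with their own digit sets (for example hexadecimal digits after the 0x prefix).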

Unaffected PHP Functionality

Adding an underscore between digits in a numeric literal will not change its value. The underscores are stripped out during the lexing stage, so the runtime is not affected.

var_dump(1_000_000); // int(1000000)
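
As a further illustration, and assuming a PHP version in which the separator syntax is available (this RFC targets PHP 7.4), a separated literal should compare identical to its unseparated form in every notation:

```php
<?php
// Assumes PHP 7.4 or later; earlier versions report a parse error here.
var_dump(1_000_000 === 1000000);          // bool(true)
var_dump(0xCAFE_F00D === 0xCAFEF00D);     // bool(true)
var_dump(0b0101_1111 === 0b01011111);     // bool(true)
var_dump(6.674_083e-11 === 6.674083e-11); // bool(true)
var_dump(299_792_458 + 1);                // int(299792459)
```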

This RFC does not change the behavior of string to number conversion. Numeric separators are intended to improve code readability, not alter how input is processed.
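
A short sketch of what this means in practice: an underscore inside a string is still not part of a number, so conversion stops at the first underscore. Code that wants to accept separated input has to strip the underscores itself; the parseSeparatedInt() helper below is a hypothetical example of doing so, not something this RFC provides.

```php
<?php
// String conversion is unchanged: parsing stops at the first underscore.
var_dump((int) '1_000');       // int(1)
var_dump(is_numeric('1_000')); // bool(false)

// Hypothetical helper (name and behavior are assumptions for illustration):
// accepts digits with single underscores between them, otherwise returns null.
function parseSeparatedInt(string $input): ?int
{
    if (preg_match('/^[0-9]+(?:_[0-9]+)*$/D', $input) !== 1) {
        return null;
    }

    // Remove the separators, then convert the remaining digits.
    return (int) str_replace('_', '', $input);
}

var_dump(parseSeparatedInt('1_000'));  // int(1000)
var_dump(parseSeparatedInt('1__000')); // NULL
```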

Backward Incompatible Changes

None.

Discussion

Use cases

Digit separators enable subitizing: accurately and confidently “telling at a glance” how many digits a number has, rather than having to count them. This measurably lessens the time needed to correctly read numbers longer than four digits.

Large numeric literals are commonly used for business logic constants, unit test values, and data conversions. For example:

Composer's retry delay when removing a file:

usleep(350000); // without separator
 
usleep(350_000); // with separator

Conversion of an Active Directory timestamp (the number of 100-nanosecond intervals since January 1, 1601) to a Unix timestamp:

$time = (int) ($adTime / 10000000 - 11644473600); // without separator
 
$time = (int) ($adTime / 10_000_000 - 11_644_473_600); // with separator
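
The conversion above could be wrapped as follows. This is a minimal sketch that assumes a 64-bit build of PHP 7.4 or later; the constant and function names are invented for illustration, and only the arithmetic comes from the example above.

```php
<?php
// Hypothetical names; the two values are those used in the example above.
const AD_INTERVALS_PER_SECOND = 10_000_000;         // 100-nanosecond intervals per second
const AD_TO_UNIX_OFFSET_SECONDS = 11_644_473_600;   // seconds from 1601-01-01 to 1970-01-01

function adTimestampToUnix(int $adTime): int
{
    // Scale to seconds, then shift from the 1601 epoch to the Unix epoch.
    return (int) ($adTime / AD_INTERVALS_PER_SECOND - AD_TO_UNIX_OFFSET_SECONDS);
}

// The Unix epoch itself, expressed as an Active Directory timestamp.
var_dump(adTimestampToUnix(116_444_736_000_000_000)); // int(0)
```

Naming the constants and separating their digits makes it easier to check at a glance that the divisor is ten million and the offset is roughly 11.6 billion seconds.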

Working with scientific constants:

const ASTRONOMICAL_UNIT = 149597870700; // without separator
 
const ASTRONOMICAL_UNIT = 149_597_870_700; // with separator

Separating bytes in a binary or hex literal:

0b01010100011010000110010101101111; // without separator
 
0b01010100_01101000_01100101_01101111; // with separator
 
0x42726F776E; // without separator
 
0x42_72_6F_77_6E; // with separator

Use cases to avoid

It may be tempting to use integers for storing data such as phone, credit card, and social security numbers since these values appear numeric. However, this is almost always a bad idea, since such numbers often have prefixes and leading digits that are significant.

A good rule of thumb is that if it doesn't make sense to use mathematical operators on a value (e.g. adding it, multiplying it, etc.), then an integer probably isn't the best way to store it.

// don't do this:
$phoneNumber = 345_6789;
$creditCard = 231_6547_9081_2543;
$socialSecurity = 111_11_1111;

Will it be harder to search for numbers?

A concern that has been raised is whether numeric literal separators will make it more difficult to search for numbers, since the same value can be written in more than one way.

However, the same value can already be written in more than one way: in binary, octal, decimal, hexadecimal, or exponential notation. In practice, this isn't problematic as long as a codebase is consistent.

Furthermore, separators can sometimes make it easier to find numbers. To use an earlier example, 13_500 and 135_00 could be differentiated in a find/replace. Another example would be separated bytes in a hex literal, which allows searching for a value like “_6F_” to find only the numbers containing that specific byte.

Should it be the role of an IDE to group digits?

It has been suggested that numeric literal separators aren't needed for better readability, since IDEs could be updated to automatically display large numbers in groups of three digits.

However, it isn't always desirable to group numbers the same way. For example, a programmer may write 10050000 differently depending on whether or not it represents a financial quantity stored as cents:

$total = 100_500_00; // represents $100,500.00 stored as cents
 
$total = 10_050_000; // represents $10,050,000

Binary and hex literals may also be grouped by a varying number of digits to reflect how they are used (e.g. bits may be separated into nibbles, bytes, or words). An IDE cannot do this automatically without knowing the programmer's intent for each numeric literal.
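
To make this concrete, here is a rough sketch of the kind of fixed-size grouping a tool could apply automatically. The groupDigits() function is hypothetical and written only for illustration; the group size has to be supplied by the caller, which is exactly the information an IDE does not have.

```php
<?php
// Hypothetical formatter: inserts an underscore every $groupSize digits,
// counting from the least significant (rightmost) digit.
// $groupSize is assumed to be at least 1.
function groupDigits(string $digits, int $groupSize): string
{
    // Reverse so that chunks are measured from the right-hand side.
    $chunks = str_split(strrev($digits), $groupSize);

    return strrev(implode('_', $chunks));
}

echo '0b' . groupDigits('0101010001101000', 4), "\n"; // 0b0101_0100_0110_1000
echo '0b' . groupDigits('0101010001101000', 8), "\n"; // 0b01010100_01101000
echo groupDigits('10050000', 3), "\n";                // 10_050_000
```

No single group size produces 100_500_00, because that form mixes groups of three with a final group of two for the cents.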

Why resurrect this proposal?

The previous RFC was originally voted on over three years ago (January 2016). While a majority of voters supported it, it did not reach the required 2/3 threshold for acceptance.

Judging from the discussion at the time, it fell short because few good use cases were put forward for it. The RFC also had a short voting period of only one week.

Since that time, the ability to use underscores in numeric literals has been implemented in additional popular languages (e.g. Python, JavaScript, and TypeScript), and a stronger case can be made for the feature than was made before.

Should I vote for this feature?

Andrea Faulds summarized the considerations as follows:

This feature offers some benefit in some cases. It doesn't introduce much new complexity. There's no new syntax or tokens, it just modifies the form of the existing number tokens. It fits in well [with] what's already there, consistently applying to all number literals. It follows established convention in other languages. Its appearance at least hints that values with these separators are not constants or identifiers, but numbers, reducing potential for confusion. It limits its own application to prevent abuse (no leading, trailing, or repeated separators). And it's relatively intuitive.

Comparison to other languages

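Numeric literal separators are widely supported in other programming languages. In the list below, "single" means that only one separator may appear between adjacent digits, while "multiple" means that consecutive separators are allowed.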

  • Ada: single, between digits [1]
  • C#: multiple, between digits [2]
  • C++: single, between digits (single quote used as separator) [3]
  • Java: multiple, between digits [4]
  • JavaScript and TypeScript: single, between digits [5]
  • Julia: single, between digits [6]
  • Kotlin: multiple, between digits [7]
  • Perl: single, between digits [8]
  • Python: single, between digits [9]
  • Ruby: single, between digits [10]
  • Rust: multiple, anywhere [11]
  • Swift: multiple, between digits [12]

Vote

Voting started 2019-05-30 and ended 2019-06-13.

Support numeric literal separator in PHP 7.4?
Real name Yes No
ajf (ajf)  
ashnazg (ashnazg)  
bishop (bishop)  
brzuchal (brzuchal)  
bwoebi (bwoebi)  
chregu (chregu)  
cmb (cmb)  
colinodell (colinodell)  
danack (danack)  
derick (derick)  
diegopires (diegopires)  
dmitry (dmitry)  
duncan3dc (duncan3dc)  
galvao (galvao)  
gasolwu (gasolwu)  
girgias (girgias)  
jasny (jasny)  
jbnahan (jbnahan)  
jhdxr (jhdxr)  
kalle (kalle)  
kelunik (kelunik)  
kguest (kguest)  
kinncj (kinncj)  
krakjoe (krakjoe)  
lstrojny (lstrojny)  
mike (mike)  
nikic (nikic)  
ocramius (ocramius)  
peehaa (peehaa)  
petk (petk)  
pmmaga (pmmaga)  
ramsey (ramsey)  
rasmus (rasmus)  
reywob (reywob)  
rogeriopradoj (rogeriopradoj)  
rtheunissen (rtheunissen)  
salathe (salathe)  
sammyk (sammyk)  
sergey (sergey)  
stas (stas)  
thekid (thekid)  
tpunt (tpunt)  
trowski (trowski)  
yunosh (yunosh)  
Final result: 33 Yes, 11 No
This poll has been closed.

References

rfc/numeric_literal_separator.txt · Last modified: 2019/08/19 19:58 by theodorejb