PHP RFC: Numeric Literal Separator

Introduction

The human eye is not optimized for quickly parsing long sequences of digits. Thus, a lack of visual separators increases the time it takes to read and debug code, and can lead to unintended mistakes.

1000000000;   // Is this a billion? 100 million? 10 billion?
107925284.88; // What scale or power of 10 is this?

Additionally, without a visual separator, numeric literals cannot convey extra information, such as whether a financial quantity is stored in cents:

$discount = 12300; // Is this 12,300? Or 123, because it's in cents?

Proposal

Enable improved code readability by supporting an underscore in numeric literals to visually separate groups of digits.

$threshold = 1_000_000_000; // a billion!
$testAmt = 107_925_284.88;  // scale is hundreds of millions
$discount = 123_00;         // $123, stored as cents

Underscore separators can be used in all numeric literal notations supported by PHP:

6.674_083e-11; // float
299_792_458;   // decimal
0xCAFE_F00D;   // hexadecimal
0b0010_1101;   // binary
026_73_43;     // octal

Restrictions

The only restriction is that each underscore in a numeric literal must be directly between two digits. This rule means that none of the following usages are valid numeric literals:

_100; // already a valid constant name
 
// these all produce "Parse error: syntax error"
100_;       // trailing
1__1;       // next to underscore
1_.0; 1._0; // next to decimal point
0x_123;     // next to x
0b_101;     // next to b
1_e2; 1e_2; // next to e
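
The placement rule can be approximated with a small checker. This is only a sketch: hasValidSeparators() is a hypothetical helper, not part of PHP or of this RFC, and it tests underscore placement only, not whether the rest of the literal is well formed.

```php
<?php
// Hypothetical helper (not part of PHP or this RFC): checks only the
// separator rule, i.e. every underscore has a digit of the literal's
// base on both sides. It does not validate the rest of the literal.
function hasValidSeparators(string $literal): bool
{
    if (preg_match('/^0[xX]/', $literal)) {
        $digit = '[0-9a-fA-F]';
        $body = substr($literal, 2);
    } elseif (preg_match('/^0[bB]/', $literal)) {
        $digit = '[01]';
        $body = substr($literal, 2);
    } else {
        $digit = '[0-9]';
        $body = $literal;
    }

    // Reject any underscore that lacks a digit immediately before or after it.
    return preg_match("/(?<!$digit)_|_(?!$digit)/", $body) === 0;
}

var_dump(hasValidSeparators('1_000_000'));   // bool(true)
var_dump(hasValidSeparators('0xCAFE_F00D')); // bool(true)
var_dump(hasValidSeparators('100_'));        // bool(false)
var_dump(hasValidSeparators('1__1'));        // bool(false)
var_dump(hasValidSeparators('0x_123'));      // bool(false)
var_dump(hasValidSeparators('1_e2'));        // bool(false)
```

The digit class depends on the notation, which is why 1_e2 is rejected as a decimal literal even though "e" would count as a digit in hexadecimal.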

Use cases

Business logic thresholds, scientific constants, and unit test values are common situations where large numeric literals are necessary.

Use cases to avoid

It may be tempting to use integers for storing data such as phone, credit card, and social security numbers since these values appear numeric. However, this is almost always a bad idea, since these values often have prefixes and leading digits that are significant.

A good rule of thumb is that if it doesn't make sense to use mathematical operators on a value (e.g. adding it, multiplying it, dividing it, etc.), then an integer probably isn't the best way to store it.

// never do this:
$phoneNumber = 345_6789;
$creditCard = 378_2822_4631_0005;
$socialSecurity = 111_11_1111;
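
By contrast, a brief sketch of what is lost when such an identifier passes through an integer. The phone number is a made-up sample value chosen because its leading zero is significant.

```php
<?php
// Made-up sample value: a phone number with a significant leading zero.
$phoneNumber = '0123456789';

// Round-tripping through an integer silently drops the leading zero.
$asInt = (int) $phoneNumber;
var_dump($asInt);          // int(123456789)
var_dump((string) $asInt); // string(9) "123456789"

// Kept as a string, all ten characters survive unchanged.
var_dump(strlen($phoneNumber)); // int(10)
```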

Backward Incompatible Changes

None.

Unaffected PHP Functionality

Underscores in numeric literals will be stripped out during the lexing stage, so the runtime will not be affected.

var_dump(1_000_000); // int(1000000)
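
A further sketch, assuming a PHP version that implements this proposal: since the separators are removed while lexing, a separated literal should be indistinguishable at runtime from its unseparated form.

```php
<?php
// Assumes a PHP build with numeric literal separator support.
var_dump(1_000_000 === 1000000);      // bool(true)
var_dump(0xCAFE_F00D === 0xCAFEF00D); // bool(true)
var_dump(0b0010_1101 === 0b00101101); // bool(true)
var_dump(123_00 === 12300);           // bool(true)

// The printed form carries no trace of the separators.
echo 1_000_000, "\n";                 // 1000000
```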

This RFC does not change the behavior of string to number conversion. Numeric separators are intended to improve code readability, not alter how input is processed.
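
As an illustration of that boundary, the following sketch shows how existing string handling treats an underscore. It relies only on the standard is_numeric(), integer cast, and str_replace() behavior; the input string is an arbitrary example.

```php
<?php
// Strings containing underscores are not treated as numeric.
var_dump(is_numeric('1_000')); // bool(false)

// An integer cast still stops at the first non-digit character, as before.
var_dump((int) '1_000');       // int(1)

// Input that may contain underscores has to be cleaned up explicitly.
var_dump((int) str_replace('_', '', '1_000')); // int(1000)
```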

Will it be harder to search for numbers?

A concern that has been raised is whether numeric literal separators will make it more difficult to search for numbers, since the same value can be written in more than one way.

However, the same value can already be written in more than one way: a number may appear in binary, octal, decimal, hexadecimal, or exponential notation. In practice, this isn't problematic as long as a codebase is consistent.
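
To make the point concrete, here is one value (255, chosen arbitrarily) written in each of the existing notations. A text search for "255" would find only the first of them.

```php
<?php
// The same value written in each existing integer notation.
var_dump(255 === 0xFF);       // bool(true)
var_dump(255 === 0b11111111); // bool(true)
var_dump(255 === 0377);       // bool(true)

// Exponential notation yields a float, so compare loosely.
var_dump(255 == 2.55e2);      // bool(true)
```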

Prior art

Numeric literal separators are widely supported in other programming languages.

  • Ada: single, between digits [1]
  • C# (proposal for 7.0): multiple, between digits [2]
  • C++: single, between digits (single quote used as separator) [3]
  • Java: multiple, between digits [4]
  • JavaScript and TypeScript: single, between digits [5]
  • Julia: single, between digits [6]
  • Perl: single, between digits [7]
  • Python: single, between digits [8]
  • Ruby: single, between digits [9]
  • Rust: multiple, anywhere [10]
  • Swift: multiple, between digits [11]

Proposed Voting Choices

Add numeric literal separators in PHP 7.4? Yes/No.

Patches and Tests

Pending...

Why resurrect this proposal?

The previous RFC was originally voted on over three years ago (January 2016). While a majority of voters supported it, it did not reach the required 2/3 threshold for acceptance.

Judging from the discussion at the time, it fell short of the threshold because few compelling use cases were put forward. The RFC also had a short voting period of only one week.

Since that time, the ability to use underscores in numeric literals has been implemented in additional popular languages (e.g. Python, JavaScript, and TypeScript), and a stronger case can be made for the feature than was made before.

References

rfc/numeric_literal_separator.1556665290.txt.gz · Last modified: 2019/04/30 23:01 by theodorejb