rfc:native_regular_expressions
Differences
This shows you the differences between two versions of the page.
rfc:native_regular_expressions [2014/08/14 01:35] – Jotting my thoughts down. bishop | rfc:native_regular_expressions [2025/04/03 13:08] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Native Regular Expression ====== | ====== PHP RFC: Native Regular Expression ====== | ||
- | * Version: 0.1 | ||
* Date: 2014-08-13 | * Date: 2014-08-13 | ||
* Author: Bishop Bettini, bishop@php.net | * Author: Bishop Bettini, bishop@php.net | ||
* Status: Draft | * Status: Draft | ||
- | * First Published at: http:// | ||
+ | FIXME FIXME FIXME | ||
Jotting my ideas down here. Move along. Maybe called " | Jotting my ideas down here. Move along. Maybe called " | ||
+ | Consider emulating structure of https:// | ||
+ | https:// | ||
+ | |||
+ | |||
+ | ===== Introduction ===== | ||
+ | |||
+ | Regular expressions provide powerful string-matching capabilities and play a critical role in most software written in PHP. For example, GitHub reports [[https:// | ||
+ | |||
+ | In the current engine, regular expressions are plain old strings: | ||
+ | |||
+ | <code php> | ||
+ | while (preg_match('/ | ||
+ | </ | ||
+ | |||
+ | The primary disadvantage of the string representation arises when the regular expression itself needs to contain a single quote, a double quote, or the delimiter characters that bracket the expression. | ||
+ | |||
+ | <code php> | ||
+ | // match foo in examples: =" | ||
+ | preg_match_all(' | ||
+ | </ | ||
+ | |||
+ | In some other languages, regular expressions are part of the language itself. | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | Another problem with regular expressions buried in plain old strings is that syntax highlighting becomes much more difficult. | ||
+ | |||
+ | |||
+ | <code php> | ||
+ | function getLinesFromFile($fileName) { | ||
+ | if (!$fileHandle = fopen($fileName, | ||
+ | return; | ||
+ | } | ||
+ | | ||
+ | while (false !== $line = fgets($fileHandle)) { | ||
+ | yield $line; | ||
+ | } | ||
+ | | ||
+ | fclose($fileHandle); | ||
+ | } | ||
+ | |||
+ | $lines = getLinesFromFile($fileName); | ||
+ | foreach ($lines as $line) { | ||
+ | // do something with $line | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | The code looks very similar to the array-based implementation. The main difference is that instead of pushing | ||
+ | values into an array the values are '' | ||
+ | |||
+ | Generators work by passing control back and forth between the generator and the calling code: | ||
+ | |||
+ | When you first call the generator function ('' | ||
+ | but none of its code is actually executed. Instead, the function directly returns a '' | ||
+ | '' | ||
+ | loop: | ||
+ | |||
+ | Whenever the '' | ||
+ | hits a '' | ||
+ | |||
+ | Generator methods, together with the '' | ||
+ | classes too: | ||
+ | |||
+ | <code php> | ||
+ | class Test implements IteratorAggregate { | ||
+ | protected $data; | ||
+ | | ||
+ | public function __construct(array $data) { | ||
+ | $this-> | ||
+ | } | ||
+ | | ||
+ | public function getIterator() { | ||
+ | foreach ($this-> | ||
+ | yield $key => $value; | ||
+ | } | ||
+ | // or whatever other traversal logic the class has | ||
+ | } | ||
+ | } | ||
+ | |||
+ | $test = new Test([' | ||
+ | foreach ($test as $k => $v) { | ||
+ | echo $k, ' => ', $v, " | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Generators can also be used the other way around, i.e. instead of producing values they can also consume them. When | ||
+ | used in this way they are often referred to as enhanced generators, reverse generators or coroutines. | ||
+ | |||
+ | Coroutines are a rather advanced concept, so it is very hard to come up with short examples that are not too contrived. | ||
+ | For an introduction see an example [[https:// | ||
+ | If you want to know more, I highly recommend checking out [[http:// | ||
+ | on this subject]]. | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
New built-in " | New built-in " | ||
+ | < | ||
syntax := re < | syntax := re < | ||
fence-post := <any character> | fence-post := <any character> | ||
Line 15: | Line 113: | ||
regex-modifiers := whatever is valid for modifiers | regex-modifiers := whatever is valid for modifiers | ||
semic := ';' | semic := ';' | ||
+ | </ | ||
Example: | Example: | ||
+ | < | ||
$regex = re /^\w+$/i | $regex = re /^\w+$/i | ||
preg_match($regex, | preg_match($regex, | ||
ereg_match($regex, | ereg_match($regex, | ||
+ | </ | ||
+ | ====== Motivation ====== | ||
+ | * Regular expressions are integral to modern information processing | ||
+ | * Quoting them inside strings is hard: you have the quote character to deal with, plus the fence post | ||
+ | * Other languages have regular expressions built in | ||
- | Motivation | + | ====== Goals ====== |
- | * Regex are integral | + | * Reduce effort of code authors |
- | * Quoting them inside strings is hard: you have the quote character to deal with, plus the fence post | + | * Compile time verification of regex (benefit?) |
- | * Other languages have re built in | + | |
- | Goals: | + | ====== Non-goals ====== |
- | * Reduce effort of code authors to quote regex properly | + | * Adding a new regex class, with methods like $re-> |
- | * Compile time verification of regex (benefit?) | + | |
- | Non-goals: | + | ====== Similar implementations ====== |
- | * Adding a new regex class, with methods like $re-> | + | * JavaScript: http:// | ||
+ | * Python: https:// | ||
+ | * Comparison: http:// | ||
- | Similar implementations: | + | ====== Discussions ====== |
- | * Javascript: http://mrrena.blogspot.com/2012/ | + | * https://news.ycombinator.com/item? |
- | * Python: https://docs.python.org/3/howto/regex.html | + | * http://stackoverflow.com/questions/25310999/what-is-the-maximum-length-of-a-regular-expression |
- | * Comparison: http:// | + | |
- | Discussions: | ||
- | https:// | ||
+ | ---- | ||
This is a suggested template for PHP Request for Comments (RFCs). Change this template to suit your RFC. Not all RFCs need to be tightly specified. | This is a suggested template for PHP Request for Comments (RFCs). Change this template to suit your RFC. Not all RFCs need to be tightly specified. |
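
The quoting problem described in the Introduction and Motivation can be sketched with today's string-based patterns. This is only an illustration, not part of the proposal: the input `$html`, the `href` pattern, and the variable names are hypothetical. It shows the same pattern written three ways, and how a literal slash collides with the `/` delimiter (the "fence post").

```php
<?php
// Hypothetical input: attribute values wrapped in either quote style.
$html = "<a href=\"foo\"> <a href='bar'>";

// Single-quoted PHP string: the pattern's own single quote must be written as \'.
$single = '/href=(["\'])(.*?)\1/';

// Double-quoted PHP string: the double quote must be escaped, and so must the
// backslash of \1, because "\1" alone would be read as an octal escape.
$double = "/href=([\"'])(.*?)\\1/";

// Nowdoc: no PHP-level escaping at all, at the cost of a multi-line literal.
$nowdoc = <<<'RE'
/href=(["'])(.*?)\1/
RE;

// All three spellings produce the identical pattern string.
var_dump($single === $double, $single === $nowdoc);

// Group 2 holds the attribute value between the matching quotes.
preg_match_all($single, $html, $matches);
print_r($matches[2]);

// A literal slash collides with the / delimiter; preg_quote() escapes it.
$needle = 'a/b';
$slash = '/' . preg_quote($needle, '/') . '/';
var_dump(preg_match($slash, 'x a/b y'));
```

The author has to track two layers of escaping at once, PHP string syntax and regex delimiter syntax. A native literal would presumably remove the first layer, though the draft grammar above does not yet say how the fence post itself would be escaped.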
rfc/native_regular_expressions.1407980114.txt.gz · Last modified: 2025/04/03 13:08 (external edit)