rfc:vector

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
rfc:vector [2021/09/16 14:04] – created tandrerfc:vector [2021/09/26 16:46] (current) – php-ds maintainer response tandre
Line 1: Line 1:
 ====== PHP RFC: final class Vector ====== ====== PHP RFC: final class Vector ======
-  * Version: 0.1+  * Version: 0.2
   * Date: 2021-09-16   * Date: 2021-09-16
   * Author: Tyson Andre, tandre@php.net   * Author: Tyson Andre, tandre@php.net
-  * Status: Draft+  * Status: **On hold - will be updated after https://wiki.php.net/rfc/deque and the namespacing poll is done**
   * Implementation: https://github.com/php/php-src/pull/7488   * Implementation: https://github.com/php/php-src/pull/7488
   * First Published at: http://wiki.php.net/rfc/vector   * First Published at: http://wiki.php.net/rfc/vector
Line 10: Line 10:
  
 PHP's native ''array'' type is rare among programming language in that it is used as an associative map of values, but also needs to support lists of values. PHP's native ''array'' type is rare among programming language in that it is used as an associative map of values, but also needs to support lists of values.
-In order to support both use cases, additional memory is needed to track keys (around twice as much as is needed to just store the values, for non-reference counted values) (https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html). +In order to support both use cases, additional memory is needed to track keys ([[https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html|around twice as much as is needed to just store the values]], for non-reference counted values) . 
-It would be useful to have variable-length container in the standard library to (1) guarantee to application authors that the values really are a list without gaps, (2) save memory in applications or libraries that may need to store many lists of values, and (3) provide an alternative to ''SplFixedArray'' that is variable sized.+ 
 + 
 +It would be useful to have an efficient variable-length container in the standard library for the following reasons:  
 + 
 +    - To save memory in applications or libraries that may need to store many lists of values or run for long periods of time \\ (in modules identified as potentially exceeding memory limits\\ (both in userland and in native code written in php-src/PECLs) 
 +    - To provide a better alternative to ''ArrayObject'' and ''SplFixedArray'' for use cases that require variable sized collections (For lists of values) that can be passed by value to be modified. 
 +    - To give users the option of stronger runtime guarantees that property, parameter, or return values really contain a list of values without gaps, that array modifications don't introduce gaps or invalid keys, that values in the collection aren't references, etc. 
  
 ===== Proposal ===== ===== Proposal =====
-This proposes to add the class ''Vector'' to PHP.+This proposes to add the class ''final class Vector'' to PHP.
  
 Similarly to vectors in other languages, this is backed by a memory-efficient representation (raw C array of values with a size and capacity) and provides constant amortized-time push/pop operations. Similarly to vectors in other languages, this is backed by a memory-efficient representation (raw C array of values with a size and capacity) and provides constant amortized-time push/pop operations.
 +
 +Similarly to ''ArrayObject'', the ''$x[] = $value'' shorthand for appending to the Vector is supported in addition to ''ArrayAccess'' functionality.
  
 <code php> <code php>
Line 72: Line 81:
 } }
 </code> </code>
 +
 +===== Implementation Choices =====
 +
 +==== Global Namespace ====
 +
 +This maintains consistency with the namespace used for general-purpose collections already in the SPL (as well as relatively recent additions such as ''WeakReference'' and ''WeakMap''). Other recent additions to PHP such as ''ReflectionIntersectionType'' in PHP 8.1 have also continued to use the global namespace when adding classes with functionality related to other classes.
 +
 +==== Lack of Name Prefix ====
 +
 +    - Short names are more convenient to remember/use.
 +    - Possible future additions such as a Deque/Queue based on a efficient C array representation rather than a linked list would conflict with existing Spl names such as ''SplQueue'', ''SplStack'', etc.
 +    - There is already an addition to the spl without a prefix - ''ArrayObject''. Because ''array'' was already a type its name could not reasonably be any shorter.
 +
 +==== Accepting an iterable ====
 +
 +This is similar to the way the existing classes ''ArrayObject::__construct'' and ''SplFixedArray::fromArray''.
 +
 +End users may be surprised if integer keys are not the same as the ones passed in by default (e.g. if keys were unset or inserted out of order), which is why ''$preserve_keys = true'' is the default.
 +
 +Unlike ''SplFixedArray'', this is not a fixed size, which is why this accepts an iterable instead of a size. Accepting a mix of different types (''iterable|int'') is not done because it would make code harder to reason about when types are missing or inaccurate.
 +
 +''setSize'' can be used to create a vector of a certain size after instantiating an empty vector. Array library helpers such as ''array_fill'' or ''range'' may also be useful.
 +
 +==== Final Class ====
 +
 +If this were extensible, this would have the following drawbacks
 +
 +    - Not have as strong guarantees to readers of code (or even opcache, if optimizations were added targeting opcache) that elements were actually a vector or that certain methods would/wouldn't throw certain exceptions.
 +    - Require more memory and runtime checks to check if this was the original class or a subclass.
 +    - [[https://bugs.php.net/search.php?search_for=SplFixedArray&boolean=0&limit=30&order_by=&direction=DESC&cmd=display&status=All&bug_type=All&project=All&php_os=&phpver=&cve_id=&assign=&author_email=&bug_age=0&bug_updated=0&commented_by=|Be more likely to have discovered or undiscovered bugs]] due to userland extensions of ''Vector''
 +
 +==== push/pop ====
 +
 +This is consistent with the name used for ''array_push()''/''array_pop()''
 +
 +Other naming choices were chosen to be consistent with existing functionality in ''SplFixedArray''/''ArrayObject'' where reasonable.
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
-The class name ''\\Vector'' is now used by PHP, and it will be a compilation error to declare it in the global namespace since the class already exists.+The class name ''\Vector'' is now used by PHP, and it will be a compilation error to declare classlikes of the same name in the global namespace since the class already exists.
  
 ===== Proposed PHP Version(s) ===== ===== Proposed PHP Version(s) =====
Line 88: Line 133:
 ===== Benchmarks ===== ===== Benchmarks =====
  
-This is a contrived benchmark for estimating the performance of building/reading variable-sized arrays of different sizes, when the final size would be unknown (it is known here).+This is a contrived benchmark for estimating the performance of building/reading variable-sized arrays of different sizes, when the final size would be unknown on 64-bit PHP (it is known here).
  
 Read time is counted separately from create+destroy time. This is a total over all iterations, and the instrumentation adds to the time needed. Read time is counted separately from create+destroy time. This is a total over all iterations, and the instrumentation adds to the time needed.
Line 264: Line 309:
  
 ===== Future Scope ===== ===== Future Scope =====
-In the futureadditional methods may be added to ''\\Vector''+If ''\Vector'' is addedthere would be plenty of time for myself or others to add additional methods before PHP 8.2's feature freeze (probably in July 2022) 
 + 
 +Additional data structures from https://github.com/TysonAndre/pecl-teds that are general purpose (such as ''\Deque'' or future additions) may be possible as well.
  
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
Line 273: Line 320:
  
 - https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html - https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html
 +- https://github.com/TysonAndre/pecl-teds (implementations of multiple data structures, including ''Teds\Vector'', based originally on the ''SplFixedArray'' documentation)
 +- https://externals.io/message/112639#112641
  
 ===== Rejected Features ===== ===== Rejected Features =====
 +
 +==== Why not use php-ds/ext-ds instead? ====
 +
 +    - No matter how useful or popular a PECL is, datastructures available in PHP's core will have much, much wider adoption in applications and libraries that are available in PECLs, allowing those applications and libraries to write faster and/or more memory efficient code.
 +    - End users can make much stronger assumptions about the backwards compatibility and long-term availability of data structures that are included in core.
 +    - The php-ds maintainers do not plan to merge the extension into php-src, and believe php-ds should coexist with new functionality being added in a separate namespace instead (see quote and [[##updatephp-ds_maintainer_response_clarifications|later clarifications]] for full context)
 +    - Opcache may be able to make stronger optimizations of internal classes found in php-src than any third party PECL. (e.g. because ''Deque::push()'' or ''Vector::push()'' would never throw or emit notices, it may be possible to optimize it to be even faster than appending to an array in the Opcache JIT compiler)
 +
 +=== Perceived issues and uncertainties about php-ds distribution plans ===
 +
 +This has been asked about multiple times in threads on unrelated proposals (https://externals.io/message/112639#112641 and https://externals.io/message/93301#93301 years ago) throughout the years,
 +but the maintainer of php-ds had a long term goal of developing the separately from php's release cycle (and was still focusing on the PECL when I'd asked on the GitHub issue in the link in September 2020).
 +
 +To quote the maintainer on the GitHub [[https://github.com/php-ds/ext-ds/issues/156|issue]] on php-ds/ext-ds I'd opened the last time someone suggested using php-ds (emphasis on the below quote mine)
 +
 +<blockquote>
 +**//My long-term intention has been to not merge this extension into php-src.// I would like to see it become available as a default extension at the distribution level. Unfortunately I have no influence or understanding of that process.** Having an independent release and development cycle is a good thing, in my opinion.
 +
 +If those plans change, **I would like to hold off until a 2.0 release** - I've learnt a lot over the last 4 years and would like to revisit some of the design decisions I made then, such as a significant reduction of the interfaces or perhaps more interfaces with greater specificity. Functions like ''map'', ''filter'', ''reduce'' can be delegated to other libraries that operate on ''iterable'' instead of having these as first-class members of the interface. **There is a 2.0 branch with some ideas but I haven't looked at that in a while.**
 +
 +I have been working on a research project to design persistent data structures for immutability, so there is a lot of work that I have set for myself for this project over the next 6 months or so. I have no intention to push for distribution changes in the short-term but I am open to the suggestion.
 +</blockquote>
 +
 +<blockquote>
 +> > Do you mean OS distribution level (Windows, Ubuntu, CentOS, HomeBrew for mac, etc.?)
 +
 +> He meant distribution with PHP core (on all platforms where PHP is available)
 +
 +Whichever is more viable - simply not merged into core, but distributed and enabled by default alongside it.0
 +</blockquote>
 +
 +There have been no proposals from the maintainer themselves so far to add php-ds to core or distribute it alongside core in any form.
 +That was just what the maintainer mentioned as a long term plan.
 +
 +The model of distributing an extension separately from core has never been done before, and even if approved would raise multiple concerns:
 +
 +    * I personally doubt having it developed separately from php's release cycle would be accepted by voters (e.g. if unpopular decisions couldn't be voted against or vetoed, or if RFCs passed by the community for additions of datastructures (or additions of methods to datastructures) could be overturned by the php-ds maintainers)
 +    * This may limit what features could be added by the community: For example, introducing the ''map()'' or ''filter()'' functionality to a ''Vector'' if the php-ds maintainers removed that function in a simplified 2.0.
 +    * I'm not certain how backwards compatibility would be handled in that model, e.g. if the maintainers of ext-ds wanted to drop support for a method after it was released.
 +    * This may cause delays in publishing php releases, e.g. if the maintainers were unable to quickly review patches for crashes, incompatibilities or compile errors introduced in new php versions, etc.
 +    * and other concerns (e.g. API debates such as https://externals.io/message/93301#93301)
 +
 +With php-ds itself getting merged anytime soon (if the maintainers continue to plan to distribute php-ds that way) seeming unlikely to me, I decided to start independently working on efficient data structure implementations.
 +I don't see dragging it in (against the maintainer's wishes) as a viable option for many, many, many reasons.
 +But having efficient datastructures in PHP's core is still useful.
 +
 +The timeline for php-ds 2.0 is also something I am uncertain about.
 +
 +<del>Additionally, while there may be some uses for immutable datastructures, I would believe there are more uses for mutable datastructures, especially for programmers with imperative programming backgrounds such as C/C++, and would propose these mutable datastructures regardless of those plans. Having these mutable datastructures in core is still useful to immutable programmers and functional programmers, because it provides another tool to write the internal, private implementation details in a memory-efficient way.</del>
 +
 +    * //EDIT: I misread the maintainer's response as being about the project php-ds 2.0 - I'm now pretty sure the "research project to design persistent data structures for immutability" is a different project from ext-ds and possibly in a different programming language.// \\ \\(Leaving in this comment in because immutable datastructures were brought up by others in the RFC discussion)
 +
 +While PECL development outside of php has its benefits for development and ability to make new features available in older php releases,
 +it's less likely that application and
 +library authors will start making use of those data structures because many users won't have any given PECL already installed.
 +(though php-ds also publishes a polyfill, it would not have the cpu and memory savings, and add its own overhead)
 +
 +Additionally, users (and organizations using PHP) can often make stronger assumptions on
 +backwards compatibility and long-term availability of functionality that is merged into PHP's core.
 +
 +So the choice of feature set, some names, signatures, and internal implementation details are different, because this is reimplementing a common datastructure found in different forms in many languages.
 +It's definitely a mature project, but I personally feel like reimplementing this (without referring to the php-ds source code and without copying the entire api as-is) is the best choice to add efficient data structures to core while respecting the maintainer's work on the php-ds project and their wish to maintain control over the php-ds project.
 +
 +As a result, I've been working on implementing data structures such as ''Deque'' based on php-src's data structure implementations (mostly ''SplFixedArray'' and ''ArrayObject'') instead (and based on my past PECL/RFC experience, e.g. with ''runkit7''/''igbinary'')
 +
 +=== Minor differences in API design goals ===
 +
 +Traditionally, PHP has been a very batteries included language. Existing functionality such as [[https://www.php.net/manual/en/ref.strings.php|strings]] and [[https://www.php.net/manual/en/ref.array.php|arrays]] have very large standard libraries. This makes it easier to write code without depending on too many third party composer libraries, and knowledge of the standard library can transfer to any codebase a developer works on.
 +
 +My hopes for ease of use, readability, speed, and static analysis in future data structures such as ''Vector'' are similar to those mentioned by Benjamin Morel in the GitHub issue:
 +
 +<blockquote>
 +<blockquote>Functions like map, filter, reduce can be delegated to other libraries that operate on iterable instead of having these as first-class members of the interface.</blockquote>
 +
 +Again, I understand the rationale behind this decision, like reducing duplication and keeping only the core functionality in DS. However, sometimes you have to take into consideration ease of use vs purity of the code.
 +
 +Ease of use / DX / readability: it seems more logical to me to do:
 +
 +''$map->filter(fn(...) => ...);''
 +
 +Rather than:
 +
 +''Some\filter($map, fn(...) => ...);''
 +
 +Speed: as you said, internal iteration is faster. And speed is one of the selling points of DS vs arrays.
 +
 +Static analysis: I love the fact that ''Map::filter()'' can be strictly typed as returning ''Map<TKey, TValue>'' in Psalm, for example. If you rely on a generic ''filter()'' function, I'm not sure such strict typing will be easy or even possible.
 +
 +Thank you for your work on DS anyway, I already use the extension in my closed-source project, in particular Map. I would love to use data structures in my open-source projects, one day! 🤞
 +</blockquote>
 +
 +Additionally, it may be inconvenient for end users (e.g. new contributors to projects) to remember specifics of multiple libraries or utility classes when working on different codebases, to deal with dependency conflicts after major version upgrades, or to deal with libraries dropping support for older php versions, getting abandoned, etc.
 +
 +==== Update: php-ds maintainer response clarifications ====
 +
 +On September 24, 2021, [[https://github.com/php-ds/ext-ds/issues/156#issuecomment-926353779|the maintainer responded]] after being asked about current plans for php-ds
 +
 +<blockquote>
 +Hi everyone, I am happy to see this discussion and I thank you all for taking part. My reservation to merge ds into core has always been because I wanted to make sure we get it right before we do that and the intention behind the mythical v2 was to achieve that, based on learnings from v1 and feedback from the community. I have no personal attachment to this project, I only want what is best for PHP and the community.
 +
 +I would love to see a dedicated, super-lean vec data structure in core that has native iteration and all the other same internal benefits as arrays. In my opinion, the API should be very minimal and potentially compatible with all the non-assoc array functions. An OO interface can easily be designed around that. I'm imagining something similar to Golang's slices.
 +
 +**As for the future of ds itself, I think these can co-exist and ds can remain external. I've been researching and designing immutable data structures over the last 4 years and I still hope to develop a v2 that simplifies the interfaces and introduces immutable structures. Attempting to implement a suite of structures in core or an OO vector would take a lot of work and might be difficult to reach consensus on with the API. I don't think we should attempt to merge ds into core at any time.**
 +
 +I am currently traveling and have not followed this discussion in detail on the mailing list. I'd be happy to assist in any way I can and will catch up as soon as I am home again this week. Feel free to quote this response on the mailing list as well.
 +</blockquote>
 +
 +I'm still awaiting some clarifications on how they they were willing to assist before updating the remainder of this RFC.
 +
 +Additionally, there may be differences in design goals, as noted in the above section.
  
 ==== Adding a native type instead (is_vec) ==== ==== Adding a native type instead (is_vec) ====
 +
 +https://externals.io/message/109760#109812
  
 <blockquote> <blockquote>
Line 295: Line 456:
 See https://www.npopov.com/2015/05/05/Internal-value-representation-in-PHP-7-part-1.html for how php represents values internally. See https://www.npopov.com/2015/05/05/Internal-value-representation-in-PHP-7-part-1.html for how php represents values internally.
  
-That would also require a lot more familiarity than I have with opcache and the JIT assembly compiler, and (I expect it wouldbe more controversial due to not working with existing code. +That would also require a lot more familiarity than I have with opcache and the JIT assembly compiler, and I expect it would be more controversial due to not working with existing code. 
-For a language such as Hack where feature development is done by one company(Facebook), major language redesigns and breaking changes would be much easier than for PHP, with users/developers from many different backgrounds (and a focus on backward compatibility).+For a language such as Hack where feature development is done by one company(Facebook), major language redesigns and breaking changes would much more difficult in PHP, with users/developers from many different backgrounds (and a much stronger focus on backward compatibility).
 Additionally, adding a class doesn't prevent adding a vec/list in the future - for example, HHVM has both vec and https://docs.hhvm.com/hack/reference/class/HH.Vector/ and ''HH\Vector'' remains usable, PHP has both array and ArrayObject, etc. Additionally, adding a class doesn't prevent adding a vec/list in the future - for example, HHVM has both vec and https://docs.hhvm.com/hack/reference/class/HH.Vector/ and ''HH\Vector'' remains usable, PHP has both array and ArrayObject, etc.
  
-Also, even if that were done, vec and array would be distinct types - a vec couldn't be passed to a parameter that expected an array reference (or returned in a return value), because later adding a string array key (in the parameter or return value) would be a runtime error.+Also, even if a type ''vec'' or ''array'' were added''vec'' and ''array'' would be distinct types - a vec couldn't be passed to a parameter that expected an array reference (or returned in a return value), because later adding a string array key (in the parameter or return value) would be a runtime error. 
 + 
 +==== Changelog ==== 
 + 
 +0.2: Add php-ds maintainer response, improve documentation, note this is on hold while working on ''Deque'' (Double-Ended Queue) RFC
  
rfc/vector.1631801083.txt.gz · Last modified: 2021/09/16 14:04 by tandre