rfc:vector

Differences

This shows you the differences between two versions of the page.

rfc:vector [2021/09/17 02:10] tandre
rfc:vector [2021/09/26 16:46] (current) – php-ds maintainer response tandre
Line 1: Line 1:
 ====== PHP RFC: final class Vector ======
-  * Version: 0.1
+  * Version: 0.2
   * Date: 2021-09-16
   * Author: Tyson Andre, tandre@php.net
-  * Status: Under Discussion
+  * Status: **On hold - will be updated after https://wiki.php.net/rfc/deque and the namespacing poll is done**
   * Implementation: https://github.com/php/php-src/pull/7488
   * First Published at: http://wiki.php.net/rfc/vector
Line 325: Line 325:
 ===== Rejected Features =====
  
-==== Why not use php-ds instead? ====
+==== Why not use php-ds/ext-ds instead? ====
  
-https://externals.io/message/112639#112641
+    - No matter how useful or popular a PECL is, datastructures available in PHP's core will have much, much wider adoption in applications and libraries than those that are only available in PECLs, allowing those applications and libraries to write faster and/or more memory-efficient code.
 +    - End users can make much stronger assumptions about the backwards compatibility and long-term availability of data structures that are included in core.
 +    - The php-ds maintainers do not plan to merge the extension into php-src, and believe php-ds should coexist with new functionality being added in a separate namespace instead (see quote and [[##updatephp-ds_maintainer_response_clarifications|later clarifications]] for full context).
 +    - Opcache may be able to make stronger optimizations of internal classes found in php-src than of classes from any third-party PECL (e.g. because ''Deque::push()'' or ''Vector::push()'' would never throw or emit notices, it may be possible to optimize it to be even faster than appending to an array in the Opcache JIT compiler).
  
-This has been asked about multiple times in threads on unrelated proposals, but the maintainer of php-ds had intended to develop the extension separately from php's release cycle.
+=== Perceived issues and uncertainties about php-ds distribution plans ===
  
-While PECL development has its benefits for development and ability to make new features available in older php releases, it's less likely that application and library authors will start making use of those data structures because many users won't have a PECL already installed(though php-ds also publishes polyfill, it would not have the cpu and memory savings, and add its own overhead)
+This has been asked about multiple times over the years in threads on unrelated proposals (https://externals.io/message/112639#112641 and, years ago, https://externals.io/message/93301#93301),
 +but the maintainer of php-ds had a long-term goal of developing the extension separately from php's release cycle (and was still focusing on the PECL when I'd asked on the GitHub issue linked below in September 2020).
  
-    * Additionally, users can often make stronger assumptions on backwards compatibility and long-term availability of functionality that is merged into PHP's core.
+To quote the maintainer on the GitHub [[https://github.com/php-ds/ext-ds/issues/156|issue]] that I'd opened on php-ds/ext-ds the last time someone suggested using php-ds (emphasis in the quote below is mine):
  
-As a result, I've been working on implementing data structures such as Vector based on php-src's data structure implementations instead (and my past PECL/RFC experience, e.g. with ''runkit7''/''igbinary'')
+<blockquote> 
 +**//My long-term intention has been to not merge this extension into php-src.// I would like to see it become available as a default extension at the distribution level. Unfortunately I have no influence or understanding of that process.** Having an independent release and development cycle is a good thing, in my opinion. 
 + 
 +If those plans change, **I would like to hold off until a 2.0 release** - I've learnt a lot over the last 4 years and would like to revisit some of the design decisions I made then, such as a significant reduction of the interfaces or perhaps more interfaces with greater specificity. Functions like ''map'', ''filter'', ''reduce'' can be delegated to other libraries that operate on ''iterable'' instead of having these as first-class members of the interface. **There is a 2.0 branch with some ideas but I haven't looked at that in a while.** 
 + 
 +I have been working on a research project to design persistent data structures for immutability, so there is a lot of work that I have set for myself for this project over the next 6 months or so. I have no intention to push for distribution changes in the short-term but I am open to the suggestion. 
 +</blockquote> 
 + 
 +<blockquote> 
 +> > Do you mean OS distribution level (Windows, Ubuntu, CentOS, HomeBrew for mac, etc.?) 
 + 
 +> He meant distribution with PHP core (on all platforms where PHP is available) 
 + 
 +Whichever is more viable - simply not merged into core, but distributed and enabled by default alongside it. 
 +</blockquote> 
 + 
 +There have been no proposals from the maintainer themselves so far to add php-ds to core or distribute it alongside core in any form. 
 +That was just what the maintainer mentioned as a long term plan. 
 + 
 +The model of distributing an extension alongside core while developing it separately from core has never been done before, and even if approved would raise multiple concerns: 
 + 
 +    * I personally doubt having it developed separately from php's release cycle would be accepted by voters (e.g. if unpopular decisions couldn't be voted against or vetoed, or if RFCs passed by the community for additions of datastructures (or additions of methods to datastructures) could be overturned by the php-ds maintainers) 
 +    * This may limit what features could be added by the community: for example, adding ''map()'' or ''filter()'' functionality to a ''Vector'' if the php-ds maintainers removed those methods in a simplified 2.0. 
 +    * I'm not certain how backwards compatibility would be handled in that model, e.g. if the maintainers of ext-ds wanted to drop support for a method after it was released. 
 +    * This may cause delays in publishing php releases, e.g. if the maintainers were unable to quickly review patches for crashes, incompatibilities or compile errors introduced in new php versions, etc. 
 +    * and other concerns (e.g. API debates such as https://externals.io/message/93301#93301) 
 + 
 +Because it seems unlikely to me that php-ds itself will be merged anytime soon (if the maintainers continue to plan to distribute php-ds that way), I decided to start working independently on efficient data structure implementations. 
 +I don't see dragging it in (against the maintainer's wishes) as a viable option for many, many, many reasons. 
 +But having efficient datastructures in PHP's core is still useful. 
 + 
 +The timeline for php-ds 2.0 is also something I am uncertain about. 
 + 
 +<del>Additionally, while there may be some uses for immutable datastructures, I would believe there are more uses for mutable datastructures, especially for programmers with imperative programming backgrounds such as C/C++, and would propose these mutable datastructures regardless of those plans. Having these mutable datastructures in core is still useful to immutable programmers and functional programmers, because it provides another tool to write the internal, private implementation details in a memory-efficient way.</del> 
 + 
 +    * //EDIT: I misread the maintainer's response as being about the project php-ds 2.0 - I'm now pretty sure the "research project to design persistent data structures for immutability" is a different project from ext-ds and possibly in a different programming language.// \\ \\(Leaving this comment in because immutable datastructures were brought up by others in the RFC discussion) 
 + 
 +While PECL development outside of php has its benefits for development and the ability to make new features available in older php releases, 
 +it's less likely that application and 
 +library authors will start making use of those data structures because many users won't have any given PECL already installed 
 +(though php-ds also publishes a polyfill, the polyfill would not have the CPU and memory savings, and would add its own overhead). 
 + 
 +Additionally, users (and organizations using PHP) can often make stronger assumptions on 
 +backwards compatibility and long-term availability of functionality that is merged into PHP's core. 
 + 
 +So the choice of feature set, some names, signatures, and internal implementation details are different, because this is reimplementing a common datastructure found in different forms in many languages. 
 +php-ds is definitely a mature project, but I personally feel that reimplementing this (without referring to the php-ds source code and without copying the entire API as-is) is the best choice to add efficient data structures to core while respecting the maintainer's work on the php-ds project and their wish to maintain control over the php-ds project. 
 + 
 +As a result, I've been working on implementing data structures such as ''Deque'' based on php-src's data structure implementations (mostly ''SplFixedArray'' and ''ArrayObject'') instead (and based on my past PECL/RFC experience, e.g. with ''runkit7''/''igbinary''). 
 + 
 +=== Minor differences in API design goals === 
 + 
 +Traditionally, PHP has been a very batteries-included language. Existing functionality such as [[https://www.php.net/manual/en/ref.strings.php|strings]] and [[https://www.php.net/manual/en/ref.array.php|arrays]] comes with a very large standard library. This makes it easier to write code without depending on too many third-party composer libraries, and knowledge of the standard library can transfer to any codebase a developer works on. 
 + 
 +My hopes for ease of use, readability, speed, and static analysis in future data structures such as ''Vector'' are similar to those mentioned by Benjamin Morel in the GitHub issue: 
 + 
 +<blockquote> 
 +<blockquote>Functions like map, filter, reduce can be delegated to other libraries that operate on iterable instead of having these as first-class members of the interface.</blockquote> 
 + 
 +Again, I understand the rationale behind this decision, like reducing duplication and keeping only the core functionality in DS. However, sometimes you have to take into consideration ease of use vs purity of the code. 
 + 
 +Ease of use / DX / readability: it seems more logical to me to do: 
 + 
 +''$map->filter(fn(...) => ...);'' 
 + 
 +Rather than: 
 + 
 +''Some\filter($map, fn(...) => ...);'' 
 + 
 +Speed: as you said, internal iteration is faster. And speed is one of the selling points of DS vs arrays. 
 + 
 +Static analysis: I love the fact that ''Map::filter()'' can be strictly typed as returning ''Map<TKey, TValue>'' in Psalm, for example. If you rely on a generic ''filter()'' function, I'm not sure such strict typing will be easy or even possible. 
 + 
 +Thank you for your work on DS anyway, I already use the extension in my closed-source project, in particular Map. I would love to use data structures in my open-source projects, one day! 🤞 
 +</blockquote> 
 + 
 +Additionally, it may be inconvenient for end users (e.g. new contributors to projects) to remember specifics of multiple libraries or utility classes when working on different codebases, to deal with dependency conflicts after major version upgrades, or to deal with libraries dropping support for older php versions, getting abandoned, etc. 
 + 
 +==== Update: php-ds maintainer response clarifications ==== 
 + 
 +On September 24, 2021, [[https://github.com/php-ds/ext-ds/issues/156#issuecomment-926353779|the maintainer responded]] after being asked about current plans for php-ds: 
 + 
 +<blockquote> 
 +Hi everyone, I am happy to see this discussion and I thank you all for taking part. My reservation to merge ds into core has always been because I wanted to make sure we get it right before we do that and the intention behind the mythical v2 was to achieve that, based on learnings from v1 and feedback from the community. I have no personal attachment to this project, I only want what is best for PHP and the community. 
 + 
 +I would love to see a dedicated, super-lean vec data structure in core that has native iteration and all the other same internal benefits as arrays. In my opinion, the API should be very minimal and potentially compatible with all the non-assoc array functions. An OO interface can easily be designed around that. I'm imagining something similar to Golang's slices. 
 + 
 +**As for the future of ds itself, I think these can co-exist and ds can remain external. I've been researching and designing immutable data structures over the last 4 years and I still hope to develop a v2 that simplifies the interfaces and introduces immutable structures. Attempting to implement a suite of structures in core or an OO vector would take a lot of work and might be difficult to reach consensus on with the API. I don't think we should attempt to merge ds into core at any time.** 
 + 
 +I am currently traveling and have not followed this discussion in detail on the mailing list. I'd be happy to assist in any way I can and will catch up as soon as I am home again this week. Feel free to quote this response on the mailing list as well. 
 +</blockquote> 
 + 
 +I'm still awaiting some clarifications on how they were willing to assist before updating the remainder of this RFC. 
 + 
 +Additionally, there may be differences in design goals, as noted in the above section.
  
 ==== Adding a native type instead (is_vec) ====
Line 363: Line 461:
  
 Also, even if a type ''vec'' or ''array'' were added, ''vec'' and ''array'' would be distinct types - a vec couldn't be passed to a parameter that expected an array reference (or returned in a return value), because later adding a string array key (in the parameter or return value) would be a runtime error.
 +
 +==== Changelog ====
 +
 +0.2: Add php-ds maintainer response, improve documentation, note this is on hold while working on ''Deque'' (Double-Ended Queue) RFC
  
rfc/vector.1631844630.txt.gz · Last modified: 2021/09/17 02:10 by tandre