rfc:session-read_only-lazy_write

This is an old revision of the document!


PHP RFC: Sessions: Improve original RFC about read_only, lazy_write features

This is a proposal to make an update, or override the decisions taken in the session-lock-ini RFC. Not in the sense of rejecting the RFC, but rather redesign the already accepted solutions, namely the 'read_only' and 'lazy_write' options passed to session_start().

The session-lock-ini has IMHO not been handled or thought out properly. What ended up as accepted from it is mostly unrelated to what it was originally all about (locking options). It (the RFC description) lacks a lot of information and implementation details, that are otherwise part of the patch (but not obvious). This has made it possible for such details to either be overlooked completely or to not receive the deserved attention for something as critical as Sessions.

I feel that there's room for improvement in regards to backwards and forward compatibility, as well as an overall better API design. This is an attempt to make those improvements before there's no coming back from what I believe to be flaws in the initial design.

Introduction

The problems, or missing information on the already accepted solutions are the following:

read_only

1. The name itself is highly misleading, especially for the context in which it is used.

What this option does is to cause session_start() to read the session data, populate the $_SESSION super-global and close the session immediately afterwards. However, nothing in the following code implies such behavior:

session_start(['read_only' => TRUE]);

What any programmer would assume from the above code with no prior knowledge about it, is the already popular meaning of the term "read-only" in computing, or in this context - start a session for which the data is not allowed to be modified. In fact, it is so natural to assume a read-only mode, that I had read the RFC multiple times and yet it took somebody actually telling me that this is a “start-read-and-close” functionality instead.

This could and will lead to a lot of confusion, which in turn will result in a lot of malfunctioning userland code. Assume concurrency with the same session ID in the following example:

// $_SESSION['logged_in'] has been set in a previous request

/* Request 1: */
session_start(['read_only' => TRUE]);

/* Request 2: */
session_start();
unset($_SESSION['logged_in']);
session_write_close();
// Delete temporary DB rows, i.e. cart contents

/* Request 1 */
if ($_SESSION['logged_in']) // evaluates to TRUE
{
// Do something with cart contents, assuming that they exist
}

This is downright dangerous and ext/sessions has always functioned in a way that prevents race conditions, yet it is very easy to fall into this trap.

Furthermore, using the 'read_only' name for 'start-read-and-close' functionality prevents using the same name to implement a read-only session mode in the future. Refer to the future scope section, for details on a possible future implementation of a read-only mode, for which 'read_only' would be the perfect name.

2. It is implemented as an option and not a stand-alone function.

This is somewhat the same issue as described above, but a different API design problem otherwise.

In general, when an option/parameter/argument/whatever is passed to a function, it should act as a mode of operation for that function. As it is currently, passing 'read_only' does not enter some particular mode, but rather makes session_start() to act as an optimized version of the following:

session_start();
session_commit();

Maybe, if session_start() didn't accept mode parameters, that would've been fine. However, session_start() also accepts all session.* INIs + 'lazy_write' and all of those are modes of operation and not additional actions per se. So that makes it not only strange, but also inconsistent.

lazy_write

The issues with this option are non-obvious, more diverse and are mostly related to userland session handlers. Below is the description of how it works, that was missing from the original RFC, and is only evident by the patch and the author's comments during the discussion of this RFC.

Note: PS_UPDATE_TIMESTAMP() is updateTimestamp() in userland, but it was previously called PS_UPDATE_FUNC() (or update()). It was changed for a more descriptive name.

  • When $_SESSION is not changed: call PS_UPDATE_TIMESTAMP_FUNC() instead of PS_WRITE_FUNC().
  • PS_UPDATE_TIMESTAMP_FUNC() only changes the “last modified” timestamp (or a similar timestamp, depending on the storage in use) instead of re-writing all of the data. This is the performance improvement.
  • Userland session handlers can use updateTimestamp() to replicate PS_UPDATE_TIMESTAMP_FUNC().
  • updateTimestamp() is NOT added to the SessionHandler class to maintain BC.
    • This means that userland session handlers extending SessionHandler can't call parent::updateTimestamp().
    • It is made that way because the internal handler that SessionHandler wraps, may or may not have PS_UPDATE_TIMESTAMP_FUNC() (PECL extensions) and this is not detectable at runtime.
  • updateTimestamp() is NOT added to SessionHandlerInterface (to maintain BC), but is added to a new interface called SessionUpdateTimestampInterface.
    • This is because of internal implementation details that were originally designed to just register all of the methods from SessionHandlerInterface, so now you have to use some interface to add new methods.
  • If there's a userland session handler in use:
    • If the handler doesn't have updateTimestamp() - call write() to maintain BC.
    • If the handler has updateTimestamp() - call it.
      • As listed above, a class extending SessionHandler can NOT call parent::updateTimestamp(). It has to call parent::write() to “extend” the internal handler.

And now, the problems with it:

1. You can't call parent::updateTimestamp().

The SessionHandler class was made to expose the internal implementation. To put emphasis on the problem, the following combination beats the whole purpose of having the SessionHandler class:

  • PHP will call SessionHandler::updateTimestamp(), if you have it.
  • You can't call parent::updateTimestamp() from it.

This is an unnecessary limitation, given that only a few PECL extensions wouldn't support the feature, which is a new one and this is not a BC concern.

2. There is little reason for the option (not the feature) to exist in the first place.

It is mostly a performance improvement instead of a functional change. It is somewhat a behavioral change, but still a performance improvement and designed in a way that doesn't break existing applications.

Performance improvements should not be optional, unless they break backwards compatibility and this one is designed to avoid that. Therefore, it should not be optional.

3. No API to detect the option, at all.

'lazy_write' was originally proposed as an INI setting. However, it was reduced to being just a session_start() option because of feedback on the RFC discussion stating that “users don't like INI settings”, or that there are too many INIs already. This is a valid point, but it ignores the fact that INI settings do have a purpose and are in some cases a necessity. For 'lazy_write' to be optional and with no other API available, ini_get() would've been a way for userland code to detect it. Furthermore, with session_start() also accepting existing session.* INI settings, this is yet another inconsistency.

Also, the whole feature was inspired by a feature request for an API to detect if $_SESSION has been changed or not.

Proposal

On 'read_only'

Unofficially, the 'read_only' name should be reserved for what it truly means. For an example, please refer to the Future scope section.

To solve the current problem however, one of the following should be done:

Make it a standalone function

The feature (start, read and close session) should be available through a stand-alone function, named appropriately to describe what it does. In order to be consistent with the new ability for session_start() to accept options (INI or not) in an array, both session_start() and the new function should accept/share the same arguments:

bool function session_start_close(array $options = NULL)

Note: Feedback on a name better than session_start_close() is welcome.

Rename the option

Some people might prefer keeping it as an option instead of a standalone function. If this turns out to be a more popular approach, then the option should be renamed from 'read_only' to either 'read_close' or 'read_and_close'.

On 'lazy_write'

A combination of multiple (but not all) of the following:

Merge updateTimestamp() into write()

SessionHandler::write() should decide whether all data should be written or just the timestamp updated.

This solves all problems:

  • Not adding an unnecessary interface (SessionUpdateTimestampInterface).
  • Maintaining backwards compatibility, even with PECL extensions.
  • Enabling userland code to call parent::methodName().
  • No API changes, at all.

Add SessionHandler::updateTimestamp(), when available

updateTimestamp() can be added to SessionHandler at instantiation time, depending on whether the current session.save_handler supports it. This is of course an alternative to moving its logic to write().

It might not be convenient for some session modules to have the method and others not, but users are in general careful with PECL extensions and of course, it should be properly documented. Anybody could write a PHP extension that provides a session handler that even breaks current functionality - there's no guarantee on that. What can be done however, is to not put unnecessary limits because of third-party extensions. It's a good enough trade-off, IMO.

And this wouldn't be a precedent, many MySQLi functions only exist if mysqlnd support is present.

Add API exposing $_SESSION changes status

This is what was originally asked for via feature request #17860 and would be useful even without 'lazy_write', allowing users to make the performance improvement on their own.

It can be a “magic constant” like the currently existing 'SID'. It can be a function, i.e. session_is_changed(). It can be a property or a method of the SessionHandler class.

Either of the above would be sufficient, it doesn't matter, just as long as the user has a way of accessing that state.

Always do "lazy writes"

There's no reason for this feature to be optional, as explained above - it is a performance improvement and the patch already calls write() if updateTimestamp() doesn't exist. If this RFC is accepted, it would make even less sense to keep it optional. Worst thing that could happen is to fallback to the old behavior where session data is written at all times.

Backward Incompatible Changes

None, the aim is to make the already accepted solution even easier to handle in regards to older versions, given that it is accepted for PHP 5.6.

Proposed PHP Version(s)

PHP 5.6

However, due to Beta 1 approaching, this is subject to change and heavy edits.

Impact to Existing Extensions

ext/session and all extensions providing a session handler.

php.ini Defaults

None.

Open Issues

TBD: Make sure there are no open issues when the vote starts!

Future Scope

Implement the 'read_only' option for session_start() as a read-only mode of the session. In other words: don't allow session data to be written if session_start(['read_only' => TRUE]) was called.

This would include the following characteristics (only when in read-only mode):

  • Use shared(reader) instead of exclusive(writer) locks.
  • Writing is allowed only for creation of new session IDs.
  • API to detect the mode in userland code.

Proposed Voting Choices

Still in dicussion, but will most likely match the sections of proposal at 100%.

Should require 50% + 1 votes.

Patches and Tests

No patch is available at this time, I'll be looking for a volunteer with Yasuo Oghaki being a likely candidate.

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature

References

Rejected Features

TBD: Keep this updated with features that were discussed on the mail lists.

rfc/session-read_only-lazy_write.1395751891.txt.gz · Last modified: 2017/09/22 13:28 (external edit)