


Request for Comments: Source Files Without Opening Tag

This RFC proposes a way to support source code files without <?php at the top.

Introduction

The purpose of this RFC is to provide a way to support source files that do not begin with <?php while maintaining backwards compatibility.

Why is this desirable?

In modern framework development and larger projects in general, it is often considered good practice to implement PHP classes in files which contain only PHP code and typically contain no “raw HTML” (no content outside <?php and ?> tags), at least not outside the context of a function or method. In such files, typing <?php at the top is:

1. Error-prone in a subtle and hard-to-debug way: if any whitespace is introduced before <?php, the code still runs, but that whitespace is sent as output, so your XHTML doctype fails to be recognized, your header() calls fail, and so on. Since you may not use these features in every situation, the bug often goes unnoticed until an inopportune time.

2. Tedious. There is a small but real frustration involved in this redundancy. Small but real frustrations can contribute to long-term disenchantment with a programming language.

However, these same projects and frameworks may advocate the use of “raw HTML” in PHP files intended as templates for rendering pages, forms and the like. This is a longstanding feature of PHP (indeed, the original feature of PHP). Support for it should be maintained, and it may be improved in the future to address PHP's current limitations as a templating language. This proposal aims not to close any doors in this regard.

Proposal

Part 1: Enhance the include, include_once, require and require_once keywords

These keywords will be enhanced with a second, optional parameter.

The first parameter (the URL or filename of the file to be included) does not change.

The second parameter is a combination of integer flags, combined in the usual way with the OR operator (|).

If this second parameter is absent, the four keywords behave exactly as they do now.

When the second parameter is present, it may be a bitwise OR of zero or more of the following constants which add to (but never subtract from) the existing behavior of each keyword:

If INCLUDE_PURE_CODE is present, the parser begins reading the included file as if the <?php tag had already been encountered, and any occurrence of the ?> and <?php tokens later in that file is a fatal error. This rule does NOT extend to other files included and/or required later. Files required in INCLUDE_PURE_CODE mode can still require template files that do contain <?php and ?>.

If INCLUDE_ONCE is present, the file is not included if it has already been included once, exactly like the normal behavior of include_once and require_once. Using either of the _once keywords implicitly turns this bit on.

If INCLUDE_ERROR_ON_FAILURE is present, an E_COMPILE_ERROR fatal error is generated if the file cannot be included, exactly like a failure of the require keyword. Otherwise an E_WARNING is generated, as is normal for the include keyword with no second parameter. Using either of the require keywords (require or require_once) implicitly turns this bit on.
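The flag rules above reduce to a small bitmask computation: the keyword contributes implicit bits, and the optional second parameter can only add more. The sketch below models this in today's PHP. The numeric values of the constants and the helper name effective_include_flags() are assumptions for illustration only; the RFC names the constants but does not assign values.

```php
<?php
// Hypothetical values: the RFC names these constants but does not assign numbers.
if (!defined('INCLUDE_PURE_CODE')) {
    define('INCLUDE_PURE_CODE', 1);
}
if (!defined('INCLUDE_ONCE')) {
    define('INCLUDE_ONCE', 2);
}
if (!defined('INCLUDE_ERROR_ON_FAILURE')) {
    define('INCLUDE_ERROR_ON_FAILURE', 4);
}

// Hypothetical helper: returns the flags actually in effect for a keyword plus
// its optional second parameter. Keywords only ever add bits, never remove them.
function effective_include_flags(string $keyword, int $flags = 0): int
{
    switch ($keyword) {
        case 'include':
            return $flags;
        case 'include_once':
            return $flags | INCLUDE_ONCE;
        case 'require':
            return $flags | INCLUDE_ERROR_ON_FAILURE;
        case 'require_once':
            return $flags | INCLUDE_ONCE | INCLUDE_ERROR_ON_FAILURE;
        default:
            throw new InvalidArgumentException("Unknown keyword: $keyword");
    }
}

// require_once with INCLUDE_PURE_CODE equals include with all three flags.
var_dump(
    effective_include_flags('require_once', INCLUDE_PURE_CODE)
    === effective_include_flags('include', INCLUDE_PURE_CODE | INCLUDE_ONCE | INCLUDE_ERROR_ON_FAILURE)
); // bool(true)
```

Because each keyword maps to a fixed set of implicit bits, the four existing keywords remain shorthands for particular flag combinations.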

Examples:

// Absolutely no change to existing behavior
require 'filename.php';

// Load filename.phpp. This file must consist purely of source code, no <?php or ?> tokens needed or permitted
require 'filename.phpp', INCLUDE_PURE_CODE;
 
// Behaves just like include_once
include 'filename.php', INCLUDE_ONCE;
 
// Behaves just like require
include 'template.php', INCLUDE_ERROR_ON_FAILURE;
 
// Combine them all: includes only once, with a fatal error on failure, parsing in "code mode"
include 'filename.phpp', INCLUDE_PURE_CODE | INCLUDE_ONCE | INCLUDE_ERROR_ON_FAILURE;
 
// Exactly the same as previous example
require_once 'filename.phpp', INCLUDE_PURE_CODE;
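The PHP tokenizer already knows where <?php and ?> tags begin and end, so the restriction INCLUDE_PURE_CODE places on a file can be approximated in userland today. The function name violates_pure_code_rule() is hypothetical. This is only a sketch of the rule, not of how the engine itself would enforce it.

```php
<?php
// Hypothetical helper: approximates the INCLUDE_PURE_CODE rule by tokenizing
// the file body as if "<?php" had already been encountered.
function violates_pure_code_rule(string $source): bool
{
    $tokens = token_get_all('<?php ' . $source);
    // The first token is the opening tag we prepended ourselves; skip it.
    foreach (array_slice($tokens, 1) as $token) {
        if (!is_array($token)) {
            continue; // single-character tokens such as ';' or '{'
        }
        $forbidden = [T_OPEN_TAG, T_OPEN_TAG_WITH_ECHO, T_CLOSE_TAG, T_INLINE_HTML];
        if (in_array($token[0], $forbidden, true)) {
            return true;
        }
    }
    return false;
}

var_dump(violates_pure_code_rule("class Foo { const END = '?>'; }")); // bool(false)
var_dump(violates_pure_code_rule("class Foo {}\n?>\n<p>HTML</p>"));     // bool(true)
```

Tokenizing rather than searching for the raw characters matters because ?> may legitimately appear inside a string literal, as in the first call.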

Part 2: Filename Convention

Although this proposal gives implementers flexibility in when and where they use the INCLUDE_PURE_CODE bit, it is still desirable in most cases to have a commonly recognized convention to distinguish files that should be read starting in “PHP mode” from legacy and template files that should be read starting in “HTML mode.” The following convention is proposed for environments in which file extensions are a relevant and useful concept:

  • Files that should be read starting in HTML mode should have a .php extension, for backwards compatibility.
  • Files that should be read starting in PHP mode should have a .phpp extension (short for “Pure PHP”).

However, enforcement of this convention is NOT proposed. The choice to apply INCLUDE_PURE_CODE is made entirely by the programmer (typically the author of a class file autoloader).
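Since the convention is not enforced, applying it is a decision made in userland code. A minimal sketch of how an autoloader author might map the proposed extensions to flags follows. The helper name include_flags_for_path() and the constant's value are assumptions, and the exact-match comparison on the extension is likewise a choice made here, not something the RFC specifies.

```php
<?php
// Hypothetical flag value; the RFC does not assign a number to this constant.
if (!defined('INCLUDE_PURE_CODE')) {
    define('INCLUDE_PURE_CODE', 1);
}

// Hypothetical helper applying the proposed convention:
// .phpp files start in PHP mode, everything else starts in HTML mode.
function include_flags_for_path(string $path): int
{
    $extension = pathinfo($path, PATHINFO_EXTENSION);
    return $extension === 'phpp' ? INCLUDE_PURE_CODE : 0;
}

var_dump(include_flags_for_path('lib/Acme/Widget.phpp')); // int(1)
var_dump(include_flags_for_path('templates/form.php'));   // int(0)
```

Under the proposal, the autoloader would pass this result as the second parameter of require_once. A project with its own local convention would simply replace the extension test.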

Anticipated And Previously Raised Questions

(Thanks to those who raised and responded to some of these questions already on the internals list. I am summarizing in many cases.)

“Does this break my existing code?”

No. Code that never passes the new second parameter will not be affected in any way. The proposal allows autoloaders to load files the old-fashioned way and to recognize when to do so by a simple common convention or by other local conventions as appropriate.

“Isn't typing the INCLUDE_PURE_CODE flag even more work than typing <?php?”

Typically, projects that will benefit from INCLUDE_PURE_CODE also have autoloaders that load classes implicitly when they are first used. So the flag would be typed once in the autoloader, not many times everywhere.
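To illustrate "typed once in the autoloader", here is a sketch of an autoloader for a library following the .phpp convention. The function name register_pure_code_autoloader() and the class-to-path mapping (namespace separators become directory separators) are assumptions. Under the RFC, the load step would be require_once $file, INCLUDE_PURE_CODE; because that syntax does not parse in current PHP, the sketch uses eval() on the file body as a stand-in, since eval() also begins parsing in PHP mode.

```php
<?php
// Hypothetical autoloader for a library that follows the .phpp convention.
// Under the RFC the load step would be:
//     require_once $file, INCLUDE_PURE_CODE;
// That syntax does not exist in current PHP, so eval() is used as a stand-in:
// it also starts parsing in PHP mode, without a leading <?php.
function register_pure_code_autoloader(string $baseDir): void
{
    spl_autoload_register(function (string $class) use ($baseDir): void {
        // Map Acme\Widget to $baseDir/Acme/Widget.phpp.
        $relative = str_replace('\\', '/', $class);
        $file = rtrim($baseDir, '/') . '/' . $relative . '.phpp';
        if (is_file($file)) {
            eval(file_get_contents($file));
        }
    });
}
```

The stand-in is only for illustration: unlike the proposed flag, eval() does not reject ?>, and code run through eval() is not cached by OPcache.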

“Won't this slow down the autoloader?”

Not really. Even in a worst-case scenario where stat() calls are slow and the autoloader performs no caching even in a production environment, the autoloader will often be able to assume that only .phpp files are expected because that is the convention of the library or framework they come from, so it won't be necessary to stat() first for .phpp and then check for .php as well. It is also possible to prewarm autoloader caches as part of deployment.
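The stat() argument comes down to how many candidate files the autoloader has to probe per class. The sketch below makes that count explicit. The helper name autoload_candidates() and its $libraryUsesPhpp switch are hypothetical; the .phpp-then-.php fallback order is taken from the paragraph above.

```php
<?php
// Hypothetical helper: lists the files an autoloader would have to stat()
// for a class. If the library is known to follow the .phpp convention, only
// one candidate is checked; otherwise .phpp is tried first, then .php.
function autoload_candidates(string $class, string $baseDir, bool $libraryUsesPhpp): array
{
    $stem = rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $class);
    if ($libraryUsesPhpp) {
        return [$stem . '.phpp'];
    }
    return [$stem . '.phpp', $stem . '.php'];
}

var_dump(count(autoload_candidates('Acme\\Widget', '/srv/lib', true))); // int(1)
```

With a prewarmed class map, even that single probe disappears, because the path can be looked up directly.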

“Won't this break if you try to use the code with an older version of PHP?”

Of course. A choice to use this feature implies a choice to support only the version of PHP that introduces it, or newer. But it will break cleanly with a clear error message, just like code that tries to use traits or other newer features. That is one of the advantages of using new syntax rather than automatically applying special behavior based on file extensions or similar.

“Why doesn't the proposal forbid the use of ?> to get back to HTML mode in a .phpp file?”

Two reasons:

1. While it is tempting to be a purist about this, those who are creating code generators and the like may sometimes find it useful to leverage HTML mode in a class file (though they also have the option of requiring a regular .php file). It is not the place of this proposal to forbid them from doing what they want if it doesn't interfere with the ability of others to write maintainable, bug-free code more conveniently. Some may find it unsettling to use ?> where they have not explicitly used <?php, but they don't have to use this capability.

2. It is suspected that the implementation of the proposal will be simpler and therefore easier to debug and maintain if it is limited simply to setting the initial state of the parser.

“Why extend the existing keywords with a flags parameter instead of introducing new keywords?”

Using new keywords would require doubling the number of variations on require from four to eight. Other proposals being aired to alter and enhance PHP's functionality as a template language would increase this number even more. A flags parameter is a sustainable solution which can accommodate other enhancements if they are approved. And since the flags will primarily be used in autoloaders, they do not represent a lot of tedious typing in many places.

Changelog

  • 2011-04-09 Yasuo Ohgaki: Added related RFC.
  • 2011-04-10 Thomas Boutell: removed misleading word “Option” from parts 1 and 2, which are not meant to be mutually exclusive (see the original text).
rfc/source_files_without_opening_tag.1334100660.txt.gz · Last modified: 2017/09/22 13:28 (external edit)