Request for Comments: docBlock Parser

Introduction

This RFC proposes adding functionality to Reflection that complements getDocComment() with a new function which parses the DocComment string, providing a simple, standardized mechanism for retrieving metadata.

Why do we need metadata?

Metadata is data which provides additional description about a structure. In PHP, this can be used for many purposes:

For other use cases, see also the Annotations RFC.

Why should this functionality be in PHP core?

  1. There is already a widely used syntax for structured DocBlocks (short description, long description, tags using the @ symbol). This means that there is a de facto standard which PHP can tap into to add functionality painlessly.
  2. As many of the use cases involve frameworks or other software intended to be used by a wide audience, this functionality could not be effectively captured in a PHP extension since frameworks et al. cannot generally rely on a non-core extension.
  3. The Reflection extension already in core is a natural place to put this as a complement to getDocComment().

Common Misconceptions

Comments should not affect PHP's runtime behavior

There is no doubt that in general, commenting code means that it should be ignored by the parser. Despite this, I think there is reason to support parsing docBlocks.

First of all, not all comments become docBlocks. They must precede a structure (class, function, property, method), and they must have the

/** ... */

comment structure.

//
#
/* ... */

are NOT docBlocks.

Therefore, the comments that could potentially affect runtime behavior are limited to a very specific set. The existing, widely followed convention for docBlocks will also help developers understand this difference.
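
The distinction can already be observed with the existing getDocComment() method, which returns false when no docBlock precedes the structure. The following is a minimal sketch; the class names are made up for illustration.

```php
<?php
/**
 * A docBlock: it opens with two asterisks and precedes a class.
 */
class Documented {}

/*
 * An ordinary block comment: only one asterisk in the opening line.
 */
class BlockCommented {}

// A line comment.
class LineCommented {}

$doc   = (new ReflectionClass('Documented'))->getDocComment();
$block = (new ReflectionClass('BlockCommented'))->getDocComment();
$line  = (new ReflectionClass('LineCommented'))->getDocComment();

var_dump(is_string($doc)); // bool(true): the raw docBlock text is returned
var_dump($block);          // bool(false)
var_dump($line);           // bool(false)
```

Only the first class exposes any text to Reflection, so only that comment could ever reach the proposed parser.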

Furthermore, although not a reason in itself for either accepting or rejecting this proposal, Python's docstrings are stored along with the object they document and can be read at runtime. This establishes that there is at least a precedent for comments being integrated into the runtime.

Proposal

This proposal is for a function that parses docBlocks and returns an associative array of text key => value mappings. It would be added to the Reflection extension via three new methods:

This proposal suggests that whitespace be treated much as it is in HTML (any run of whitespace is reduced to a single space character), except that empty lines are significant (see example 1 below for an explanation of this rule).
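
As a rough illustration of the whitespace rule, the sketch below collapses runs of spaces and tabs, joins consecutive text lines with a single space, and keeps each empty line as a newline. The function name collapseWhitespace() is hypothetical, and representing an empty line as exactly one newline is an assumption; the RFC does not fix the exact output format.

```php
<?php
// Hypothetical helper: one possible reading of the whitespace rule.
function collapseWhitespace(string $text): string
{
    $result = '';
    $pendingSpace = false; // true when the previous line contained text

    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        // Reduce runs of spaces and tabs to one space, then trim the ends.
        $line = trim(preg_replace('/[ \t]+/', ' ', $line));

        if ($line === '') {
            // Empty lines are significant: keep them as a newline.
            $result .= "\n";
            $pendingSpace = false;
            continue;
        }

        // Consecutive text lines are joined by a single space.
        $result .= ($pendingSpace ? ' ' : '') . $line;
        $pendingSpace = true;
    }

    return $result;
}

var_dump(collapseWhitespace("this is   some\n\t pretty-indented\n\nnext paragraph"));
```

Under these assumptions the indentation of continuation lines disappears, while the empty line between the two paragraphs survives.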

Draft EBNF for docBlock parsing:

docblock          := "/**" , [whitespace] , [short_description] , [long_description] , [tag , { tag | ignored }*] , { emptyline }* , linebreak , "*/" ;
short_description := line , { emptyline }+ ;
long_description  := line , { line | emptyline }* ;
line              := [space] , "* " , [space] , character-"@" , string , [space] , linebreak ;
emptyline         := [space] , "*" , [space] , linebreak ;
tag               := [space] , "* " , "@" , tagname , space , { string | string , linebreak }* , linebreak ;
ignored           := emptyline , { line | emptyline }+ ;
string            := { character }+ ;
tagname           := { character - " " }+ ;
space             := " " | "\t" ;
linebreak         := "\n" | "\r" | "\r\n" ;
character         := ? any ASCII character with code >= 32 and <= 126 ? ;
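
To make the grammar more concrete, here is a userland sketch of a parser that follows it loosely. It is not the proposed implementation: parseDocComment() is a hypothetical name, the first text line is always taken as the short description (a simplification of the grammar), and the long description keeps one newline per empty line, which is an assumption about the output format.

```php
<?php
/**
 * Hypothetical userland approximation of the proposed parser.
 * Simplification: the first text line is always the short description.
 */
function parseDocComment(string $doc): array
{
    $result = ['short_description' => '', 'long_description' => '', 'tags' => []];

    // Drop the opening "/**" and the closing "*/" delimiters.
    $body = preg_replace('~^\s*/\*\*|\*/\s*$~', '', $doc);

    $long = '';         // accumulated long description
    $glue = '';         // separator placed before the next long-description line
    $state = 'short';   // short -> long -> tags
    $currentTag = null; // tag that continuation lines are appended to

    foreach (preg_split('/\r\n|\r|\n/', $body) as $line) {
        // Remove the leading "*" decoration, then collapse runs of whitespace.
        $line = preg_replace('/^\s*\*?/', '', $line);
        $line = trim(preg_replace('/\s+/', ' ', $line));

        if ($line !== '' && $line[0] === '@') {
            // A tag line: "@name optional text".
            $parts = explode(' ', substr($line, 1), 2);
            $currentTag = $parts[0];
            $result['tags'][$currentTag] = $parts[1] ?? '';
            $state = 'tags';
        } elseif ($state === 'tags') {
            if ($line === '') {
                $currentTag = null; // an empty line ends the current tag
            } elseif ($currentTag !== null) {
                $sep = $result['tags'][$currentTag] === '' ? '' : ' ';
                $result['tags'][$currentTag] .= $sep . $line;
            }
            // Text separated from a tag by an empty line is ignored.
        } elseif ($state === 'short') {
            if ($line !== '') {
                $result['short_description'] = $line;
                $state = 'long';
            }
        } elseif ($line === '') {
            $long .= "\n"; // empty lines are significant
            $glue = '';
        } else {
            $long .= $glue . $line;
            $glue = ' ';
        }
    }

    $result['long_description'] = trim($long);
    return $result;
}

$parsed = parseDocComment("/**\n * Summary line\n *\n * @tag1 some\n *       text\n */");
print_r($parsed['tags']); // tag1 => "some text"
```

A state machine keeps the sketch short: once the first tag is seen, the parser never returns to the description states, which mirrors the grammar's rule that text after the tags is either a continuation or ignored.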

Examples

1. The following shows a docBlock being parsed via this function:
/**
 * (Short Description)
 *
 * (Long Description ...
 * 
 * long description continues...
 *
 *
 *
 * still long description ...)
 * 
 * @tag1 this is some text for tag1
 * @tag2
 *
 * @tag3 this is some
 *       pretty-indented
 *       multiline text
 *       for tag 3
 *
 * (After tags begin, any text which has an empty line
 * between it and the preceding tag will be completely
 * ignored)
 *
 * @tag4 this is some
 * non-pretty-indented
 * multiline text
 * for tag 4
 *
 * (as above, this is ignored)
 */
class foo {
    // ...
}
 
$r = new ReflectionClass('foo');
var_dump($r->getParsedDocComment());
 
/*
array(3) {
  ["short_description"]=>
  string(19) "(Short Description)"
  ["long_description"]=>
  string(85) "(Long Description ...
 
 long description continues...
 
 
 
still long description ...)"
  ["tags"]=>
  array(4) {
    ["tag1"]=>
    string(26) "this is some text for tag1"
    ["tag2"]=>
    string(0) ""
    ["tag3"]=>
    string(53) "this is some pretty-indented multiline text for tag 3"
    ["tag4"]=>
    string(57) "this is some non-pretty-indented multiline text for tag 4"
  }
}
*/
2. The following shows examples of comments that are not docblocks and therefore could not be parsed:
/**
 * This looks like a docblock, but since it doesn't
 * precede a structure, it's not.
 */
 
// This is a comment ... but not a docblock
class foo {
}
 
# This is a comment ... but not a docblock
class bar {
}
 
/*
 * This looks a lot like a docblock, but it's not one
 * because its opening line does not have two *.
 *
 * @tag this tag couldn't be parsed
 */
class baz {
}
 
function foobar() {
    /**
     * This looks like a docblock, but since it doesn't
     * precede a structure, it's not.
     */
}

What is a docblock:

A comment block using /** ... */ syntax, preceding a class, function, property, or method. This is the same definition that the getDocComment() function in Reflection currently uses.
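
For reference, the existing Reflection classes already expose the raw docBlock for each of these structures. The sketch below only uses getDocComment() as it exists today; the class, property, method, and function names are made up for illustration.

```php
<?php
class Example
{
    /** Docs for a property. */
    public $value;

    /** Docs for a method. */
    public function run() {}
}

/** Docs for a function. */
function helper() {}

$property = new ReflectionProperty('Example', 'value');
$method   = new ReflectionMethod('Example', 'run');
$function = new ReflectionFunction('helper');

// Each call returns the full comment text, including the delimiters.
echo $property->getDocComment(), "\n"; // /** Docs for a property. */
echo $method->getDocComment(), "\n";   // /** Docs for a method. */
echo $function->getDocComment(), "\n"; // /** Docs for a function. */
```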

What will not happen:

Rejected Features

TBD?

BC Breaks

None.

Changelog

2010-09-16 cfulton Initial RFC creation.