lp:~pythonregexp2.7/python/issue2636-08

Created by TimeHorse and last modified

Add support for named POSIX character classes of the form [:class:]. Note that the [] are part of the class name and do not themselves form a character set. To use a character class, it must be included in a character set, e.g. [[:alnum:]_], which is equivalent to \w. A character class outside a character set will be interpreted as an ordinary character set and will generate a warning indicating this. Thus, r'[:alpha:]' will match any character in set([':', 'a', 'h', 'l', 'p']) and generate a warning; it will not match [a-zA-Z], as the user might expect. This will be documented. The POSIX and Perl-specific character classes are:

alpha -- [A-Za-z]
alnum -- [A-Za-z0-9]
ascii -- All valid printable characters in the ASCII set
blank -- [ \t]
cntrl -- Control Character, [\x00-\x1f\x7f]
digit -- \d
graph -- alnum + punct (below)
lower -- [a-z]
print -- graph + space (below)
punct -- Any punctuation, e.g. ',', '.', '!', etc.
space -- \s + '\x0b', [\s\x0b]
upper -- [A-Z]
word -- \w (specific to Python / Perl, not part of POSIX)
xdigit -- [0-9a-fA-F]
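As a rough illustration of how the table above could be mapped onto ordinary character sets, here is a minimal sketch that rewrites [:name:] tokens into their ASCII set bodies before handing the pattern to the standard re module. The names POSIX_CLASSES and expand_posix_classes are hypothetical and are not part of this branch or of sre_parse; only the unambiguous ASCII entries are covered, and the naive substitution does not check that the token actually sits inside a character set.

```python
import re

# Hypothetical ASCII-only expansions, following the table above.
# Classes whose exact contents are not spelled out (ascii, graph,
# print, punct, space) are deliberately left out.
POSIX_CLASSES = {
    "alpha": "A-Za-z",
    "alnum": "A-Za-z0-9",
    "blank": r" \t",
    "cntrl": r"\x00-\x1f\x7f",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "word": "A-Za-z0-9_",
    "xdigit": "0-9a-fA-F",
}


def expand_posix_classes(pattern):
    """Replace each known [:name:] with its set body; leave unknown names as-is."""
    def repl(match):
        return POSIX_CLASSES.get(match.group(1), match.group(0))
    return re.sub(r"\[:([a-z]+):\]", repl, pattern)


expanded = expand_posix_classes(r"[[:alnum:]_]+")
print(expanded)
print(bool(re.fullmatch(expanded, "foo_42")))
```

A real implementation would presumably live in the pattern parser rather than in a textual pre-pass, so that a class outside a character set can be detected and warned about.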

Note: The Unicode equivalents will be added to each character class where applicable.
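The pitfall described above for a bare r'[:alpha:]' can be checked against the stock re module, which has no POSIX class support and simply treats the text between the brackets as literal characters. This is a small sketch of that existing behaviour, not of the branch's new code; the sample string is arbitrary.

```python
import re

# Without POSIX class support, this is just a set of five literal
# characters: ':', 'a', 'h', 'l' and 'p'.
bare = re.compile(r'[:alpha:]')

# Collect the distinct characters that the pattern matches.
matched = sorted(set(bare.findall('alphabet: help!')))
print(matched)  # [':', 'a', 'h', 'l', 'p']

# Letters outside that set are not matched at all.
print(bare.match('z'))  # None
```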

Get this branch:
bzr branch lp:~pythonregexp2.7/python/issue2636-08

Branch information

Owner:
Python Regexp 2.7
Project:
Python
Status:
Development

Recent revisions

39031. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Merged in changes from the core Regexp branch.

39030. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Merged in changes from the core Regexp branch.

39029. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Merged in changes from the core Regexp branch.

39028. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Merged in changes from the core Regexp branch.

39027. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Merged in changes from the core Regexp branch.

39026. By Jeffrey C. Jacobs <email address hidden>

Rolled back the Character Class comment because that will be
useful in this branch.

39025. By Jeffrey C. Jacobs <email address hidden>

Merged in changes from Main Line

39024. By Jeffrey C. Jacobs <email address hidden>

(from original local svn repository): Added ignore directives so that
.pyc and .pyo files cannot be checked in; this could not be done under
DOS because there is no valid editor.

Reapplied my edits to sre_parse.py:

Removed unused code for Character Classes -- will be added back in
sub-branch 6 for adding in Character Class support.

39023. By Jeffrey C. Jacobs <email address hidden>

Moved the import of sre_parse into the compile method, the only section of code that actually needs it. Thus, unless compile is called, sre_parse will not be imported unnecessarily.

39022. By Jeffrey C. Jacobs <email address hidden>

Mainline merge

Branch metadata

Branch format:
Branch format 6
Repository format:
Bazaar pack repository format 1 with rich root (needs bzr 1.0)