lp:~pythonregexp2.7/python/issue2636-24

Created by TimeHorse and last modified

Currently, the Python Regular Expression Engine drops characters when findall / finditer is used with an expression that has a Zero-Width capture group. For example:

>>> [m.groups() for m in re.finditer(r'(^z*)|(\w+)', 'abc')]
[('', None), (None, 'bc')]

The 'a' has been lost because the engine first matches (^z*) with zero width and then consumes the current character (the 'a'). It then proceeds to match the rest of the string, which it does with (\w+), resulting in 'bc'. Fixing this takes two changes. First, the 'a' should not be consumed by the zero-width match of (^z*). On its own, though, that would lead to an infinite series of zero-width matches at the same position. So, second, each iteration needs internal state indicating whether a Zero-width match is still allowed at the current position. Initially, a Zero-Width expression may match once at any position, but when that same position is entered again, the 'Zero-width match' flag would be true and a subsequent Zero-width match there would be disallowed. This item is based on the work from Issue 1647489.
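The flag-per-position idea can be sketched in pure Python. This is only an illustration under stated assumptions, not the branch's actual change to the C engine: the helper name iter_matches is hypothetical, and the pattern is passed as a list of top-level alternatives so that the sketch can fall through to the next alternative when a Zero-width match is disallowed. A real engine would backtrack inside a single compiled expression instead.

```python
import re


def iter_matches(branches, string):
    """Yield (branch_index, matched_text) without dropping characters.

    `branches` is a list of top-level alternatives, e.g. [r'^z*', r'\w+']
    standing in for r'(^z*)|(\w+)'. Hypothetical helper for illustration.
    """
    compiled = [re.compile(branch) for branch in branches]
    pos = 0
    allow_empty = True  # may a zero-width match still be reported at pos?
    while pos <= len(string):
        found = None
        for index, regex in enumerate(compiled):
            m = regex.match(string, pos)
            if m is None:
                continue
            if m.end() == pos and not allow_empty:
                # A zero-width match was already reported at this
                # position, so try the next alternative instead.
                continue
            found = (index, m)
            break
        if found is None:
            # Nothing (more) matches here: move on and reset the flag.
            pos += 1
            allow_empty = True
            continue
        index, m = found
        yield index, m.group()
        if m.end() == pos:
            # Zero-width match: stay on this character, but forbid a
            # second zero-width match at the same position.
            allow_empty = False
        else:
            pos = m.end()
            allow_empty = True


print(list(iter_matches([r'^z*', r'\w+'], 'abc')))
# [(0, ''), (1, 'abc')]
```

With this scheme the zero-width match of ^z* is reported once, the 'a' is left in place, and \w+ then matches the whole of 'abc'. The loop still terminates because a second Zero-width match at the same position is never accepted.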

Get this branch:
bzr branch lp:~pythonregexp2.7/python/issue2636-24
Members of Python Regexp 2.7 can upload to this branch.

Branch information

Owner:
Python Regexp 2.7
Project:
Python
Status:
Development

Recent revisions

39039. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Merged in changes from the latest python source snapshot.

39038. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Modified documentation so the paragraphs would fit in an 80 column
screen by making sure that each line occupies no more than 72 columns.

39037. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Added new, more complex, test for branching (using the OR ('|') operator)
in Regular Expressions.

39036. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Merged in changes from the latest python source snapshot.

39035. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Changed the generic VERBOSE flag to be VERBOSE_SRE_ENGINE so that it can
be defined at the make level without potentially interfering with other
modules.

39034. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Moving these Documentation changes into their own branch so that the minor
changes will not force the documentation suggestion changes to also be
included; they will now only be included in their own branch, for issue 12.

39033. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Replaced tab with spaces.

39032. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Better comment for the end of line test.

39031. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Merged in changes from the latest python source snapshot.

39030. By Jeffrey C. "The TimeHorse" Jacobs <email address hidden>

Fixed some spelling mistakes in the test procedures.

Branch metadata

Branch format:
Branch format 6
Repository format:
Bazaar pack repository format 1 with rich root (needs bzr 1.0)