lp:~pythonregexp2.7/python/issue2636-24
Currently, the Python regular expression engine drops characters when findall / finditer is used with an expression that contains a zero-width capture group. For example:
>>> [m.groups() for m in re.finditer(
[('', None), (None, 'bc')]
The 'a' has been lost because the engine first matches (^z*) with zero width and then consumes the current character (the 'a'). It then proceeds to match the rest of the input, which it does with (\w+), resulting in 'bc'. The fix has two parts. First, the 'a' should not be consumed by the zero-width match of (^z*). On its own, though, that change would lead to an endless series of zero-width matches at the same position. So, second, each iteration needs an internal flag that indicates whether a zero-width match is allowed. Initially, any position can match a zero-width expression once, but when that same position is entered again, the 'Zero-width match' flag is set and a subsequent zero-width match there is disallowed. This item is based on the work from Issue 1647489.
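The flag-based iteration described above can be sketched in pure Python. This is only an illustration of the idea, not the code in this branch (the real change lives in the C engine). The names `scan`, `alternatives` and `zero_width_seen` are hypothetical, and the branches of an alternation are modelled as a list of separate patterns tried in order. The usage example assumes the expression is an alternation of the two groups named in the text, applied to 'abc'.

```python
import re


def scan(alternatives, text):
    """Yield (alternative_index, matched_text) pairs, left to right.

    `alternatives` is a list of pattern strings tried in order at each
    position, standing in for the branches of a single `a|b` expression.
    """
    compiled = [re.compile(p) for p in alternatives]
    pos = 0
    zero_width_seen = False  # True once an empty match was reported at `pos`
    while pos <= len(text):
        for index, regex in enumerate(compiled):
            m = regex.match(text, pos)
            if m is None:
                continue
            if m.end() == pos and zero_width_seen:
                # A second zero-width match at this position is disallowed.
                continue
            break
        else:
            # No usable match here: step forward and clear the flag.
            pos += 1
            zero_width_seen = False
            continue
        yield index, m.group()
        if m.end() == pos:
            # Zero-width match: consume nothing, but forbid another one here.
            zero_width_seen = True
        else:
            # Non-empty match: continue from its end with the flag cleared.
            pos = m.end()
            zero_width_seen = False


print(list(scan([r'^z*', r'\w+'], 'abc')))
# [(0, ''), (1, 'abc')]
```

With the flag in place the zero-width match is still reported once at position 0, but the 'a' is then available to the second alternative, so the sketch yields 'abc' rather than 'bc'.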
- Get this branch: bzr branch lp:~pythonregexp2.7/python/issue2636-24
Recent revisions
- 39039. By Jeffrey C. "The TimeHorse" Jacobs
  Merged in changes from the latest python source snapshot.
- 39038. By Jeffrey C. "The TimeHorse" Jacobs
  Modified documentation so the paragraphs would fit in an 80 column
  screen by making sure that each line occupies no more than 72 columns.
- 39037. By Jeffrey C. "The TimeHorse" Jacobs
  Added new, more complex, test for branching (using the OR ('|') operator)
  in Regular Expressions.
- 39036. By Jeffrey C. "The TimeHorse" Jacobs
  Merged in changes from the latest python source snapshot.
- 39035. By Jeffrey C. "The TimeHorse" Jacobs
  Changed the generic VERBOSE flag to be VERBOSE_SRE_ENGINE so that it can
  be defined at the make level without potentially interfering with other
  modules.
- 39034. By Jeffrey C. "The TimeHorse" Jacobs
  Moving these Documentation changes into their own branch so that the minor
  changes will not force the documentation suggestion changes to also be
  included; they will now only be included in their own branch, for issue 12.
- 39032. By Jeffrey C. "The TimeHorse" Jacobs
  Better comment for the end of line test.
- 39031. By Jeffrey C. "The TimeHorse" Jacobs
  Merged in changes from the latest python source snapshot.
- 39030. By Jeffrey C. "The TimeHorse" Jacobs
  Fixed some spelling mistakes in the test proceedures.
Branch metadata
- Branch format: Branch format 6
- Repository format: Bazaar pack repository format 1 with rich root (needs bzr 1.0)