Commit Graph

101 Commits

Author SHA1 Message Date
Gustavo Niemeyer 0f0c06a5c2 Removing dead code. 2003-10-18 20:54:44 +00:00
Gustavo Niemeyer ad3fc44ccb Implemented non-recursive SRE matching. 2003-10-17 22:13:16 +00:00
Raymond Hettinger 8ae4689657 Simplify and speedup uses of Py_BuildValue():
* Py_BuildValue("(OOO)",a,b,c)  -->  PyTuple_Pack(3,a,b,c)
* Py_BuildValue("()",a)         -->  PyTuple_New(0)
* Py_BuildValue("O", a)         -->  Py_INCREF(a)
2003-10-12 19:09:37 +00:00
Gustavo Niemeyer 28b5bb33ea Fixing bug described in patch #756032, where SRE reads invalid data
due to a corrupted end pointer.
2003-06-26 14:41:08 +00:00
Andrew MacIntyre 1a44448b24 Changes to sre.c after the application of patch #726869 have increased
stack usage on FreeBSD, requiring the recursion limit to be lowered
further.  Building with gcc 2.95 (the standard compiler on FreeBSD 4.x)
is now also affected.

The underlying issue is that FreeBSD's pthreads implementation has a
hard-coded 1MB stack size for the initial (or "primary") thread, which
can not be changed without rebuilding libc_r.  Exhausting this stack
results in a bus error.

Building without pthreads (configure --without-threads), or linking
with the port of the Linux pthreads library (aka Linuxthreads) instead
of libc_r, avoids this limitation.

On OS/2, only gcc 3.2 is affected and the stack size is controllable,
so the special handling has been removed.
2003-06-09 08:22:11 +00:00
Andrew M. Kuchling c24fe36c57 Allow _sre.c to compile with Python 2.2 2003-04-30 13:09:08 +00:00
Gustavo Niemeyer caf1c9dfe7 - Included detailed documentation in _sre.c explaining how, when, and why
to use LASTMARK_SAVE()/LASTMARK_RESTORE(), based on the discussion
  in patch #712900.

- Cleaned up LASTMARK_SAVE()/LASTMARK_RESTORE() usage, based on the
  established rules.

- Moved the upper part of the just commited patch (relative to bug #725106)
  to outside the for() loop of BRANCH OP. There's no need to mark_save()
  in every loop iteration.
2003-04-27 14:42:54 +00:00
Gustavo Niemeyer 3646ab98af Fix for part of the problem mentioned in #725149 by Greg Chapman.
This problem is related to a wrong behavior from mark_save/restore(),
which don't restore the mark_stack_base before restoring the marks.
Greg's suggestion was to change the asserts, which happen to be
the only recursive ops that can continue the loop, but the problem would
happen to any operation with the same behavior. So, rather than
hardcoding this into asserts, I have changed mark_save/restore() to
always restore the stackbase before restoring the marks.

Both solutions should fix these two cases, presented by Greg:

>>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
('b', None)
>>> re.match('(a)((?!(b)*))*', 'abb').groups()
('b', None, None)

The rest of the bug and patch in #725149 must be discussed further.
2003-04-27 13:25:21 +00:00
Gustavo Niemeyer c34f2555bd Applied patch #725106, by Greg Chapman, fixing capturing groups
within repeats of alternatives. The only change to the original
patch was to convert the tests to the new test_re.py file.

This patch fixes cases like:

>>> re.match('((a)|b)*', 'abc').groups()
('b', '')

Which is wrong (it's impossible to match the empty string),
and incompatible with other regex systems, like the following
examples show:

% perl -e '"abc" =~ /^((a)|b)*/; print "$1 $2\n";'
b a

% echo "abc" | sed -r -e "s/^((a)|b)*/\1 \2|/"
b a|c
2003-04-27 12:34:14 +00:00
Gustavo Niemeyer c23fb77477 Applying patch #726869 by Andrew I MacIntyre, reducing in _sre.c the
recursion limit for certain setups of FreeBSD and OS/2.
2003-04-27 06:58:54 +00:00
Gustavo Niemeyer 3c9068bbec Made MAX_UNTIL/MIN_UNTIL code more coherent about mark protection,
accordingly to further discussions with Greg Chapman in patch #712900.
2003-04-22 15:39:09 +00:00
Gustavo Niemeyer be733ee7fb More work on bug #672491 and patch #712900.
I've applied a modified version of Greg Chapman's patch. I've included
the fixes without introducing the reorganization mentioned, for the sake
of stability. Also, the second fix mentioned in the patch don't fix the
mentioned problem anymore, because of the change introduced by patch
#720991 (by Greg as well). The new fix wasn't complicated though, and is
included as well.

As a note. It seems that there are other places that require the
"protection" of LASTMARK_SAVE()/LASTMARK_RESTORE(), and are just waiting
for someone to find how to break them. Particularly, I belive that every
recursion of SRE_MATCH() should be protected by these macros. I won't
do that right now since I'm not completely sure about this, and we don't
have much time for testing until the next release.
2003-04-20 07:35:44 +00:00
Gustavo Niemeyer 1aca359e89 - Fixed bug #672491. This change restores the behavior of lastindex/lastgroup
to be compliant with previous python versions, by backing out the changes
  made in revision 2.84 which affected this. The bugfix for backtracking is
  still maintained.
2003-04-20 00:45:13 +00:00
Martin v. Löwis 78e2f06cc6 Fully support 32-bit codes. Enable BIGCHARSET in UCS-4 builds. 2003-04-19 12:56:08 +00:00
Guido van Rossum 41c99e7f96 SF patch #720991 by Gary Herron:
A small fix for bug #545855 and Greg Chapman's
addition of op code SRE_OP_MIN_REPEAT_ONE for
eliminating recursion on simple uses of pattern '*?' on a
long string.
2003-04-14 17:59:34 +00:00
Fredrik Lundh 09705f0b89 fix for SF #635398 (don't "downcast" return strings from unicode to ascii) 2002-11-22 12:46:35 +00:00
Neal Norwitz addfe0c09c Make private functions static so we don't pollute the namespace 2002-11-10 14:33:26 +00:00
Gustavo Niemeyer c523b04b0f Fixed sre bug "[#581080] Provoking infinite scanner loops".
This bug happened because: 1) the scanner_search and scanner_match methods
were not checking the buffer limits before increasing the current pointer;
and 2) SRE_SEARCH was using "if (ptr == end)" as a loop break, instead of
"if (ptr >= end)".

* Modules/_sre.c
  (SRE_SEARCH): Check for "ptr >= end" to break loops, so that we don't
  hang forever if a pointer passing the buffer limit is used.
  (scanner_search,scanner_match): Don't increment the current pointer
  if we're going to pass the buffer limit.

* Misc/NEWS
  Mention the fix.
2002-11-07 03:28:56 +00:00
Gustavo Niemeyer 4e7be06a65 Fixed bug #470582, using a modified version of patch #527371,
from Greg Chapman.

* Modules/_sre.c
  (lastmark_restore): New function, implementing algorithm to restore
  a state to a given lastmark. In addition to the similar algorithm used
  in a few places of SRE_MATCH, restore lastindex when restoring lastmark.
  (SRE_MATCH): Replace lastmark inline restoring by lastmark_restore(),
  function. Also include it where missing. In SRE_OP_MARK, set lastindex
  only if i > lastmark.

* Lib/test/re_tests.py
* Lib/test/test_sre.py
  Included regression tests for the fixed bugs.

* Misc/NEWS
  Mention fixes.
2002-11-06 14:06:53 +00:00
Michael W. Hudson b6a4505123 Cray fixup as seen in bug #558153. 2002-07-31 09:54:24 +00:00
Mark Hammond 8235ea1c3a Land Patch [ 566100 ] Rationalize DL_IMPORT and DL_EXPORT. 2002-07-19 06:55:41 +00:00
Jeremy Hylton 938ace69a0 staticforward bites the dust.
The staticforward define was needed to support certain broken C
compilers (notably SCO ODT 3.0, perhaps early AIX as well) botched the
static keyword when it was used with a forward declaration of a static
initialized structure.  Standard C allows the forward declaration with
static, and we've decided to stop catering to broken C compilers.  (In
fact, we expect that the compilers are all fixed eight years later.)

I'm leaving staticforward and statichere defined in object.h as
static.  This is only for backwards compatibility with C extensions
that might still use it.

XXX I haven't updated the documentation.
2002-07-17 16:30:39 +00:00
Neal Norwitz 35fc7606f0 SF #561244 Micro optimizations
Convert loops to memset()s.
2002-06-13 21:11:11 +00:00
Neal Norwitz bb2769f580 Revert use of METH_OLDARGS (use 0) to support 1.5.2 2002-03-31 15:46:00 +00:00
Neal Norwitz b049325e92 Use symbolic METH_VARARGS/METH_OLDARGS instead of 1/0 for ml_flags 2002-03-31 14:44:22 +00:00
Fredrik Lundh 82b230732f bug #133283, #477728, #483789, #490573
backed out of broken minimal repeat patch from July

also fixed a couple of minor potential resource leaks in pattern_subx
(Guido had already fixed the big one)
2001-12-09 16:13:15 +00:00
Guido van Rossum 146483964e Patch supplied by Burton Radons for his own SF bug #487390: Modifying
type.__module__ behavior.

This adds the module name and a dot in front of the type name in every
type object initializer, except for built-in types (and those that
already had this).  Note that it touches lots of Mac modules -- I have
no way to test these but the changes look right.  Apologies if they're
not.  This also touches the weakref docs, which contains a sample type
object initializer.  It also touches the mmap test output, because the
mmap type's repr is included in that output.  It touches object.h to
put the correct description in a comment.
2001-12-08 18:02:58 +00:00
Guido van Rossum 4e173846c8 Fix for #489672 (Neil Norwitz): memory leak in test_sre.
(At least for the repeatable test case that Tim produced.)

pattern_subx(): Add missing DECREF(filter) in both exit branches
(normal and error return).  Also fix a DECREF(args) that should
certainly be a DECREF(match) -- because it's inside if (!args) and
right after allocation of match.
2001-12-07 04:25:10 +00:00
Fredrik Lundh 703ce8122c (experimental) "finditer" method/function. this works pretty much
like findall, but returns an iterator (which returns match objects)
instead of a list of strings/tuples.
2001-10-24 22:16:30 +00:00
Fredrik Lundh 6de22ef677 another major speedup: let sre.sub/subn check for escapes in the
template string, and don't call the template compiler if we can
avoid it.
2001-10-22 21:18:08 +00:00
Fredrik Lundh f864aa8fd9 sre.split should return the last segment, even if empty
(sorry, barry)
2001-10-22 06:01:56 +00:00
Fredrik Lundh dac58492aa fixed character set description in docstring (SRE uses Python
strings, not C strings)

removed USE_PYTHON defines, and related sre.py helpers

skip calling the subx helper if the template is callable.
interestingly enough, this means that

	def callback(m):
	    return literal
	result = pattern.sub(callback, string)

is much faster than

	result = pattern.sub(literal, string)
2001-10-21 21:48:30 +00:00
Fredrik Lundh 1296a8d77e sre.Scanner fixes (from Greg Chapman). also added a Scanner sanity
check to the test suite.

added a few missing exception checks in the _sre module
2001-10-21 18:04:11 +00:00
Fredrik Lundh bec95b9d88 rewrote the pattern.sub and pattern.subn methods in C
removed (conceptually flawed) getliteral helper; the new sub/subn code
uses a faster code path for literal replacement strings, but doesn't
(yet) look for literal patterns.

added STATE_OFFSET macro, and use it to convert state.start/ptr to
char indexes
2001-10-21 16:47:57 +00:00
Fredrik Lundh 971e78b55b rewrote the pattern.split method in C
also restored SRE Unicode support for 1.6/2.0/2.1
2001-10-20 17:48:46 +00:00
Fredrik Lundh 397a654791 SRE bug #441409:
compile should raise error for non-strings
SRE bug #432570, 448951:
    reset group after failed match

also bumped version number to 2.2.0
2001-10-18 19:30:16 +00:00
Fredrik Lundh 59b68656f8 fixed #449964: sre.sub raises an exception if the template contains a
\g<x> group reference followed by a character escape

(also restructured a few things on the way to fixing #449000)
2001-09-18 20:55:24 +00:00
Fredrik Lundh 21009b9c6f an SRE bugfix a day keeps Guido away...
#462270: sub-tle difference between pre.sub and sre.sub.  PRE ignored
an empty match at the previous location, SRE didn't.

also synced with Secret Labs "sreopen" codebase.
2001-09-18 18:47:09 +00:00
Sjoerd Mullender 89dfe9e292 Removed unreachable return to silence SGI compiler. 2001-08-30 14:37:07 +00:00
Martin v. Löwis 339d0f720e Patch #445762: Support --disable-unicode
- Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled
- check for Py_USING_UNICODE in all places that use Unicode functions
- disables unicode literals, and the builtin functions
- add the types.StringTypes list
- remove Unicode literals from most tests.
2001-08-17 18:39:25 +00:00
Barry Warsaw 214a0b1382 init_sre(): Plug a little leak reported by Insure. 2001-08-16 20:33:48 +00:00
Fredrik Lundh 2d96f11d07 map re.sub() to string.replace(), when possible 2001-07-08 13:26:57 +00:00
Fredrik Lundh d89a2e7731 bug #416670
added copy/deepcopy support to SRE (still not enabled, since it's not
covered by the test suite)
2001-07-03 20:32:36 +00:00
Fredrik Lundh df781e6a3f reapplied darryl gallion's minimizing repeat fix. I'm still not 100%
sure about this one, but test #133283 now works even with the fix in
place, and so does the test suite.  we'll see what comes up...
2001-07-02 19:54:28 +00:00
Fredrik Lundh f71ae461bf pythonware repository roundtrip (untabification) 2001-07-02 17:04:48 +00:00
Fredrik Lundh 19af43d78a added martin's BIGCHARSET patch to SRE 2.1.1. martin reports 2x
speedups for certain unicode character ranges.
2001-07-02 16:58:38 +00:00
Fredrik Lundh b0f05bdfd3 merged with pythonware's SRE 2.1.1 codebase 2001-07-02 16:42:49 +00:00
Fredrik Lundh 9c7eab82b3 SRE: made "copyright" string static, to avoid potential linking
conflicts.
2001-04-15 19:00:58 +00:00
Fredrik Lundh b25e1ad253 sre 2.1b2 update:
- take locale into account for word boundary anchors (#410271)
- restored 2.0's *? behaviour (#233283, #408936 and others)
- speed up re.sub/re.subn
2001-03-22 15:50:10 +00:00
Tim Peters 5687ffe0c5 SF patch 404928: Support for next Cygwin gcc (2.95.2-8) 2001-02-28 16:44:18 +00:00