Commit Graph

88 Commits

Author SHA1 Message Date
Martin v. Löwis 78e2f06cc6 Fully support 32-bit codes. Enable BIGCHARSET in UCS-4 builds. 2003-04-19 12:56:08 +00:00
Guido van Rossum 41c99e7f96 SF patch #720991 by Gary Herron:
A small fix for bug #545855 and Greg Chapman's
addition of op code SRE_OP_MIN_REPEAT_ONE for
eliminating recursion on simple uses of pattern '*?' on a
long string.
2003-04-14 17:59:34 +00:00
Fredrik Lundh 09705f0b89 fix for SF #635398 (don't "downcast" return strings from unicode to ascii) 2002-11-22 12:46:35 +00:00
Neal Norwitz addfe0c09c Make private functions static so we don't pollute the namespace 2002-11-10 14:33:26 +00:00
Gustavo Niemeyer c523b04b0f Fixed sre bug "[#581080] Provoking infinite scanner loops".
This bug happened because: 1) the scanner_search and scanner_match methods
were not checking the buffer limits before increasing the current pointer;
and 2) SRE_SEARCH was using "if (ptr == end)" as a loop break, instead of
"if (ptr >= end)".

* Modules/_sre.c
  (SRE_SEARCH): Check for "ptr >= end" to break loops, so that we don't
  hang forever if a pointer passing the buffer limit is used.
  (scanner_search,scanner_match): Don't increment the current pointer
  if we're going to pass the buffer limit.

* Misc/NEWS
  Mention the fix.
2002-11-07 03:28:56 +00:00
Gustavo Niemeyer 4e7be06a65 Fixed bug #470582, using a modified version of patch #527371,
from Greg Chapman.

* Modules/_sre.c
  (lastmark_restore): New function, implementing algorithm to restore
  a state to a given lastmark. In addition to the similar algorithm used
  in a few places of SRE_MATCH, restore lastindex when restoring lastmark.
  (SRE_MATCH): Replace lastmark inline restoring by lastmark_restore(),
  function. Also include it where missing. In SRE_OP_MARK, set lastindex
  only if i > lastmark.

* Lib/test/re_tests.py
* Lib/test/test_sre.py
  Included regression tests for the fixed bugs.

* Misc/NEWS
  Mention fixes.
2002-11-06 14:06:53 +00:00
Michael W. Hudson b6a4505123 Cray fixup as seen in bug #558153. 2002-07-31 09:54:24 +00:00
Mark Hammond 8235ea1c3a Land Patch [ 566100 ] Rationalize DL_IMPORT and DL_EXPORT. 2002-07-19 06:55:41 +00:00
Jeremy Hylton 938ace69a0 staticforward bites the dust.
The staticforward define was needed to support certain broken C
compilers (notably SCO ODT 3.0, perhaps early AIX as well) botched the
static keyword when it was used with a forward declaration of a static
initialized structure.  Standard C allows the forward declaration with
static, and we've decided to stop catering to broken C compilers.  (In
fact, we expect that the compilers are all fixed eight years later.)

I'm leaving staticforward and statichere defined in object.h as
static.  This is only for backwards compatibility with C extensions
that might still use it.

XXX I haven't updated the documentation.
2002-07-17 16:30:39 +00:00
Neal Norwitz 35fc7606f0 SF #561244 Micro optimizations
Convert loops to memset()s.
2002-06-13 21:11:11 +00:00
Neal Norwitz bb2769f580 Revert use of METH_OLDARGS (use 0) to support 1.5.2 2002-03-31 15:46:00 +00:00
Neal Norwitz b049325e92 Use symbolic METH_VARARGS/METH_OLDARGS instead of 1/0 for ml_flags 2002-03-31 14:44:22 +00:00
Fredrik Lundh 82b230732f bug #133283, #477728, #483789, #490573
backed out of broken minimal repeat patch from July

also fixed a couple of minor potential resource leaks in pattern_subx
(Guido had already fixed the big one)
2001-12-09 16:13:15 +00:00
Guido van Rossum 146483964e Patch supplied by Burton Radons for his own SF bug #487390: Modifying
type.__module__ behavior.

This adds the module name and a dot in front of the type name in every
type object initializer, except for built-in types (and those that
already had this).  Note that it touches lots of Mac modules -- I have
no way to test these but the changes look right.  Apologies if they're
not.  This also touches the weakref docs, which contains a sample type
object initializer.  It also touches the mmap test output, because the
mmap type's repr is included in that output.  It touches object.h to
put the correct description in a comment.
2001-12-08 18:02:58 +00:00
Guido van Rossum 4e173846c8 Fix for #489672 (Neil Norwitz): memory leak in test_sre.
(At least for the repeatable test case that Tim produced.)

pattern_subx(): Add missing DECREF(filter) in both exit branches
(normal and error return).  Also fix a DECREF(args) that should
certainly be a DECREF(match) -- because it's inside if (!args) and
right after allocation of match.
2001-12-07 04:25:10 +00:00
Fredrik Lundh 703ce8122c (experimental) "finditer" method/function. this works pretty much
like findall, but returns an iterator (which returns match objects)
instead of a list of strings/tuples.
2001-10-24 22:16:30 +00:00
Fredrik Lundh 6de22ef677 another major speedup: let sre.sub/subn check for escapes in the
template string, and don't call the template compiler if we can
avoid it.
2001-10-22 21:18:08 +00:00
Fredrik Lundh f864aa8fd9 sre.split should return the last segment, even if empty
(sorry, barry)
2001-10-22 06:01:56 +00:00
Fredrik Lundh dac58492aa fixed character set description in docstring (SRE uses Python
strings, not C strings)

removed USE_PYTHON defines, and related sre.py helpers

skip calling the subx helper if the template is callable.
interestingly enough, this means that

	def callback(m):
	    return literal
	result = pattern.sub(callback, string)

is much faster than

	result = pattern.sub(literal, string)
2001-10-21 21:48:30 +00:00
Fredrik Lundh 1296a8d77e sre.Scanner fixes (from Greg Chapman). also added a Scanner sanity
check to the test suite.

added a few missing exception checks in the _sre module
2001-10-21 18:04:11 +00:00
Fredrik Lundh bec95b9d88 rewrote the pattern.sub and pattern.subn methods in C
removed (conceptually flawed) getliteral helper; the new sub/subn code
uses a faster code path for literal replacement strings, but doesn't
(yet) look for literal patterns.

added STATE_OFFSET macro, and use it to convert state.start/ptr to
char indexes
2001-10-21 16:47:57 +00:00
Fredrik Lundh 971e78b55b rewrote the pattern.split method in C
also restored SRE Unicode support for 1.6/2.0/2.1
2001-10-20 17:48:46 +00:00
Fredrik Lundh 397a654791 SRE bug #441409:
compile should raise error for non-strings
SRE bug #432570, 448951:
    reset group after failed match

also bumped version number to 2.2.0
2001-10-18 19:30:16 +00:00
Fredrik Lundh 59b68656f8 fixed #449964: sre.sub raises an exception if the template contains a
\g<x> group reference followed by a character escape

(also restructured a few things on the way to fixing #449000)
2001-09-18 20:55:24 +00:00
Fredrik Lundh 21009b9c6f an SRE bugfix a day keeps Guido away...
#462270: sub-tle difference between pre.sub and sre.sub.  PRE ignored
an empty match at the previous location, SRE didn't.

also synced with Secret Labs "sreopen" codebase.
2001-09-18 18:47:09 +00:00
Sjoerd Mullender 89dfe9e292 Removed unreachable return to silence SGI compiler. 2001-08-30 14:37:07 +00:00
Martin v. Löwis 339d0f720e Patch #445762: Support --disable-unicode
- Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled
- check for Py_USING_UNICODE in all places that use Unicode functions
- disables unicode literals, and the builtin functions
- add the types.StringTypes list
- remove Unicode literals from most tests.
2001-08-17 18:39:25 +00:00
Barry Warsaw 214a0b1382 init_sre(): Plug a little leak reported by Insure. 2001-08-16 20:33:48 +00:00
Fredrik Lundh 2d96f11d07 map re.sub() to string.replace(), when possible 2001-07-08 13:26:57 +00:00
Fredrik Lundh d89a2e7731 bug #416670
added copy/deepcopy support to SRE (still not enabled, since it's not
covered by the test suite)
2001-07-03 20:32:36 +00:00
Fredrik Lundh df781e6a3f reapplied darryl gallion's minimizing repeat fix. I'm still not 100%
sure about this one, but test #133283 now works even with the fix in
place, and so does the test suite.  we'll see what comes up...
2001-07-02 19:54:28 +00:00
Fredrik Lundh f71ae461bf pythonware repository roundtrip (untabification) 2001-07-02 17:04:48 +00:00
Fredrik Lundh 19af43d78a added martin's BIGCHARSET patch to SRE 2.1.1. martin reports 2x
speedups for certain unicode character ranges.
2001-07-02 16:58:38 +00:00
Fredrik Lundh b0f05bdfd3 merged with pythonware's SRE 2.1.1 codebase 2001-07-02 16:42:49 +00:00
Fredrik Lundh 9c7eab82b3 SRE: made "copyright" string static, to avoid potential linking
conflicts.
2001-04-15 19:00:58 +00:00
Fredrik Lundh b25e1ad253 sre 2.1b2 update:
- take locale into account for word boundary anchors (#410271)
- restored 2.0's *? behaviour (#233283, #408936 and others)
- speed up re.sub/re.subn
2001-03-22 15:50:10 +00:00
Tim Peters 5687ffe0c5 SF patch 404928: Support for next Cygwin gcc (2.95.2-8) 2001-02-28 16:44:18 +00:00
Fredrik Lundh 1c5aa6901f bumped SRE version number to 2.1. cleaned up and added 1.5.2
compatibility patches.
2001-01-16 07:37:30 +00:00
Fredrik Lundh 6f5cba68fc fixed a memory leak in pattern cleanup (patch #103248 by cgw) 2001-01-16 07:05:29 +00:00
Fredrik Lundh b35ffc0417 added "magic" number to the _sre module, to avoid weird errors caused
by compiler/engine mismatches
2001-01-15 12:46:09 +00:00
Fredrik Lundh fa25a7d51f -- don't use recursion for unbounded non-greedy repeat
(bugs #115903, #115696)

This is based on a patch by Darrel Gallion.  I'm not 100%
sure about this fix, but I haven't managed to come up with
any test case it cannot handle...
2001-01-14 23:55:55 +00:00
Fredrik Lundh 770617b23e SRE fixes for 2.1 alpha:
-- added some more docstrings
-- fixed typo in scanner class (#125531)
-- the multiline flag (?m) should't affect the \Z operator (#127259)
-- fixed non-greedy backtracking bug (#123769, #127259)
-- added sre.DEBUG flag (currently dumps the parsed pattern structure)
-- fixed a couple of glitches in groupdict (the #126587 memory leak
   had already been fixed by AMK)
2001-01-14 15:06:11 +00:00
Andrew M. Kuchling 48f224c877 Fix bug 126587: matchobject.groupdict() leaks memory because of a missing
DECREF
2000-12-22 14:39:10 +00:00
Fredrik Lundh ebc37b28fa -- properly reset groups in findall (bug #117612)
-- fixed negative lookbehind to work correctly at the beginning
of the target string (bug #117242)

-- improved syntax check; you can no longer refer to a group
inside itself (bug #110866)
2000-10-28 19:30:41 +00:00
Fredrik Lundh 562586eb3a Accept keyword arguments for (most) pattern and match object
methods.  Closes buglet #115845.
2000-10-03 20:43:34 +00:00
Fredrik Lundh 65d4bc616a Fixed negative lookahead/lookbehind. Closes bug #115618. 2000-10-03 16:29:23 +00:00
Fred Drake d5fadf75e4 Rationalize use of limits.h, moving the inclusion to Python.h.
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.

This closes SourceForge patch #101659 and bug #115323.
2000-09-26 05:46:01 +00:00
Fredrik Lundh 5644b7fad1 - fixed yet another gcc -pedantic warning
- added experimental "expand" method to match objects
- don't use the buffer interface on unicode strings
2000-09-21 17:03:25 +00:00
Fredrik Lundh 510c97ba2f return -1 for undefined groups (as implemented in 1.5.2) instead of
None (as documented) from start/end/span.  closes bug #113254
2000-09-02 16:36:57 +00:00
Fredrik Lundh e67d8e514f oops. accidentally reintroduced a memory leak. put the bugfix back. 2000-08-27 21:32:46 +00:00