Commit Graph

212 Commits

Author SHA1 Message Date
Tim Peters 3d56350910 _compile(): raise an exception if downcasting to SRE_CODE
loses information:

    OverflowError: regular expression code size limit exceeded

Otherwise the compiled code is gibberish, possibly leading at
least to wrong results or (as reported on c.l.py) internal
sre errors at match time.

I'm not sure how to test this.  SRE_CODE is a 2-byte type on
my box, and it's easy to create a regexp that causes the new
exception to trigger here.  But it may be a 4-byte type on
other boxes, and creating a regexp large enough to trigger
problems there would be pretty crazy.

Bugfix candidate.
2006-01-21 02:47:53 +00:00
Neal Norwitz 1ac754fa10 Check return result from Py_InitModule*(). This API can fail.
Probably should be backported.
2006-01-19 06:09:39 +00:00
Jeremy Hylton af68c874a6 Add const to several API functions that take char *.
In C++, it's an error to pass a string literal to a char* function
without a const_cast().  Rather than require every C++ extension
module to put a cast around string literals, fix the API to state the
const-ness.

I focused on parts of the API where people usually pass literals:
PyArg_ParseTuple() and friends, Py_BuildValue(), PyMethodDef, the type
slots, etc.  Predictably, there were a large set of functions that
needed to be fixed as a result of these changes.  The most pervasive
change was to make the keyword args list passed to
PyArg_ParseTupleAndKewords() to be a const char *kwlist[].

One cast was required as a result of the changes:  A type object
mallocs the memory for its tp_doc slot and later frees it.
PyTypeObject says that tp_doc is const char *; but if the type was
created by type_new(), we know it is safe to cast to char *.
2005-12-10 18:50:16 +00:00
Gustavo Niemeyer 166878f544 Fixing bug #1072259 in SRE. 2004-12-02 16:15:39 +00:00
Raymond Hettinger 9447874131 Add docstrings for regular expression objects and methods. 2004-09-24 04:31:19 +00:00
Gustavo Niemeyer 0506c64086 Fixing bug #817234, which made SRE get into an infinite loop on
empty final matches with finditer(). New test cases included
for this bug and for #581080.
2004-09-03 18:11:59 +00:00
Nicholas Bastin 9ba301e589 Moved SunPro warning suppression into pyport.h and out of individual
modules and objects.
2004-07-15 15:54:05 +00:00
Nicholas Bastin 1ce9e4cfc1 Fixed end-of-loop code not reached warning when using SunPro C 2004-06-17 18:27:18 +00:00
Raymond Hettinger 027bb633b6 Add weakref support to sockets and re pattern objects. 2004-05-31 03:09:25 +00:00
Gustavo Niemeyer 601b963be0 - Fixing annoying warnings. 2004-02-14 00:31:13 +00:00
Gustavo Niemeyer 2cbdc2a461 Cleaning up recursive pieces left in the reorganization. 2003-12-13 20:32:08 +00:00
Gustavo Niemeyer 0f0c06a5c2 Removing dead code. 2003-10-18 20:54:44 +00:00
Gustavo Niemeyer ad3fc44ccb Implemented non-recursive SRE matching. 2003-10-17 22:13:16 +00:00
Raymond Hettinger 8ae4689657 Simplify and speedup uses of Py_BuildValue():
* Py_BuildValue("(OOO)",a,b,c)  -->  PyTuple_Pack(3,a,b,c)
* Py_BuildValue("()",a)         -->  PyTuple_New(0)
* Py_BuildValue("O", a)         -->  Py_INCREF(a)
2003-10-12 19:09:37 +00:00
Gustavo Niemeyer 28b5bb33ea Fixing bug described in patch #756032, where SRE reads invalid data
due to a corrupted end pointer.
2003-06-26 14:41:08 +00:00
Andrew MacIntyre 1a44448b24 Changes to sre.c after the application of patch #726869 have increased
stack usage on FreeBSD, requiring the recursion limit to be lowered
further.  Building with gcc 2.95 (the standard compiler on FreeBSD 4.x)
is now also affected.

The underlying issue is that FreeBSD's pthreads implementation has a
hard-coded 1MB stack size for the initial (or "primary") thread, which
can not be changed without rebuilding libc_r.  Exhausting this stack
results in a bus error.

Building without pthreads (configure --without-threads), or linking
with the port of the Linux pthreads library (aka Linuxthreads) instead
of libc_r, avoids this limitation.

On OS/2, only gcc 3.2 is affected and the stack size is controllable,
so the special handling has been removed.
2003-06-09 08:22:11 +00:00
Andrew M. Kuchling c24fe36c57 Allow _sre.c to compile with Python 2.2 2003-04-30 13:09:08 +00:00
Gustavo Niemeyer caf1c9dfe7 - Included detailed documentation in _sre.c explaining how, when, and why
to use LASTMARK_SAVE()/LASTMARK_RESTORE(), based on the discussion
  in patch #712900.

- Cleaned up LASTMARK_SAVE()/LASTMARK_RESTORE() usage, based on the
  established rules.

- Moved the upper part of the just commited patch (relative to bug #725106)
  to outside the for() loop of BRANCH OP. There's no need to mark_save()
  in every loop iteration.
2003-04-27 14:42:54 +00:00
Gustavo Niemeyer 3646ab98af Fix for part of the problem mentioned in #725149 by Greg Chapman.
This problem is related to a wrong behavior from mark_save/restore(),
which don't restore the mark_stack_base before restoring the marks.
Greg's suggestion was to change the asserts, which happen to be
the only recursive ops that can continue the loop, but the problem would
happen to any operation with the same behavior. So, rather than
hardcoding this into asserts, I have changed mark_save/restore() to
always restore the stackbase before restoring the marks.

Both solutions should fix these two cases, presented by Greg:

>>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
('b', None)
>>> re.match('(a)((?!(b)*))*', 'abb').groups()
('b', None, None)

The rest of the bug and patch in #725149 must be discussed further.
2003-04-27 13:25:21 +00:00
Gustavo Niemeyer c34f2555bd Applied patch #725106, by Greg Chapman, fixing capturing groups
within repeats of alternatives. The only change to the original
patch was to convert the tests to the new test_re.py file.

This patch fixes cases like:

>>> re.match('((a)|b)*', 'abc').groups()
('b', '')

Which is wrong (it's impossible to match the empty string),
and incompatible with other regex systems, like the following
examples show:

% perl -e '"abc" =~ /^((a)|b)*/; print "$1 $2\n";'
b a

% echo "abc" | sed -r -e "s/^((a)|b)*/\1 \2|/"
b a|c
2003-04-27 12:34:14 +00:00
Gustavo Niemeyer c23fb77477 Applying patch #726869 by Andrew I MacIntyre, reducing in _sre.c the
recursion limit for certain setups of FreeBSD and OS/2.
2003-04-27 06:58:54 +00:00
Gustavo Niemeyer 3c9068bbec Made MAX_UNTIL/MIN_UNTIL code more coherent about mark protection,
accordingly to further discussions with Greg Chapman in patch #712900.
2003-04-22 15:39:09 +00:00
Gustavo Niemeyer be733ee7fb More work on bug #672491 and patch #712900.
I've applied a modified version of Greg Chapman's patch. I've included
the fixes without introducing the reorganization mentioned, for the sake
of stability. Also, the second fix mentioned in the patch don't fix the
mentioned problem anymore, because of the change introduced by patch
#720991 (by Greg as well). The new fix wasn't complicated though, and is
included as well.

As a note. It seems that there are other places that require the
"protection" of LASTMARK_SAVE()/LASTMARK_RESTORE(), and are just waiting
for someone to find how to break them. Particularly, I belive that every
recursion of SRE_MATCH() should be protected by these macros. I won't
do that right now since I'm not completely sure about this, and we don't
have much time for testing until the next release.
2003-04-20 07:35:44 +00:00
Gustavo Niemeyer 1aca359e89 - Fixed bug #672491. This change restores the behavior of lastindex/lastgroup
to be compliant with previous python versions, by backing out the changes
  made in revision 2.84 which affected this. The bugfix for backtracking is
  still maintained.
2003-04-20 00:45:13 +00:00
Martin v. Löwis 78e2f06cc6 Fully support 32-bit codes. Enable BIGCHARSET in UCS-4 builds. 2003-04-19 12:56:08 +00:00
Guido van Rossum 41c99e7f96 SF patch #720991 by Gary Herron:
A small fix for bug #545855 and Greg Chapman's
addition of op code SRE_OP_MIN_REPEAT_ONE for
eliminating recursion on simple uses of pattern '*?' on a
long string.
2003-04-14 17:59:34 +00:00
Fredrik Lundh 09705f0b89 fix for SF #635398 (don't "downcast" return strings from unicode to ascii) 2002-11-22 12:46:35 +00:00
Neal Norwitz addfe0c09c Make private functions static so we don't pollute the namespace 2002-11-10 14:33:26 +00:00
Gustavo Niemeyer c523b04b0f Fixed sre bug "[#581080] Provoking infinite scanner loops".
This bug happened because: 1) the scanner_search and scanner_match methods
were not checking the buffer limits before increasing the current pointer;
and 2) SRE_SEARCH was using "if (ptr == end)" as a loop break, instead of
"if (ptr >= end)".

* Modules/_sre.c
  (SRE_SEARCH): Check for "ptr >= end" to break loops, so that we don't
  hang forever if a pointer passing the buffer limit is used.
  (scanner_search,scanner_match): Don't increment the current pointer
  if we're going to pass the buffer limit.

* Misc/NEWS
  Mention the fix.
2002-11-07 03:28:56 +00:00
Gustavo Niemeyer 4e7be06a65 Fixed bug #470582, using a modified version of patch #527371,
from Greg Chapman.

* Modules/_sre.c
  (lastmark_restore): New function, implementing algorithm to restore
  a state to a given lastmark. In addition to the similar algorithm used
  in a few places of SRE_MATCH, restore lastindex when restoring lastmark.
  (SRE_MATCH): Replace lastmark inline restoring by lastmark_restore(),
  function. Also include it where missing. In SRE_OP_MARK, set lastindex
  only if i > lastmark.

* Lib/test/re_tests.py
* Lib/test/test_sre.py
  Included regression tests for the fixed bugs.

* Misc/NEWS
  Mention fixes.
2002-11-06 14:06:53 +00:00
Michael W. Hudson b6a4505123 Cray fixup as seen in bug #558153. 2002-07-31 09:54:24 +00:00
Mark Hammond 8235ea1c3a Land Patch [ 566100 ] Rationalize DL_IMPORT and DL_EXPORT. 2002-07-19 06:55:41 +00:00
Jeremy Hylton 938ace69a0 staticforward bites the dust.
The staticforward define was needed to support certain broken C
compilers (notably SCO ODT 3.0, perhaps early AIX as well) botched the
static keyword when it was used with a forward declaration of a static
initialized structure.  Standard C allows the forward declaration with
static, and we've decided to stop catering to broken C compilers.  (In
fact, we expect that the compilers are all fixed eight years later.)

I'm leaving staticforward and statichere defined in object.h as
static.  This is only for backwards compatibility with C extensions
that might still use it.

XXX I haven't updated the documentation.
2002-07-17 16:30:39 +00:00
Neal Norwitz 35fc7606f0 SF #561244 Micro optimizations
Convert loops to memset()s.
2002-06-13 21:11:11 +00:00
Neal Norwitz bb2769f580 Revert use of METH_OLDARGS (use 0) to support 1.5.2 2002-03-31 15:46:00 +00:00
Neal Norwitz b049325e92 Use symbolic METH_VARARGS/METH_OLDARGS instead of 1/0 for ml_flags 2002-03-31 14:44:22 +00:00
Fredrik Lundh 82b230732f bug #133283, #477728, #483789, #490573
backed out of broken minimal repeat patch from July

also fixed a couple of minor potential resource leaks in pattern_subx
(Guido had already fixed the big one)
2001-12-09 16:13:15 +00:00
Guido van Rossum 146483964e Patch supplied by Burton Radons for his own SF bug #487390: Modifying
type.__module__ behavior.

This adds the module name and a dot in front of the type name in every
type object initializer, except for built-in types (and those that
already had this).  Note that it touches lots of Mac modules -- I have
no way to test these but the changes look right.  Apologies if they're
not.  This also touches the weakref docs, which contains a sample type
object initializer.  It also touches the mmap test output, because the
mmap type's repr is included in that output.  It touches object.h to
put the correct description in a comment.
2001-12-08 18:02:58 +00:00
Guido van Rossum 4e173846c8 Fix for #489672 (Neil Norwitz): memory leak in test_sre.
(At least for the repeatable test case that Tim produced.)

pattern_subx(): Add missing DECREF(filter) in both exit branches
(normal and error return).  Also fix a DECREF(args) that should
certainly be a DECREF(match) -- because it's inside if (!args) and
right after allocation of match.
2001-12-07 04:25:10 +00:00
Fredrik Lundh 703ce8122c (experimental) "finditer" method/function. this works pretty much
like findall, but returns an iterator (which returns match objects)
instead of a list of strings/tuples.
2001-10-24 22:16:30 +00:00
Fredrik Lundh 6de22ef677 another major speedup: let sre.sub/subn check for escapes in the
template string, and don't call the template compiler if we can
avoid it.
2001-10-22 21:18:08 +00:00
Fredrik Lundh f864aa8fd9 sre.split should return the last segment, even if empty
(sorry, barry)
2001-10-22 06:01:56 +00:00
Fredrik Lundh dac58492aa fixed character set description in docstring (SRE uses Python
strings, not C strings)

removed USE_PYTHON defines, and related sre.py helpers

skip calling the subx helper if the template is callable.
interestingly enough, this means that

	def callback(m):
	    return literal
	result = pattern.sub(callback, string)

is much faster than

	result = pattern.sub(literal, string)
2001-10-21 21:48:30 +00:00
Fredrik Lundh 1296a8d77e sre.Scanner fixes (from Greg Chapman). also added a Scanner sanity
check to the test suite.

added a few missing exception checks in the _sre module
2001-10-21 18:04:11 +00:00
Fredrik Lundh bec95b9d88 rewrote the pattern.sub and pattern.subn methods in C
removed (conceptually flawed) getliteral helper; the new sub/subn code
uses a faster code path for literal replacement strings, but doesn't
(yet) look for literal patterns.

added STATE_OFFSET macro, and use it to convert state.start/ptr to
char indexes
2001-10-21 16:47:57 +00:00
Fredrik Lundh 971e78b55b rewrote the pattern.split method in C
also restored SRE Unicode support for 1.6/2.0/2.1
2001-10-20 17:48:46 +00:00
Fredrik Lundh 397a654791 SRE bug #441409:
compile should raise error for non-strings
SRE bug #432570, 448951:
    reset group after failed match

also bumped version number to 2.2.0
2001-10-18 19:30:16 +00:00
Fredrik Lundh 59b68656f8 fixed #449964: sre.sub raises an exception if the template contains a
\g<x> group reference followed by a character escape

(also restructured a few things on the way to fixing #449000)
2001-09-18 20:55:24 +00:00
Fredrik Lundh 21009b9c6f an SRE bugfix a day keeps Guido away...
#462270: sub-tle difference between pre.sub and sre.sub.  PRE ignored
an empty match at the previous location, SRE didn't.

also synced with Secret Labs "sreopen" codebase.
2001-09-18 18:47:09 +00:00
Sjoerd Mullender 89dfe9e292 Removed unreachable return to silence SGI compiler. 2001-08-30 14:37:07 +00:00
Martin v. Löwis 339d0f720e Patch #445762: Support --disable-unicode
- Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled
- check for Py_USING_UNICODE in all places that use Unicode functions
- disables unicode literals, and the builtin functions
- add the types.StringTypes list
- remove Unicode literals from most tests.
2001-08-17 18:39:25 +00:00
Barry Warsaw 214a0b1382 init_sre(): Plug a little leak reported by Insure. 2001-08-16 20:33:48 +00:00
Fredrik Lundh 2d96f11d07 map re.sub() to string.replace(), when possible 2001-07-08 13:26:57 +00:00
Fredrik Lundh d89a2e7731 bug #416670
added copy/deepcopy support to SRE (still not enabled, since it's not
covered by the test suite)
2001-07-03 20:32:36 +00:00
Fredrik Lundh df781e6a3f reapplied darryl gallion's minimizing repeat fix. I'm still not 100%
sure about this one, but test #133283 now works even with the fix in
place, and so does the test suite.  we'll see what comes up...
2001-07-02 19:54:28 +00:00
Fredrik Lundh f71ae461bf pythonware repository roundtrip (untabification) 2001-07-02 17:04:48 +00:00
Fredrik Lundh 19af43d78a added martin's BIGCHARSET patch to SRE 2.1.1. martin reports 2x
speedups for certain unicode character ranges.
2001-07-02 16:58:38 +00:00
Fredrik Lundh b0f05bdfd3 merged with pythonware's SRE 2.1.1 codebase 2001-07-02 16:42:49 +00:00
Fredrik Lundh 9c7eab82b3 SRE: made "copyright" string static, to avoid potential linking
conflicts.
2001-04-15 19:00:58 +00:00
Fredrik Lundh b25e1ad253 sre 2.1b2 update:
- take locale into account for word boundary anchors (#410271)
- restored 2.0's *? behaviour (#233283, #408936 and others)
- speed up re.sub/re.subn
2001-03-22 15:50:10 +00:00
Tim Peters 5687ffe0c5 SF patch 404928: Support for next Cygwin gcc (2.95.2-8) 2001-02-28 16:44:18 +00:00
Fredrik Lundh 1c5aa6901f bumped SRE version number to 2.1. cleaned up and added 1.5.2
compatibility patches.
2001-01-16 07:37:30 +00:00
Fredrik Lundh 6f5cba68fc fixed a memory leak in pattern cleanup (patch #103248 by cgw) 2001-01-16 07:05:29 +00:00
Fredrik Lundh b35ffc0417 added "magic" number to the _sre module, to avoid weird errors caused
by compiler/engine mismatches
2001-01-15 12:46:09 +00:00
Fredrik Lundh fa25a7d51f -- don't use recursion for unbounded non-greedy repeat
(bugs #115903, #115696)

This is based on a patch by Darrel Gallion.  I'm not 100%
sure about this fix, but I haven't managed to come up with
any test case it cannot handle...
2001-01-14 23:55:55 +00:00
Fredrik Lundh 770617b23e SRE fixes for 2.1 alpha:
-- added some more docstrings
-- fixed typo in scanner class (#125531)
-- the multiline flag (?m) should't affect the \Z operator (#127259)
-- fixed non-greedy backtracking bug (#123769, #127259)
-- added sre.DEBUG flag (currently dumps the parsed pattern structure)
-- fixed a couple of glitches in groupdict (the #126587 memory leak
   had already been fixed by AMK)
2001-01-14 15:06:11 +00:00
Andrew M. Kuchling 48f224c877 Fix bug 126587: matchobject.groupdict() leaks memory because of a missing
DECREF
2000-12-22 14:39:10 +00:00
Fredrik Lundh ebc37b28fa -- properly reset groups in findall (bug #117612)
-- fixed negative lookbehind to work correctly at the beginning
of the target string (bug #117242)

-- improved syntax check; you can no longer refer to a group
inside itself (bug #110866)
2000-10-28 19:30:41 +00:00
Fredrik Lundh 562586eb3a Accept keyword arguments for (most) pattern and match object
methods.  Closes buglet #115845.
2000-10-03 20:43:34 +00:00
Fredrik Lundh 65d4bc616a Fixed negative lookahead/lookbehind. Closes bug #115618. 2000-10-03 16:29:23 +00:00
Fred Drake d5fadf75e4 Rationalize use of limits.h, moving the inclusion to Python.h.
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.

This closes SourceForge patch #101659 and bug #115323.
2000-09-26 05:46:01 +00:00
Fredrik Lundh 5644b7fad1 - fixed yet another gcc -pedantic warning
- added experimental "expand" method to match objects
- don't use the buffer interface on unicode strings
2000-09-21 17:03:25 +00:00
Fredrik Lundh 510c97ba2f return -1 for undefined groups (as implemented in 1.5.2) instead of
None (as documented) from start/end/span.  closes bug #113254
2000-09-02 16:36:57 +00:00
Fredrik Lundh e67d8e514f oops. accidentally reintroduced a memory leak. put the bugfix back. 2000-08-27 21:32:46 +00:00
Fredrik Lundh 33accc1f5c don't mistake memory errors (including reaching the recursion limit)
with success.  also, check return values from the mark functions.

this addresses (but doesn't really solve) bug #112693, and low-memory
problems reported by jack jansen.
2000-08-27 20:59:47 +00:00
Barry Warsaw 152fbe88e9 pattern_findall(): Plug small memory leak discovered by Insure.
PyList_Append() always incref's the inserted item.  Be sure to decref
it regardless of whether the append succeeds or fails.
2000-08-18 05:09:50 +00:00
Trent Mick 239548f37d The sre test suite currently overruns the stack on Win64, Linux64, and Monterey
(64-bit AIX) This is because the RECURSION_LIMIT is too low. This patch lowers
to recusion limit to 7500 such that the recusion check fires before a segfault.

Fredrik suggested/approved the fix in private email, modulo sre's recusion
limit checking no being necessary when PyOS_CheckStack is implemented for
Windows.
2000-08-16 22:29:55 +00:00
Fredrik Lundh 5810064476 -- changed findall to return empty strings instead of None
for undefined groups
2000-08-09 09:14:35 +00:00
Jack Jansen 0d15908629 Added a missing } in the USE_STACKCHECK code. 2000-08-07 21:02:50 +00:00
Fredrik Lundh 7898c3e685 -- reset marks if repeat_one tail doesn't match
(this should fix Sjoerd's xmllib problem)
-- added skip field to INFO header
-- changed compiler to generate charset INFO header
-- changed trace messages to support post-mortem analysis
2000-08-07 20:59:04 +00:00
Fredrik Lundh 18c2aa25a1 + if USE_STACKCHECK is defined, use PyOS_CheckStack to look
for excessive recursion.
2000-08-07 17:33:38 +00:00
Fredrik Lundh 96ab46529b -- added recursion limit (currently ~10,000 levels)
-- improved error messages
-- factored out SRE_COUNT; the same code is used by
   SRE_OP_REPEAT_ONE_TEMPLATE
-- minor cleanups
2000-08-03 16:29:50 +00:00
Fredrik Lundh e186983842 final 0.9.8 updates:
-- added REPEAT_ONE operator
-- added ANY_ALL operator (used to represent "(?s).")
2000-08-01 22:47:49 +00:00
Fredrik Lundh 2f2c67d7e5 -- fixed width calculations for alternations
-- fixed literal check in branch operator
   (this broke test_tokenize, as reported by Mark Favas)
-- added REPEAT_ONE operator (still not enabled, though)
-- added some debugging stuff (maxlevel)
2000-08-01 21:05:41 +00:00
Fredrik Lundh 29c4ba9ada SRE 0.9.8: passes the entire test suite
-- reverted REPEAT operator to use "repeat context" strategy
   (from 0.8.X), but done right this time.
-- got rid of backtracking stack; use nested SRE_MATCH calls
   instead (should probably put it back again in 0.9.9 ;-)
-- properly reset state in scanner mode
-- don't use aggressive inlining by default
2000-08-01 18:20:07 +00:00
Fredrik Lundh 8a3ebf8ca8 -- SRE 0.9.6 sync. this includes:
+ added "regs" attribute
 + fixed "pos" and "endpos" attributes
 + reset "lastindex" and "lastgroup" in scanner methods
 + removed (?P#id) syntax; the "lastindex" and "lastgroup"
   attributes are now always set
 + removed string module dependencies in sre_parse
 + better debugging support in sre_parse
 + various tweaks to build under 1.5.2
2000-07-23 21:46:17 +00:00
Thomas Wouters f3f33dcf03 Bunch of minor ANSIfications: 'void initfunc()' -> 'void initfunc(void)',
and a couple of functions that were missed in the previous batches. Not
terribly tested, but very carefully scrutinized, three times.

All these were found by the little findkrc.py that I posted to python-dev,
which means there might be more lurking. Cases such as this:

long
func(a, b)
	long a;
	long b; /* flagword */
{

and other cases where the last ; in the argument list isn't followed by a
newline and an opening curly bracket. Regexps to catch all are welcome, of
course ;)
2000-07-21 06:00:07 +00:00
Jeremy Hylton 03657cfdb0 replace PyXXX_Length calls with PyXXX_Size calls 2000-07-12 13:05:33 +00:00
Fredrik Lundh 2855290b84 maintenance release:
- reorganized some code to get rid of -Wall and -W4
  warnings

- fixed default argument handling for sub/subn/split
  methods (reported by Peter Schneider-Kamp).
2000-07-05 21:14:16 +00:00
Fredrik Lundh 72b82ba16d - fixed grouping error bug
- changed "group" operator to "groupref"
2000-07-03 21:31:48 +00:00
Fredrik Lundh 6f01398236 - added lookbehind support (?<=pattern), (?<!pattern).
the pattern must have a fixed width.

- got rid of array-module dependencies; the match pro-
  gram is now stored inside the pattern object, rather
  than in an extra string buffer.

- cleaned up a various of potential leaks, api abuses,
  and other minors in the engine module.

- use mal's new isalnum macro, rather than my own work-
  around.

- untabified test_sre.py.  seems like I removed a couple
  of trailing spaces in the process...
2000-07-03 18:44:21 +00:00
Fredrik Lundh c2301730b8 - experimental: added two new attributes to the match object:
"lastgroup" is the name of the last matched capturing group,
  "lastindex" is the index of the same group.  if no group was
  matched, both attributes are set to None.

  the (?P#) feature will be removed in the next relase.
2000-07-02 22:25:39 +00:00
Fredrik Lundh 7cafe4d7e4 - actually enabled charset anchors in the engine (still not
used by the code generator)

- changed max repeat value in engine (to match earlier array fix)

- added experimental "which part matched?" mechanism to sre; see
  http://hem.passagen.se/eff/2000_07_01_bot-archive.htm#416954
  or python-dev for details.
2000-07-02 17:33:27 +00:00
Fredrik Lundh 3562f11764 -- use charset bitmaps where appropriate. this gives a 5-10%
speedup for some tests, including the python tokenizer.

-- added support for an optional charset anchor to the engine
   (currently unused by the code generator).

-- removed workaround for array module bug.
2000-07-02 12:00:07 +00:00
Fredrik Lundh c13222cdff - fixed "{ in any other context" bug
- minor comment touchups in the C module
2000-07-01 23:49:14 +00:00
Fredrik Lundh 22d2546520 today's SRE update:
-- changed 1.6 to 2.0 in the file headers

-- fixed ISALNUM macro for the unicode locale.  this
   solution isn't perfect, but the best I can do with
   Python's current unicode database.
2000-07-01 17:50:59 +00:00
Fredrik Lundh ef34bd2c0d -- changed $ to match before a trailing newline, even
if the multiline flag isn't given.
2000-06-30 21:40:20 +00:00
Fredrik Lundh 0640e1161f the mad patcher strikes again:
-- added pickling support (only works if sre is imported)

-- fixed wordsize problems in engine
   (instead of casting literals down to the character size,
   cast characters up to the literal size (same as the code
   word size).  this prevents false hits when you're matching
   a unicode pattern against an 8-bit string. (unfortunately,
   this broke another test, but I think the test should be
   changed in this case; more on that on python-dev)

-- added sre.purge function
   (unofficial, clears the cache)
2000-06-30 13:55:15 +00:00
Fredrik Lundh 43b3b49b5a - fixed lookahead assertions (#10, #11, #12)
- untabified sre_constants.py
2000-06-30 10:41:31 +00:00
Fredrik Lundh df02d0b3f0 - fixed default value handling in group/groupdict
- added test suite
2000-06-30 07:08:20 +00:00
Fredrik Lundh 01016fe972 - fixed split behaviour on empty matches
- fixed compiler problems when using locale/unicode flags

- fixed group/octal code parsing in sub/subn templates
2000-06-30 00:27:46 +00:00
Fredrik Lundh 29c08beab0 still trying to figure out how to fix the remaining
group reset problem.  in the meantime, I added some
optimizations:

- added "inline" directive to LOCAL

  (this assumes that AC_C_INLINE does what it's
  supposed to do).  to compile SRE on a non-unix
  platform that doesn't support inline, you have
  to add a "#define inline" somewhere...

- added code to generate a SRE_OP_INFO primitive

- added code to do fast prefix search

  (enabled by the USE_FAST_SEARCH define; default
  is on, in this release)
2000-06-29 23:33:12 +00:00
Fredrik Lundh 8094611eb8 - fixed another split problem
(those semantics are weird...)

- got rid of $Id$'s (for the moment, at least).  in other
  words, there should be no more "empty" checkins.

- internal: some minor cleanups.
2000-06-29 18:03:25 +00:00
Fredrik Lundh be2211e940 - fixed split
(test_sre still complains about split, but that's caused by
  the group reset bug, not split itself)

- added more mark slots
  (should be dynamically allocated, but 100 is better than 32.
  and checking for the upper limit is better than overwriting
  the memory ;-)

- internal: renamed the cursor helper class

- internal: removed some bloat from sre_compile
2000-06-29 16:57:40 +00:00
Fredrik Lundh b389df3402 - renamed "tolower" hook (it happened to work with
my compiler, but not on guido's box...)
2000-06-29 12:48:37 +00:00
Fredrik Lundh 75f2d675ed - last patch broke parse_template; fixed by changing some
tests in sre_patch back to previous version

- fixed return value from findall

- renamed a bunch of functions inside _sre (way too
  many leading underscores...)

</F>
2000-06-29 11:34:28 +00:00
Fredrik Lundh 6c68dc7b1a - removed "alpha only" licensing restriction
- removed some hacks that worked around 1.6 alpha bugs
- removed bogus test code from sre_parse
2000-06-29 10:34:56 +00:00
Fredrik Lundh 436c3d58a2 towards 1.6b1 2000-06-29 08:58:44 +00:00
Jeremy Hylton b1aa19515f Fredrik Lundh: here's the 96.6% version of SRE 2000-06-01 17:39:12 +00:00
Guido van Rossum b18618dab7 Vladimir Marangozov's long-awaited malloc restructuring.
For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.

(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode.  I'm also holding back on his
change to main.c, which seems unnecessary to me.)
2000-05-03 23:44:39 +00:00
Guido van Rossum 29530886af Remove CRLF line endings.
Fredrik Lundh: add two missing casts.
2000-04-10 17:06:55 +00:00
Guido van Rossum b700df9824 Adding Fredrik Lundh's _sre.c module and its header files.
NOTE: THIS IS VERY ROUGH ALPHA CODE!
2000-03-31 14:59:30 +00:00