Commit Graph

65 Commits

Author SHA1 Message Date
Fredrik Lundh 59b68656f8 fixed #449964: sre.sub raises an exception if the template contains a
\g<x> group reference followed by a character escape

(also restructured a few things on the way to fixing #449000)
2001-09-18 20:55:24 +00:00
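
The template case described in this fix can be exercised with a short check. This is a minimal sketch that assumes a current Python 3 interpreter, where the engine is exposed as `re` rather than `sre`; the pattern and input strings are illustrative and not taken from bug #449964.

```python
import re

# Replacement template: a \g<x> group reference immediately followed
# by a character escape (\n), the combination named in the commit.
result = re.sub(r"(?P<x>a+)", r"\g<x>\n", "aaa")

print(repr(result))
```

If the template is parsed correctly, the group text is followed by a real newline rather than raising an exception.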
Fredrik Lundh 21009b9c6f an SRE bugfix a day keeps Guido away...
#462270: sub-tle difference between pre.sub and sre.sub.  PRE ignored
an empty match at the previous location, SRE didn't.

also synced with Secret Labs "sreopen" codebase.
2001-09-18 18:47:09 +00:00
Sjoerd Mullender 89dfe9e292 Removed unreachable return to silence SGI compiler. 2001-08-30 14:37:07 +00:00
Martin v. Löwis 339d0f720e Patch #445762: Support --disable-unicode
- Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled
- check for Py_USING_UNICODE in all places that use Unicode functions
- disables unicode literals, and the builtin functions
- add the types.StringTypes list
- remove Unicode literals from most tests.
2001-08-17 18:39:25 +00:00
Barry Warsaw 214a0b1382 init_sre(): Plug a little leak reported by Insure. 2001-08-16 20:33:48 +00:00
Fredrik Lundh 2d96f11d07 map re.sub() to string.replace(), when possible 2001-07-08 13:26:57 +00:00
Fredrik Lundh d89a2e7731 bug #416670
added copy/deepcopy support to SRE (still not enabled, since it's not
covered by the test suite)
2001-07-03 20:32:36 +00:00
Fredrik Lundh df781e6a3f reapplied Darrel Gallion's minimizing repeat fix. I'm still not 100%
sure about this one, but test #133283 now works even with the fix in
place, and so does the test suite.  we'll see what comes up...
2001-07-02 19:54:28 +00:00
Fredrik Lundh f71ae461bf pythonware repository roundtrip (untabification) 2001-07-02 17:04:48 +00:00
Fredrik Lundh 19af43d78a added martin's BIGCHARSET patch to SRE 2.1.1. martin reports 2x
speedups for certain unicode character ranges.
2001-07-02 16:58:38 +00:00
Fredrik Lundh b0f05bdfd3 merged with pythonware's SRE 2.1.1 codebase 2001-07-02 16:42:49 +00:00
Fredrik Lundh 9c7eab82b3 SRE: made "copyright" string static, to avoid potential linking
conflicts.
2001-04-15 19:00:58 +00:00
Fredrik Lundh b25e1ad253 sre 2.1b2 update:
- take locale into account for word boundary anchors (#410271)
- restored 2.0's *? behaviour (#233283, #408936 and others)
- speed up re.sub/re.subn
2001-03-22 15:50:10 +00:00
Tim Peters 5687ffe0c5 SF patch 404928: Support for next Cygwin gcc (2.95.2-8) 2001-02-28 16:44:18 +00:00
Fredrik Lundh 1c5aa6901f bumped SRE version number to 2.1. cleaned up and added 1.5.2
compatibility patches.
2001-01-16 07:37:30 +00:00
Fredrik Lundh 6f5cba68fc fixed a memory leak in pattern cleanup (patch #103248 by cgw) 2001-01-16 07:05:29 +00:00
Fredrik Lundh b35ffc0417 added "magic" number to the _sre module, to avoid weird errors caused
by compiler/engine mismatches
2001-01-15 12:46:09 +00:00
Fredrik Lundh fa25a7d51f -- don't use recursion for unbounded non-greedy repeat
(bugs #115903, #115696)

This is based on a patch by Darrel Gallion.  I'm not 100%
sure about this fix, but I haven't managed to come up with
any test case it cannot handle...
2001-01-14 23:55:55 +00:00
Fredrik Lundh 770617b23e SRE fixes for 2.1 alpha:
-- added some more docstrings
-- fixed typo in scanner class (#125531)
-- the multiline flag (?m) shouldn't affect the \Z operator (#127259)
-- fixed non-greedy backtracking bug (#123769, #127259)
-- added sre.DEBUG flag (currently dumps the parsed pattern structure)
-- fixed a couple of glitches in groupdict (the #126587 memory leak
   had already been fixed by AMK)
2001-01-14 15:06:11 +00:00
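
The `\Z` item in this commit is easy to illustrate. The sketch below assumes a current Python 3 `re` module (not the 2.1 alpha sources) and uses made-up input text; it only shows that `$` is affected by the multiline flag while `\Z` is not.

```python
import re

text = "first\nsecond"

# With re.MULTILINE, $ also matches at the end of each line...
dollar = re.search(r"first$", text, re.MULTILINE)

# ...but \Z still matches only at the very end of the string.
slash_z = re.search(r"first\Z", text, re.MULTILINE)
end_match = re.search(r"second\Z", text, re.MULTILINE)

print(dollar is not None, slash_z is None, end_match is not None)
```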
Andrew M. Kuchling 48f224c877 Fix bug 126587: matchobject.groupdict() leaks memory because of a missing
DECREF
2000-12-22 14:39:10 +00:00
Fredrik Lundh ebc37b28fa -- properly reset groups in findall (bug #117612)
-- fixed negative lookbehind to work correctly at the beginning
of the target string (bug #117242)

-- improved syntax check; you can no longer refer to a group
inside itself (bug #110866)
2000-10-28 19:30:41 +00:00
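
Two of the behaviours listed in this commit can be checked directly. This is a hedged sketch against a current Python 3 `re` module; the patterns are illustrative and are not the test cases from bugs #117242 or #110866.

```python
import re

# A group that refers to itself is rejected at compile time.
try:
    re.compile(r"(a\1)")
    self_ref_rejected = False
except re.error:
    self_ref_rejected = True

# Negative lookbehind at the very beginning of the target string:
# there is no preceding "a", so the assertion holds and "b" matches.
lookbehind_at_start = re.search(r"(?<!a)b", "b")

print(self_ref_rejected, lookbehind_at_start is not None)
```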
Fredrik Lundh 562586eb3a Accept keyword arguments for (most) pattern and match object
methods.  Closes buglet #115845.
2000-10-03 20:43:34 +00:00
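
As a small illustration of keyword arguments on pattern methods, the sketch below assumes the parameter names used by the current Python 3 `re` module (`string`, `pos`, `repl`, `count`); the names accepted at the time of this commit may have differed.

```python
import re

pattern = re.compile(r"\d+")

# Pattern.search with keyword arguments; the search starts at index 8.
found = pattern.search(string="abc 123 def 456", pos=8)

# Pattern.sub with keyword arguments; only the first match is replaced.
replaced = pattern.sub(repl="#", string="a1b22", count=1)

print(found.group(), replaced)
```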
Fredrik Lundh 65d4bc616a Fixed negative lookahead/lookbehind. Closes bug #115618. 2000-10-03 16:29:23 +00:00
Fred Drake d5fadf75e4 Rationalize use of limits.h, moving the inclusion to Python.h.
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.

This closes SourceForge patch #101659 and bug #115323.
2000-09-26 05:46:01 +00:00
Fredrik Lundh 5644b7fad1 - fixed yet another gcc -pedantic warning
- added experimental "expand" method to match objects
- don't use the buffer interface on unicode strings
2000-09-21 17:03:25 +00:00
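
The "expand" method added here survives in the current `re` module as `Match.expand`. A minimal sketch, assuming a current Python 3 interpreter and an illustrative input string:

```python
import re

m = re.match(r"(\w+) (\w+)", "Fredrik Lundh")

# expand() substitutes group references into a template string,
# the same way the replacement argument of sub() is processed.
swapped = m.expand(r"\2, \1")

print(swapped)  # Lundh, Fredrik
```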
Fredrik Lundh 510c97ba2f return -1 for undefined groups (as implemented in 1.5.2) instead of
None (as documented) from start/end/span.  closes bug #113254
2000-09-02 16:36:57 +00:00
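
The -1 convention described in this commit is still observable. The following sketch assumes a current Python 3 `re` module; the pattern is illustrative.

```python
import re

# Group 2 does not participate in the match.
m = re.match(r"(a)|(b)", "a")

undefined_value = m.group(2)   # None for the group's text
undefined_start = m.start(2)   # -1, not None
undefined_end = m.end(2)       # -1, not None
undefined_span = m.span(2)     # (-1, -1)

print(undefined_value, undefined_start, undefined_end, undefined_span)
```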
Fredrik Lundh e67d8e514f oops. accidentally reintroduced a memory leak. put the bugfix back. 2000-08-27 21:32:46 +00:00
Fredrik Lundh 33accc1f5c don't mistake memory errors (including reaching the recursion limit)
for success.  also, check return values from the mark functions.

this addresses (but doesn't really solve) bug #112693, and low-memory
problems reported by jack jansen.
2000-08-27 20:59:47 +00:00
Barry Warsaw 152fbe88e9 pattern_findall(): Plug small memory leak discovered by Insure.
PyList_Append() always incref's the inserted item.  Be sure to decref
it regardless of whether the append succeeds or fails.
2000-08-18 05:09:50 +00:00
Trent Mick 239548f37d The sre test suite currently overruns the stack on Win64, Linux64, and Monterey
(64-bit AIX). This is because the RECURSION_LIMIT is too high. This patch lowers
the recursion limit to 7500 such that the recursion check fires before a segfault.

Fredrik suggested/approved the fix in private email, modulo sre's recursion
limit checking not being necessary when PyOS_CheckStack is implemented for
Windows.
2000-08-16 22:29:55 +00:00
Fredrik Lundh 5810064476 -- changed findall to return empty strings instead of None
for undefined groups
2000-08-09 09:14:35 +00:00
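
The findall change can be seen with an alternation in which only one group matches per hit. A minimal sketch, assuming a current Python 3 `re` module and an illustrative pattern:

```python
import re

# Each match fills only one of the two groups; the other is undefined
# and is reported as an empty string rather than None.
pairs = re.findall(r"(a)|(b)", "ab")

print(pairs)  # [('a', ''), ('', 'b')]
```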
Jack Jansen 0d15908629 Added a missing } in the USE_STACKCHECK code. 2000-08-07 21:02:50 +00:00
Fredrik Lundh 7898c3e685 -- reset marks if repeat_one tail doesn't match
(this should fix Sjoerd's xmllib problem)
-- added skip field to INFO header
-- changed compiler to generate charset INFO header
-- changed trace messages to support post-mortem analysis
2000-08-07 20:59:04 +00:00
Fredrik Lundh 18c2aa25a1 + if USE_STACKCHECK is defined, use PyOS_CheckStack to look
for excessive recursion.
2000-08-07 17:33:38 +00:00
Fredrik Lundh 96ab46529b -- added recursion limit (currently ~10,000 levels)
-- improved error messages
-- factored out SRE_COUNT; the same code is used by
   SRE_OP_REPEAT_ONE_TEMPLATE
-- minor cleanups
2000-08-03 16:29:50 +00:00
Fredrik Lundh e186983842 final 0.9.8 updates:
-- added REPEAT_ONE operator
-- added ANY_ALL operator (used to represent "(?s).")
2000-08-01 22:47:49 +00:00
Fredrik Lundh 2f2c67d7e5 -- fixed width calculations for alternations
-- fixed literal check in branch operator
   (this broke test_tokenize, as reported by Mark Favas)
-- added REPEAT_ONE operator (still not enabled, though)
-- added some debugging stuff (maxlevel)
2000-08-01 21:05:41 +00:00
Fredrik Lundh 29c4ba9ada SRE 0.9.8: passes the entire test suite
-- reverted REPEAT operator to use "repeat context" strategy
   (from 0.8.X), but done right this time.
-- got rid of backtracking stack; use nested SRE_MATCH calls
   instead (should probably put it back again in 0.9.9 ;-)
-- properly reset state in scanner mode
-- don't use aggressive inlining by default
2000-08-01 18:20:07 +00:00
Fredrik Lundh 8a3ebf8ca8 -- SRE 0.9.6 sync. this includes:
+ added "regs" attribute
 + fixed "pos" and "endpos" attributes
 + reset "lastindex" and "lastgroup" in scanner methods
 + removed (?P#id) syntax; the "lastindex" and "lastgroup"
   attributes are now always set
 + removed string module dependencies in sre_parse
 + better debugging support in sre_parse
 + various tweaks to build under 1.5.2
2000-07-23 21:46:17 +00:00
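
The "regs", "pos" and "endpos" attributes mentioned in this sync are still present on match objects. The sketch below assumes a current Python 3 `re` module; the pattern and string are illustrative.

```python
import re

pattern = re.compile(r"(b)(c)?")

# Search only the slice [1:2] of the string, so the optional
# second group cannot match.
m = pattern.search("abcd", 1, 2)

print(m.pos, m.endpos)   # 1 2
print(m.regs)            # spans for the whole match and each group
print(m.lastindex)       # index of the last matched group
```

The unmatched optional group shows up in `regs` as `(-1, -1)`.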
Thomas Wouters f3f33dcf03 Bunch of minor ANSIfications: 'void initfunc()' -> 'void initfunc(void)',
and a couple of functions that were missed in the previous batches. Not
terribly tested, but very carefully scrutinized, three times.

All these were found by the little findkrc.py that I posted to python-dev,
which means there might be more lurking. Cases such as this:

long
func(a, b)
	long a;
	long b; /* flagword */
{

and other cases where the last ; in the argument list isn't followed by a
newline and an opening curly bracket. Regexps to catch all are welcome, of
course ;)
2000-07-21 06:00:07 +00:00
Jeremy Hylton 03657cfdb0 replace PyXXX_Length calls with PyXXX_Size calls 2000-07-12 13:05:33 +00:00
Fredrik Lundh 2855290b84 maintenance release:
- reorganized some code to get rid of -Wall and -W4
  warnings

- fixed default argument handling for sub/subn/split
  methods (reported by Peter Schneider-Kamp).
2000-07-05 21:14:16 +00:00
Fredrik Lundh 72b82ba16d - fixed grouping error bug
- changed "group" operator to "groupref"
2000-07-03 21:31:48 +00:00
Fredrik Lundh 6f01398236 - added lookbehind support (?<=pattern), (?<!pattern).
the pattern must have a fixed width.

- got rid of array-module dependencies; the match program
  is now stored inside the pattern object, rather
  than in an extra string buffer.

- cleaned up various potential leaks, api abuses,
  and other minor issues in the engine module.

- use mal's new isalnum macro, rather than my own
  workaround.

- untabified test_sre.py.  seems like I removed a couple
  of trailing spaces in the process...
2000-07-03 18:44:21 +00:00
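
The lookbehind syntax and its fixed-width restriction can be sketched as follows. This assumes a current Python 3 `re` module, where the restriction still applies; the sample strings are illustrative.

```python
import re

# Positive lookbehind: match a word only if preceded by "-".
after_dash = re.search(r"(?<=-)\w+", "spam-egg").group()

# Negative lookbehind: match numbers not preceded by "$".
plain_numbers = re.findall(r"(?<!\$)\b\d+", "$10 and 20")

# A variable-width lookbehind pattern is rejected at compile time.
try:
    re.compile(r"(?<=a+)b")
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True

print(after_dash, plain_numbers, variable_width_rejected)
```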
Fredrik Lundh c2301730b8 - experimental: added two new attributes to the match object:
"lastgroup" is the name of the last matched capturing group,
  "lastindex" is the index of the same group.  if no group was
  matched, both attributes are set to None.

  the (?P#) feature will be removed in the next release.
2000-07-02 22:25:39 +00:00
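
The two attributes introduced here are useful for simple tokenizers, since they tell which alternative matched. A minimal sketch, assuming a current Python 3 `re` module; the group names `word` and `num` are made up for illustration.

```python
import re

token = re.compile(r"(?P<word>[a-z]+)|(?P<num>\d+)")

m = token.match("42")
print(m.lastgroup, m.lastindex)   # num 2

# With no capturing group at all, both attributes are None.
no_groups = re.match(r"a", "a")
print(no_groups.lastgroup, no_groups.lastindex)
```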
Fredrik Lundh 7cafe4d7e4 - actually enabled charset anchors in the engine (still not
used by the code generator)

- changed max repeat value in engine (to match earlier array fix)

- added experimental "which part matched?" mechanism to sre; see
  http://hem.passagen.se/eff/2000_07_01_bot-archive.htm#416954
  or python-dev for details.
2000-07-02 17:33:27 +00:00
Fredrik Lundh 3562f11764 -- use charset bitmaps where appropriate. this gives a 5-10%
speedup for some tests, including the python tokenizer.

-- added support for an optional charset anchor to the engine
   (currently unused by the code generator).

-- removed workaround for array module bug.
2000-07-02 12:00:07 +00:00
Fredrik Lundh c13222cdff - fixed "{ in any other context" bug
- minor comment touchups in the C module
2000-07-01 23:49:14 +00:00
Fredrik Lundh 22d2546520 today's SRE update:
-- changed 1.6 to 2.0 in the file headers

-- fixed ISALNUM macro for the unicode locale.  this
   solution isn't perfect, but the best I can do with
   Python's current unicode database.
2000-07-01 17:50:59 +00:00
Fredrik Lundh ef34bd2c0d -- changed $ to match before a trailing newline, even
if the multiline flag isn't given.
2000-06-30 21:40:20 +00:00
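
The `$` behaviour described in this commit can be checked as below. This is a sketch against a current Python 3 `re` module, with an illustrative input; `\Z` is included only for contrast.

```python
import re

# Without the multiline flag, $ still matches just before a
# trailing newline at the end of the string.
dollar = re.search(r"foo$", "foo\n")

# \Z matches only at the very end, so the newline prevents a match.
strict_end = re.search(r"foo\Z", "foo\n")

print(dollar.span(), strict_end)  # (0, 3) None
```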