241 Commits

Author SHA1 Message Date
Fredrik Lundh 5644b7fad1 - fixed yet another gcc -pedantic warning
- added experimental "expand" method to match objects
- don't use the buffer interface on unicode strings
2000-09-21 17:03:25 +00:00
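The experimental "expand" method mentioned above is available in today's `re` module as `Match.expand()`. A minimal sketch of what it does, assuming modern standard-library behaviour (which may differ from the experimental version in this commit); the pattern and strings are invented for illustration:

```python
import re

# Hypothetical sample input, chosen only for illustration.
m = re.match(r"(\w+) (\w+)", "hello world")

# expand() fills a template with the matched groups, using the same
# backreference syntax as the replacement string of sub().
swapped = m.expand(r"\2 \1")

# Named groups can be referenced with \g<name>.
named = re.match(r"(?P<first>\w+)", "hello").expand(r"<\g<first>>")

print(swapped)
print(named)
```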
Fredrik Lundh 510c97ba2f return -1 for undefined groups (as implemented in 1.5.2) instead of
None (as documented) from start/end/span.  closes bug #113254
2000-09-02 16:36:57 +00:00
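The behaviour described in this commit can still be observed in the current `re` module. A small sketch, assuming a modern Python interpreter; the pattern is an invented example in which the second group never participates in the match:

```python
import re

# Group 2 is defined in the pattern but does not take part in this match.
m = re.match(r"(a)|(b)", "a")

undefined_start = m.start(2)   # position of an undefined group
undefined_end = m.end(2)
undefined_span = m.span(2)
undefined_value = m.group(2)   # group() reports None, unlike start/end/span

print(undefined_start, undefined_end, undefined_span, undefined_value)
```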
Fredrik Lundh e67d8e514f oops. accidentally reintroduced a memory leak. put the bugfix back. 2000-08-27 21:32:46 +00:00
Fredrik Lundh 33accc1f5c don't mistake memory errors (including reaching the recursion limit)
for success.  also, check return values from the mark functions.

this addresses (but doesn't really solve) bug #112693, and low-memory
problems reported by jack jansen.
2000-08-27 20:59:47 +00:00
Barry Warsaw 152fbe88e9 pattern_findall(): Plug small memory leak discovered by Insure.
PyList_Append() always incref's the inserted item.  Be sure to decref
it regardless of whether the append succeeds or fails.
2000-08-18 05:09:50 +00:00
Trent Mick 239548f37d The sre test suite currently overruns the stack on Win64, Linux64, and Monterey
(64-bit AIX). This is because the RECURSION_LIMIT is too high. This patch lowers
the recursion limit to 7500 such that the recursion check fires before a segfault.

Fredrik suggested/approved the fix in private email, modulo sre's recursion
limit checking not being necessary when PyOS_CheckStack is implemented for
Windows.
2000-08-16 22:29:55 +00:00
Fredrik Lundh 5810064476 -- changed findall to return empty strings instead of None
for undefined groups
2000-08-09 09:14:35 +00:00
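The `findall` change above is easy to check against the current `re` module. A minimal sketch, assuming modern Python; the pattern is an invented example with two alternative groups, so each match leaves one group undefined:

```python
import re

# Each match fills exactly one of the two groups; the other is undefined.
rows = re.findall(r"(a)|(b)", "ab")

# Undefined groups appear as empty strings, not None.
print(rows)
```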
Jack Jansen 0d15908629 Added a missing } in the USE_STACKCHECK code. 2000-08-07 21:02:50 +00:00
Fredrik Lundh 7898c3e685 -- reset marks if repeat_one tail doesn't match
(this should fix Sjoerd's xmllib problem)
-- added skip field to INFO header
-- changed compiler to generate charset INFO header
-- changed trace messages to support post-mortem analysis
2000-08-07 20:59:04 +00:00
Fredrik Lundh 18c2aa25a1 + if USE_STACKCHECK is defined, use PyOS_CheckStack to look
for excessive recursion.
2000-08-07 17:33:38 +00:00
Fredrik Lundh 96ab46529b -- added recursion limit (currently ~10,000 levels)
-- improved error messages
-- factored out SRE_COUNT; the same code is used by
   SRE_OP_REPEAT_ONE_TEMPLATE
-- minor cleanups
2000-08-03 16:29:50 +00:00
Fredrik Lundh e186983842 final 0.9.8 updates:
-- added REPEAT_ONE operator
-- added ANY_ALL operator (used to represent "(?s).")
2000-08-01 22:47:49 +00:00
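ANY_ALL is an internal operator, but the pattern it represents, "(?s).", has a visible effect that can be shown from Python. A sketch assuming the modern `re` module; it shows only user-level behaviour and says nothing about how the engine encodes the operator:

```python
import re

# Without the "s" (DOTALL) flag, "." does not match a newline.
plain = re.match(r".", "\n")

# With the inline flag, "." matches any character, including a newline.
dotall = re.match(r"(?s).", "\n")

# The same flag can be passed as an argument instead of inline.
flagged = re.match(r".", "\n", re.DOTALL)

print(plain, dotall, flagged)
```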
Fredrik Lundh 2f2c67d7e5 -- fixed width calculations for alternations
-- fixed literal check in branch operator
   (this broke test_tokenize, as reported by Mark Favas)
-- added REPEAT_ONE operator (still not enabled, though)
-- added some debugging stuff (maxlevel)
2000-08-01 21:05:41 +00:00
Fredrik Lundh 29c4ba9ada SRE 0.9.8: passes the entire test suite
-- reverted REPEAT operator to use "repeat context" strategy
   (from 0.8.X), but done right this time.
-- got rid of backtracking stack; use nested SRE_MATCH calls
   instead (should probably put it back again in 0.9.9 ;-)
-- properly reset state in scanner mode
-- don't use aggressive inlining by default
2000-08-01 18:20:07 +00:00
Fredrik Lundh 8a3ebf8ca8 -- SRE 0.9.6 sync. this includes:
 + added "regs" attribute
 + fixed "pos" and "endpos" attributes
 + reset "lastindex" and "lastgroup" in scanner methods
 + removed (?P#id) syntax; the "lastindex" and "lastgroup"
   attributes are now always set
 + removed string module dependencies in sre_parse
 + better debugging support in sre_parse
 + various tweaks to build under 1.5.2
2000-07-23 21:46:17 +00:00
Thomas Wouters f3f33dcf03 Bunch of minor ANSIfications: 'void initfunc()' -> 'void initfunc(void)',
and a couple of functions that were missed in the previous batches. Not
terribly tested, but very carefully scrutinized, three times.

All these were found by the little findkrc.py that I posted to python-dev,
which means there might be more lurking. Cases such as this:

long
func(a, b)
	long a;
	long b; /* flagword */
{

and other cases where the last ; in the argument list isn't followed by a
newline and an opening curly bracket. Regexps to catch all are welcome, of
course ;)
2000-07-21 06:00:07 +00:00
Jeremy Hylton 03657cfdb0 replace PyXXX_Length calls with PyXXX_Size calls 2000-07-12 13:05:33 +00:00
Fredrik Lundh 2855290b84 maintenance release:
- reorganized some code to get rid of -Wall and -W4
  warnings

- fixed default argument handling for sub/subn/split
  methods (reported by Peter Schneider-Kamp).
2000-07-05 21:14:16 +00:00
Fredrik Lundh 72b82ba16d - fixed grouping error bug
- changed "group" operator to "groupref"
2000-07-03 21:31:48 +00:00
Fredrik Lundh 6f01398236 - added lookbehind support (?<=pattern), (?<!pattern).
the pattern must have a fixed width.

- got rid of array-module dependencies; the match program
  is now stored inside the pattern object, rather than in
  an extra string buffer.

- cleaned up a variety of potential leaks, api abuses,
  and other minor issues in the engine module.

- use mal's new isalnum macro, rather than my own
  workaround.

- untabified test_sre.py.  seems like I removed a couple
  of trailing spaces in the process...
2000-07-03 18:44:21 +00:00
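The lookbehind syntax and the fixed-width requirement from this commit both still apply in the current `re` module. A minimal sketch, assuming modern Python; the sample strings are invented for illustration:

```python
import re

# Positive lookbehind: digits that directly follow a dollar sign.
price = re.search(r"(?<=\$)\d+", "cost: $42")

# Negative lookbehind: numbers that are not preceded by a minus sign.
not_negated = re.findall(r"(?<!-)\b\d+", "1 -2 3")

# A lookbehind whose width can vary is rejected at compile time.
try:
    re.compile(r"(?<=a+)b")
    fixed_width_enforced = False
except re.error:
    fixed_width_enforced = True

print(price.group(0), not_negated, fixed_width_enforced)
```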
Fredrik Lundh c2301730b8 - experimental: added two new attributes to the match object:
"lastgroup" is the name of the last matched capturing group,
  "lastindex" is the index of the same group.  if no group was
  matched, both attributes are set to None.

  the (?P#) feature will be removed in the next release.
2000-07-02 22:25:39 +00:00
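Both attributes introduced here still exist on match objects. A short sketch of how they behave, assuming the modern `re` module; the group names `word` and `number` are hypothetical:

```python
import re

# Two named alternatives; only one of them can match a given input.
pattern = re.compile(r"(?P<word>[a-z]+)|(?P<number>\d+)")

m = pattern.match("42")
last_name = m.lastgroup    # name of the last matched capturing group
last_index = m.lastindex   # index of the same group

# A pattern without capturing groups leaves both attributes as None.
bare = re.match(r"x", "x")

print(last_name, last_index, bare.lastgroup, bare.lastindex)
```

This is the "which part matched?" use case: a tokenizer can combine its rules into one alternation and read `lastgroup` to learn which rule fired.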
Fredrik Lundh 7cafe4d7e4 - actually enabled charset anchors in the engine (still not
used by the code generator)

- changed max repeat value in engine (to match earlier array fix)

- added experimental "which part matched?" mechanism to sre; see
  http://hem.passagen.se/eff/2000_07_01_bot-archive.htm#416954
  or python-dev for details.
2000-07-02 17:33:27 +00:00
Fredrik Lundh 3562f11764 -- use charset bitmaps where appropriate. this gives a 5-10%
speedup for some tests, including the python tokenizer.

-- added support for an optional charset anchor to the engine
   (currently unused by the code generator).

-- removed workaround for array module bug.
2000-07-02 12:00:07 +00:00
Fredrik Lundh c13222cdff - fixed "{ in any other context" bug
- minor comment touchups in the C module
2000-07-01 23:49:14 +00:00
Fredrik Lundh 22d2546520 today's SRE update:
-- changed 1.6 to 2.0 in the file headers

-- fixed ISALNUM macro for the unicode locale.  this
   solution isn't perfect, but the best I can do with
   Python's current unicode database.
2000-07-01 17:50:59 +00:00
Fredrik Lundh ef34bd2c0d -- changed $ to match before a trailing newline, even
if the multiline flag isn't given.
2000-06-30 21:40:20 +00:00
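The `$` behaviour described in this commit matches what the current `re` module does. A minimal sketch, assuming modern Python and no MULTILINE flag; the strings are invented for illustration:

```python
import re

# "$" matches at the very end, and also just before one trailing newline.
before_newline = re.search(r"abc$", "abc\n")

# It does not skip over more than one trailing newline.
two_newlines = re.search(r"abc$", "abc\n\n")

# "\Z" matches only at the true end of the string.
strict_end = re.search(r"abc\Z", "abc\n")

print(before_newline.span(), two_newlines, strict_end)
```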
Fredrik Lundh 0640e1161f the mad patcher strikes again:
-- added pickling support (only works if sre is imported)

-- fixed wordsize problems in engine
   (instead of casting literals down to the character size,
   cast characters up to the literal size, which is the same
   as the code word size.  this prevents false hits when you're
   matching a unicode pattern against an 8-bit string.
   unfortunately, this broke another test, but I think the test
   should be changed in this case; more on that on python-dev)

-- added sre.purge function
   (unofficial, clears the cache)
2000-06-30 13:55:15 +00:00
Fredrik Lundh 43b3b49b5a - fixed lookahead assertions (#10, #11, #12)
- untabified sre_constants.py
2000-06-30 10:41:31 +00:00
Fredrik Lundh df02d0b3f0 - fixed default value handling in group/groupdict
- added test suite
2000-06-30 07:08:20 +00:00
Fredrik Lundh 01016fe972 - fixed split behaviour on empty matches
- fixed compiler problems when using locale/unicode flags

- fixed group/octal code parsing in sub/subn templates
2000-06-30 00:27:46 +00:00
Fredrik Lundh 29c08beab0 still trying to figure out how to fix the remaining
group reset problem.  in the meantime, I added some
optimizations:

- added "inline" directive to LOCAL

  (this assumes that AC_C_INLINE does what it's
  supposed to do).  to compile SRE on a non-unix
  platform that doesn't support inline, you have
  to add a "#define inline" somewhere...

- added code to generate a SRE_OP_INFO primitive

- added code to do fast prefix search

  (enabled by the USE_FAST_SEARCH define; default
  is on, in this release)
2000-06-29 23:33:12 +00:00
Fredrik Lundh 8094611eb8 - fixed another split problem
(those semantics are weird...)

- got rid of $Id$'s (for the moment, at least).  in other
  words, there should be no more "empty" checkins.

- internal: some minor cleanups.
2000-06-29 18:03:25 +00:00
Fredrik Lundh be2211e940 - fixed split
(test_sre still complains about split, but that's caused by
  the group reset bug, not split itself)

- added more mark slots
  (should be dynamically allocated, but 100 is better than 32.
  and checking for the upper limit is better than overwriting
  the memory ;-)

- internal: renamed the cursor helper class

- internal: removed some bloat from sre_compile
2000-06-29 16:57:40 +00:00
Fredrik Lundh b389df3402 - renamed "tolower" hook (it happened to work with
my compiler, but not on guido's box...)
2000-06-29 12:48:37 +00:00
Fredrik Lundh 75f2d675ed - last patch broke parse_template; fixed by changing some
tests in sre_parse back to previous version

- fixed return value from findall

- renamed a bunch of functions inside _sre (way too
  many leading underscores...)

</F>
2000-06-29 11:34:28 +00:00
Fredrik Lundh 6c68dc7b1a - removed "alpha only" licensing restriction
- removed some hacks that worked around 1.6 alpha bugs
- removed bogus test code from sre_parse
2000-06-29 10:34:56 +00:00
Fredrik Lundh 436c3d58a2 towards 1.6b1 2000-06-29 08:58:44 +00:00
Jeremy Hylton b1aa19515f Fredrik Lundh: here's the 96.6% version of SRE 2000-06-01 17:39:12 +00:00
Guido van Rossum b18618dab7 Vladimir Marangozov's long-awaited malloc restructuring.
For more comments, read the patches@python.org archives.
For documentation read the comments in mymalloc.h and objimpl.h.

(This is not exactly what Vladimir posted to the patches list; I've
made a few changes, and Vladimir sent me a fix in private email for a
problem that only occurs in debug mode.  I'm also holding back on his
change to main.c, which seems unnecessary to me.)
2000-05-03 23:44:39 +00:00
Guido van Rossum 29530886af Remove CRLF line endings.
Fredrik Lundh: add two missing casts.
2000-04-10 17:06:55 +00:00
Guido van Rossum b700df9824 Adding Fredrik Lundh's _sre.c module and its header files.
NOTE: THIS IS VERY ROUGH ALPHA CODE!
2000-03-31 14:59:30 +00:00