cpython

Commit Graph

Author	SHA1	Message	Date
Gustavo Niemeyer	3c9068bbec	Made MAX_UNTIL/MIN_UNTIL code more coherent about mark protection, accordingly to further discussions with Greg Chapman in patch #712900.	2003-04-22 15:39:09 +00:00
Gustavo Niemeyer	be733ee7fb	More work on bug #672491 and patch #712900. I've applied a modified version of Greg Chapman's patch. I've included the fixes without introducing the reorganization mentioned, for the sake of stability. Also, the second fix mentioned in the patch don't fix the mentioned problem anymore, because of the change introduced by patch #720991 (by Greg as well). The new fix wasn't complicated though, and is included as well. As a note. It seems that there are other places that require the "protection" of LASTMARK_SAVE()/LASTMARK_RESTORE(), and are just waiting for someone to find how to break them. Particularly, I belive that every recursion of SRE_MATCH() should be protected by these macros. I won't do that right now since I'm not completely sure about this, and we don't have much time for testing until the next release.	2003-04-20 07:35:44 +00:00
Gustavo Niemeyer	1aca359e89	- Fixed bug #672491. This change restores the behavior of lastindex/lastgroup to be compliant with previous python versions, by backing out the changes made in revision 2.84 which affected this. The bugfix for backtracking is still maintained.	2003-04-20 00:45:13 +00:00
Martin v. Löwis	78e2f06cc6	Fully support 32-bit codes. Enable BIGCHARSET in UCS-4 builds.	2003-04-19 12:56:08 +00:00
Guido van Rossum	41c99e7f96	SF patch #720991 by Gary Herron: A small fix for bug #545855 and Greg Chapman's addition of op code SRE_OP_MIN_REPEAT_ONE for eliminating recursion on simple uses of pattern '*?' on a long string.	2003-04-14 17:59:34 +00:00
Fredrik Lundh	09705f0b89	fix for SF #635398 (don't "downcast" return strings from unicode to ascii)	2002-11-22 12:46:35 +00:00
Neal Norwitz	addfe0c09c	Make private functions static so we don't pollute the namespace	2002-11-10 14:33:26 +00:00
Gustavo Niemeyer	c523b04b0f	Fixed sre bug "[#581080] Provoking infinite scanner loops". This bug happened because: 1) the scanner_search and scanner_match methods were not checking the buffer limits before increasing the current pointer; and 2) SRE_SEARCH was using "if (ptr == end)" as a loop break, instead of "if (ptr >= end)". * Modules/_sre.c (SRE_SEARCH): Check for "ptr >= end" to break loops, so that we don't hang forever if a pointer passing the buffer limit is used. (scanner_search,scanner_match): Don't increment the current pointer if we're going to pass the buffer limit. * Misc/NEWS Mention the fix.	2002-11-07 03:28:56 +00:00
Gustavo Niemeyer	4e7be06a65	Fixed bug #470582, using a modified version of patch #527371, from Greg Chapman. * Modules/_sre.c (lastmark_restore): New function, implementing algorithm to restore a state to a given lastmark. In addition to the similar algorithm used in a few places of SRE_MATCH, restore lastindex when restoring lastmark. (SRE_MATCH): Replace lastmark inline restoring by lastmark_restore(), function. Also include it where missing. In SRE_OP_MARK, set lastindex only if i > lastmark. * Lib/test/re_tests.py * Lib/test/test_sre.py Included regression tests for the fixed bugs. * Misc/NEWS Mention fixes.	2002-11-06 14:06:53 +00:00
Michael W. Hudson	b6a4505123	Cray fixup as seen in bug #558153.	2002-07-31 09:54:24 +00:00
Mark Hammond	8235ea1c3a	Land Patch [ 566100 ] Rationalize DL_IMPORT and DL_EXPORT.	2002-07-19 06:55:41 +00:00
Jeremy Hylton	938ace69a0	staticforward bites the dust. The staticforward define was needed to support certain broken C compilers (notably SCO ODT 3.0, perhaps early AIX as well) botched the static keyword when it was used with a forward declaration of a static initialized structure. Standard C allows the forward declaration with static, and we've decided to stop catering to broken C compilers. (In fact, we expect that the compilers are all fixed eight years later.) I'm leaving staticforward and statichere defined in object.h as static. This is only for backwards compatibility with C extensions that might still use it. XXX I haven't updated the documentation.	2002-07-17 16:30:39 +00:00
Neal Norwitz	35fc7606f0	SF #561244 Micro optimizations Convert loops to memset()s.	2002-06-13 21:11:11 +00:00
Neal Norwitz	bb2769f580	Revert use of METH_OLDARGS (use 0) to support 1.5.2	2002-03-31 15:46:00 +00:00
Neal Norwitz	b049325e92	Use symbolic METH_VARARGS/METH_OLDARGS instead of 1/0 for ml_flags	2002-03-31 14:44:22 +00:00
Fredrik Lundh	82b230732f	bug #133283, #477728, #483789, #490573 backed out of broken minimal repeat patch from July also fixed a couple of minor potential resource leaks in pattern_subx (Guido had already fixed the big one)	2001-12-09 16:13:15 +00:00
Guido van Rossum	146483964e	Patch supplied by Burton Radons for his own SF bug #487390: Modifying type.__module__ behavior. This adds the module name and a dot in front of the type name in every type object initializer, except for built-in types (and those that already had this). Note that it touches lots of Mac modules -- I have no way to test these but the changes look right. Apologies if they're not. This also touches the weakref docs, which contains a sample type object initializer. It also touches the mmap test output, because the mmap type's repr is included in that output. It touches object.h to put the correct description in a comment.	2001-12-08 18:02:58 +00:00
Guido van Rossum	4e173846c8	Fix for #489672 (Neil Norwitz): memory leak in test_sre. (At least for the repeatable test case that Tim produced.) pattern_subx(): Add missing DECREF(filter) in both exit branches (normal and error return). Also fix a DECREF(args) that should certainly be a DECREF(match) -- because it's inside if (!args) and right after allocation of match.	2001-12-07 04:25:10 +00:00
Fredrik Lundh	703ce8122c	(experimental) "finditer" method/function. this works pretty much like findall, but returns an iterator (which returns match objects) instead of a list of strings/tuples.	2001-10-24 22:16:30 +00:00
Fredrik Lundh	6de22ef677	another major speedup: let sre.sub/subn check for escapes in the template string, and don't call the template compiler if we can avoid it.	2001-10-22 21:18:08 +00:00
Fredrik Lundh	f864aa8fd9	sre.split should return the last segment, even if empty (sorry, barry)	2001-10-22 06:01:56 +00:00
Fredrik Lundh	dac58492aa	fixed character set description in docstring (SRE uses Python strings, not C strings) removed USE_PYTHON defines, and related sre.py helpers skip calling the subx helper if the template is callable. interestingly enough, this means that def callback(m): return literal result = pattern.sub(callback, string) is much faster than result = pattern.sub(literal, string)	2001-10-21 21:48:30 +00:00
Fredrik Lundh	1296a8d77e	sre.Scanner fixes (from Greg Chapman). also added a Scanner sanity check to the test suite. added a few missing exception checks in the _sre module	2001-10-21 18:04:11 +00:00
Fredrik Lundh	bec95b9d88	rewrote the pattern.sub and pattern.subn methods in C removed (conceptually flawed) getliteral helper; the new sub/subn code uses a faster code path for literal replacement strings, but doesn't (yet) look for literal patterns. added STATE_OFFSET macro, and use it to convert state.start/ptr to char indexes	2001-10-21 16:47:57 +00:00
Fredrik Lundh	971e78b55b	rewrote the pattern.split method in C also restored SRE Unicode support for 1.6/2.0/2.1	2001-10-20 17:48:46 +00:00
Fredrik Lundh	397a654791	SRE bug #441409: compile should raise error for non-strings SRE bug #432570, 448951: reset group after failed match also bumped version number to 2.2.0	2001-10-18 19:30:16 +00:00
Fredrik Lundh	59b68656f8	fixed #449964: sre.sub raises an exception if the template contains a \g<x> group reference followed by a character escape (also restructured a few things on the way to fixing #449000)	2001-09-18 20:55:24 +00:00
Fredrik Lundh	21009b9c6f	an SRE bugfix a day keeps Guido away... #462270: sub-tle difference between pre.sub and sre.sub. PRE ignored an empty match at the previous location, SRE didn't. also synced with Secret Labs "sreopen" codebase.	2001-09-18 18:47:09 +00:00
Sjoerd Mullender	89dfe9e292	Removed unreachable return to silence SGI compiler.	2001-08-30 14:37:07 +00:00
Martin v. Löwis	339d0f720e	Patch #445762: Support --disable-unicode - Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled - check for Py_USING_UNICODE in all places that use Unicode functions - disables unicode literals, and the builtin functions - add the types.StringTypes list - remove Unicode literals from most tests.	2001-08-17 18:39:25 +00:00
Barry Warsaw	214a0b1382	init_sre(): Plug a little leak reported by Insure.	2001-08-16 20:33:48 +00:00
Fredrik Lundh	2d96f11d07	map re.sub() to string.replace(), when possible	2001-07-08 13:26:57 +00:00
Fredrik Lundh	d89a2e7731	bug #416670 added copy/deepcopy support to SRE (still not enabled, since it's not covered by the test suite)	2001-07-03 20:32:36 +00:00
Fredrik Lundh	df781e6a3f	reapplied darryl gallion's minimizing repeat fix. I'm still not 100% sure about this one, but test #133283 now works even with the fix in place, and so does the test suite. we'll see what comes up...	2001-07-02 19:54:28 +00:00
Fredrik Lundh	f71ae461bf	pythonware repository roundtrip (untabification)	2001-07-02 17:04:48 +00:00
Fredrik Lundh	19af43d78a	added martin's BIGCHARSET patch to SRE 2.1.1. martin reports 2x speedups for certain unicode character ranges.	2001-07-02 16:58:38 +00:00
Fredrik Lundh	b0f05bdfd3	merged with pythonware's SRE 2.1.1 codebase	2001-07-02 16:42:49 +00:00
Fredrik Lundh	9c7eab82b3	SRE: made "copyright" string static, to avoid potential linking conflicts.	2001-04-15 19:00:58 +00:00
Fredrik Lundh	b25e1ad253	sre 2.1b2 update: - take locale into account for word boundary anchors (#410271) - restored 2.0's *? behaviour (#233283, #408936 and others) - speed up re.sub/re.subn	2001-03-22 15:50:10 +00:00
Tim Peters	5687ffe0c5	SF patch 404928: Support for next Cygwin gcc (2.95.2-8)	2001-02-28 16:44:18 +00:00
Fredrik Lundh	1c5aa6901f	bumped SRE version number to 2.1. cleaned up and added 1.5.2 compatibility patches.	2001-01-16 07:37:30 +00:00
Fredrik Lundh	6f5cba68fc	fixed a memory leak in pattern cleanup (patch #103248 by cgw)	2001-01-16 07:05:29 +00:00
Fredrik Lundh	b35ffc0417	added "magic" number to the _sre module, to avoid weird errors caused by compiler/engine mismatches	2001-01-15 12:46:09 +00:00
Fredrik Lundh	fa25a7d51f	-- don't use recursion for unbounded non-greedy repeat (bugs #115903, #115696) This is based on a patch by Darrel Gallion. I'm not 100% sure about this fix, but I haven't managed to come up with any test case it cannot handle...	2001-01-14 23:55:55 +00:00
Fredrik Lundh	770617b23e	SRE fixes for 2.1 alpha: -- added some more docstrings -- fixed typo in scanner class (#125531) -- the multiline flag (?m) should't affect the \Z operator (#127259) -- fixed non-greedy backtracking bug (#123769, #127259) -- added sre.DEBUG flag (currently dumps the parsed pattern structure) -- fixed a couple of glitches in groupdict (the #126587 memory leak had already been fixed by AMK)	2001-01-14 15:06:11 +00:00
Andrew M. Kuchling	48f224c877	Fix bug 126587: matchobject.groupdict() leaks memory because of a missing DECREF	2000-12-22 14:39:10 +00:00
Fredrik Lundh	ebc37b28fa	-- properly reset groups in findall (bug #117612) -- fixed negative lookbehind to work correctly at the beginning of the target string (bug #117242) -- improved syntax check; you can no longer refer to a group inside itself (bug #110866)	2000-10-28 19:30:41 +00:00
Fredrik Lundh	562586eb3a	Accept keyword arguments for (most) pattern and match object methods. Closes buglet #115845.	2000-10-03 20:43:34 +00:00
Fredrik Lundh	65d4bc616a	Fixed negative lookahead/lookbehind. Closes bug #115618.	2000-10-03 16:29:23 +00:00
Fred Drake	d5fadf75e4	Rationalize use of limits.h, moving the inclusion to Python.h. Add definitions of INT_MAX and LONG_MAX to pyport.h. Remove includes of limits.h and conditional definitions of INT_MAX and LONG_MAX elsewhere. This closes SourceForge patch #101659 and bug #115323.	2000-09-26 05:46:01 +00:00
Fredrik Lundh	5644b7fad1	- fixed yet another gcc -pedantic warning - added experimental "expand" method to match objects - don't use the buffer interface on unicode strings	2000-09-21 17:03:25 +00:00
Fredrik Lundh	510c97ba2f	return -1 for undefined groups (as implemented in 1.5.2) instead of None (as documented) from start/end/span. closes bug #113254	2000-09-02 16:36:57 +00:00
Fredrik Lundh	e67d8e514f	oops. accidentally reintroduced a memory leak. put the bugfix back.	2000-08-27 21:32:46 +00:00
Fredrik Lundh	33accc1f5c	don't mistake memory errors (including reaching the recursion limit) with success. also, check return values from the mark functions. this addresses (but doesn't really solve) bug #112693, and low-memory problems reported by jack jansen.	2000-08-27 20:59:47 +00:00
Barry Warsaw	152fbe88e9	pattern_findall(): Plug small memory leak discovered by Insure. PyList_Append() always incref's the inserted item. Be sure to decref it regardless of whether the append succeeds or fails.	2000-08-18 05:09:50 +00:00
Trent Mick	239548f37d	The sre test suite currently overruns the stack on Win64, Linux64, and Monterey (64-bit AIX) This is because the RECURSION_LIMIT is too low. This patch lowers to recusion limit to 7500 such that the recusion check fires before a segfault. Fredrik suggested/approved the fix in private email, modulo sre's recusion limit checking no being necessary when PyOS_CheckStack is implemented for Windows.	2000-08-16 22:29:55 +00:00
Fredrik Lundh	5810064476	-- changed findall to return empty strings instead of None for undefined groups	2000-08-09 09:14:35 +00:00
Jack Jansen	0d15908629	Added a missing } in the USE_STACKCHECK code.	2000-08-07 21:02:50 +00:00
Fredrik Lundh	7898c3e685	-- reset marks if repeat_one tail doesn't match (this should fix Sjoerd's xmllib problem) -- added skip field to INFO header -- changed compiler to generate charset INFO header -- changed trace messages to support post-mortem analysis	2000-08-07 20:59:04 +00:00
Fredrik Lundh	18c2aa25a1	+ if USE_STACKCHECK is defined, use PyOS_CheckStack to look for excessive recursion.	2000-08-07 17:33:38 +00:00
Fredrik Lundh	96ab46529b	-- added recursion limit (currently ~10,000 levels) -- improved error messages -- factored out SRE_COUNT; the same code is used by SRE_OP_REPEAT_ONE_TEMPLATE -- minor cleanups	2000-08-03 16:29:50 +00:00
Fredrik Lundh	e186983842	final 0.9.8 updates: -- added REPEAT_ONE operator -- added ANY_ALL operator (used to represent "(?s).")	2000-08-01 22:47:49 +00:00
Fredrik Lundh	2f2c67d7e5	-- fixed width calculations for alternations -- fixed literal check in branch operator (this broke test_tokenize, as reported by Mark Favas) -- added REPEAT_ONE operator (still not enabled, though) -- added some debugging stuff (maxlevel)	2000-08-01 21:05:41 +00:00
Fredrik Lundh	29c4ba9ada	SRE 0.9.8: passes the entire test suite -- reverted REPEAT operator to use "repeat context" strategy (from 0.8.X), but done right this time. -- got rid of backtracking stack; use nested SRE_MATCH calls instead (should probably put it back again in 0.9.9 ;-) -- properly reset state in scanner mode -- don't use aggressive inlining by default	2000-08-01 18:20:07 +00:00
Fredrik Lundh	8a3ebf8ca8	-- SRE 0.9.6 sync. this includes: + added "regs" attribute + fixed "pos" and "endpos" attributes + reset "lastindex" and "lastgroup" in scanner methods + removed (?P#id) syntax; the "lastindex" and "lastgroup" attributes are now always set + removed string module dependencies in sre_parse + better debugging support in sre_parse + various tweaks to build under 1.5.2	2000-07-23 21:46:17 +00:00
Thomas Wouters	f3f33dcf03	Bunch of minor ANSIfications: 'void initfunc()' -> 'void initfunc(void)', and a couple of functions that were missed in the previous batches. Not terribly tested, but very carefully scrutinized, three times. All these were found by the little findkrc.py that I posted to python-dev, which means there might be more lurking. Cases such as this: long func(a, b) long a; long b; /* flagword */ { and other cases where the last ; in the argument list isn't followed by a newline and an opening curly bracket. Regexps to catch all are welcome, of course ;)	2000-07-21 06:00:07 +00:00
Jeremy Hylton	03657cfdb0	replace PyXXX_Length calls with PyXXX_Size calls	2000-07-12 13:05:33 +00:00
Fredrik Lundh	2855290b84	maintenance release: - reorganized some code to get rid of -Wall and -W4 warnings - fixed default argument handling for sub/subn/split methods (reported by Peter Schneider-Kamp).	2000-07-05 21:14:16 +00:00
Fredrik Lundh	72b82ba16d	- fixed grouping error bug - changed "group" operator to "groupref"	2000-07-03 21:31:48 +00:00
Fredrik Lundh	6f01398236	- added lookbehind support (?<=pattern), (?<!pattern). the pattern must have a fixed width. - got rid of array-module dependencies; the match program is now stored inside the pattern object, rather than in an extra string buffer. - cleaned up a various of potential leaks, api abuses, and other minors in the engine module. - use mal's new isalnum macro, rather than my own workaround. - untabified test_sre.py. seems like I removed a couple of trailing spaces in the process...	2000-07-03 18:44:21 +00:00
Fredrik Lundh	c2301730b8	- experimental: added two new attributes to the match object: "lastgroup" is the name of the last matched capturing group, "lastindex" is the index of the same group. if no group was matched, both attributes are set to None. the (?P#) feature will be removed in the next relase.	2000-07-02 22:25:39 +00:00
Fredrik Lundh	7cafe4d7e4	- actually enabled charset anchors in the engine (still not used by the code generator) - changed max repeat value in engine (to match earlier array fix) - added experimental "which part matched?" mechanism to sre; see http://hem.passagen.se/eff/2000_07_01_bot-archive.htm#416954 or python-dev for details.	2000-07-02 17:33:27 +00:00
Fredrik Lundh	3562f11764	-- use charset bitmaps where appropriate. this gives a 5-10% speedup for some tests, including the python tokenizer. -- added support for an optional charset anchor to the engine (currently unused by the code generator). -- removed workaround for array module bug.	2000-07-02 12:00:07 +00:00
Fredrik Lundh	c13222cdff	- fixed "{ in any other context" bug - minor comment touchups in the C module	2000-07-01 23:49:14 +00:00
Fredrik Lundh	22d2546520	today's SRE update: -- changed 1.6 to 2.0 in the file headers -- fixed ISALNUM macro for the unicode locale. this solution isn't perfect, but the best I can do with Python's current unicode database.	2000-07-01 17:50:59 +00:00
Fredrik Lundh	ef34bd2c0d	-- changed $ to match before a trailing newline, even if the multiline flag isn't given.	2000-06-30 21:40:20 +00:00
Fredrik Lundh	0640e1161f	the mad patcher strikes again: -- added pickling support (only works if sre is imported) -- fixed wordsize problems in engine (instead of casting literals down to the character size, cast characters up to the literal size (same as the code word size). this prevents false hits when you're matching a unicode pattern against an 8-bit string. (unfortunately, this broke another test, but I think the test should be changed in this case; more on that on python-dev) -- added sre.purge function (unofficial, clears the cache)	2000-06-30 13:55:15 +00:00
Fredrik Lundh	43b3b49b5a	- fixed lookahead assertions (#10, #11, #12) - untabified sre_constants.py	2000-06-30 10:41:31 +00:00
Fredrik Lundh	df02d0b3f0	- fixed default value handling in group/groupdict - added test suite	2000-06-30 07:08:20 +00:00
Fredrik Lundh	01016fe972	- fixed split behaviour on empty matches - fixed compiler problems when using locale/unicode flags - fixed group/octal code parsing in sub/subn templates	2000-06-30 00:27:46 +00:00
Fredrik Lundh	29c08beab0	still trying to figure out how to fix the remaining group reset problem. in the meantime, I added some optimizations: - added "inline" directive to LOCAL (this assumes that AC_C_INLINE does what it's supposed to do). to compile SRE on a non-unix platform that doesn't support inline, you have to add a "#define inline" somewhere... - added code to generate a SRE_OP_INFO primitive - added code to do fast prefix search (enabled by the USE_FAST_SEARCH define; default is on, in this release)	2000-06-29 23:33:12 +00:00
Fredrik Lundh	8094611eb8	- fixed another split problem (those semantics are weird...) - got rid of $Id$'s (for the moment, at least). in other words, there should be no more "empty" checkins. - internal: some minor cleanups.	2000-06-29 18:03:25 +00:00
Fredrik Lundh	be2211e940	- fixed split (test_sre still complains about split, but that's caused by the group reset bug, not split itself) - added more mark slots (should be dynamically allocated, but 100 is better than 32. and checking for the upper limit is better than overwriting the memory ;-) - internal: renamed the cursor helper class - internal: removed some bloat from sre_compile	2000-06-29 16:57:40 +00:00
Fredrik Lundh	b389df3402	- renamed "tolower" hook (it happened to work with my compiler, but not on guido's box...)	2000-06-29 12:48:37 +00:00
Fredrik Lundh	75f2d675ed	- last patch broke parse_template; fixed by changing some tests in sre_patch back to previous version - fixed return value from findall - renamed a bunch of functions inside _sre (way too many leading underscores...) </F>	2000-06-29 11:34:28 +00:00
Fredrik Lundh	6c68dc7b1a	- removed "alpha only" licensing restriction - removed some hacks that worked around 1.6 alpha bugs - removed bogus test code from sre_parse	2000-06-29 10:34:56 +00:00
Fredrik Lundh	436c3d58a2	towards 1.6b1	2000-06-29 08:58:44 +00:00
Jeremy Hylton	b1aa19515f	Fredrik Lundh: here's the 96.6% version of SRE	2000-06-01 17:39:12 +00:00
Guido van Rossum	b18618dab7	Vladimir Marangozov's long-awaited malloc restructuring. For more comments, read the patches@python.org archives. For documentation read the comments in mymalloc.h and objimpl.h. (This is not exactly what Vladimir posted to the patches list; I've made a few changes, and Vladimir sent me a fix in private email for a problem that only occurs in debug mode. I'm also holding back on his change to main.c, which seems unnecessary to me.)	2000-05-03 23:44:39 +00:00
Guido van Rossum	29530886af	Remove CRLF line endings. Fredrik Lundh: add two missing casts.	2000-04-10 17:06:55 +00:00
Guido van Rossum	b700df9824	Adding Fredrik Lundh's _sre.c module and its header files. NOTE: THIS IS VERY ROUGH ALPHA CODE!	2000-03-31 14:59:30 +00:00

291 Commits
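Several of the commits above describe user-visible `re` behavior rather than engine internals. The sketch below exercises a few of them. It assumes a current CPython 3 interpreter; `re` is still backed by `_sre` there, but the engine has changed considerably since these commits, so treat this as an illustration of the behavior rather than a reproduction of the historical code. The helper name `sre_behaviours` is hypothetical, and the comments cite the SHA1 of the commit each check relates to.

```python
import re


def sre_behaviours() -> dict:
    """Hypothetical helper: exercise re behaviours mentioned in the commit log."""
    results = {}
    # f864aa8fd9: split keeps the final segment, even when it is empty.
    results["split"] = re.split(r",", "a,b,")
    # 5810064476: findall reports '' (not None) for groups that did not take part.
    results["findall"] = re.findall(r"(a)|(b)", "ab")
    # 510c97ba2f: span()/start()/end() report -1 for a group that did not participate.
    m = re.match(r"(a)|(b)", "b")
    results["span"] = m.span(1)
    # ef34bd2c0d: $ also matches just before a trailing newline, without MULTILINE.
    results["dollar"] = re.search(r"foo$", "foo\n") is not None
    # 770617b23e: (?m) does not change \Z, which matches only at the very end.
    results["z_multiline"] = re.search(r"(?m)foo\Z", "foo\n") is not None
    # c2301730b8: lastindex/lastgroup describe the last matched capturing group.
    m = re.match(r"(?P<first>a)(?P<second>b)", "ab")
    results["last"] = (m.lastindex, m.lastgroup)
    # 703ce8122c: finditer yields match objects instead of returning a list.
    results["finditer"] = [mo.group() for mo in re.finditer(r"\d+", "a1b22c333")]
    # dac58492aa: sub() accepts a callable, which receives each match object.
    results["sub_callable"] = re.sub(
        r"\d+", lambda mo: str(int(mo.group()) * 2), "a1b22"
    )
    return results


if __name__ == "__main__":
    for name, value in sre_behaviours().items():
        print(f"{name}: {value!r}")
```

On a modern interpreter, `split` should give `['a', 'b', '']` and `span` should give `(-1, -1)`, which matches the intent stated in commits f864aa8fd9 and 510c97ba2f.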