Commit Graph

48 Commits

Author SHA1 Message Date
Thomas Wouters 49fd7fa443 Merge p3yk branch with the trunk up to revision 45595. This breaks a fair
number of tests, all because of the codecs/_multibytecodecs issue described
here (it's not a Py3K issue, just something Py3K discovers):
http://mail.python.org/pipermail/python-dev/2006-April/064051.html

Hye-Shik Chang promised to look for a fix, so no need to fix it here. The
tests that are expected to break are:

test_codecencodings_cn
test_codecencodings_hk
test_codecencodings_jp
test_codecencodings_kr
test_codecencodings_tw
test_codecs
test_multibytecodec

This merge fixes an actual test failure (test_weakref) in this branch,
though, so I believe merging is the right thing to do anyway.
2006-04-21 10:40:58 +00:00
Thomas Wouters 9ada3d6e29 Merge trunk up to 43069, putting re.py back and hopefully making the branch
usable again.
2006-04-21 09:47:09 +00:00
Raymond Hettinger ebb7f30111 Speed-up escape() 2005-09-12 22:50:37 +00:00
Raymond Hettinger 596ba4d89e Granted Noam Raphael's request for minor improvements to the re module and
its documentation.

* Documented that the compiled re methods are supposed to be more full
  featured than their simpilified function counterparts.

* Documented the existing start and stop position arguments for the
  findall() and finditer() methods of compiled regular expression objects.

* Added an optional flags argument to the re.findall() and re.finditer()
  functions.  This aligns their API with that for re.search() and
  re.match().
2004-09-24 03:41:05 +00:00
Barry Warsaw 8bee76106e PEP 292 classes Template and SafeTemplate are added to the string module.
This patch includes test cases and documentation updates, as well as NEWS file
updates.

This patch also updates the sre modules so that they don't import the string
module, breaking direct circular imports.
2004-08-25 02:22:30 +00:00
Andrew M. Kuchling 43ab0cd174 [Bug #990792] Mention that repl can be a callable 2004-08-07 17:41:54 +00:00
Hye-Shik Chang 0f5bf1ebdd SF #926075: Fixed the bug that returns a wrong pattern object for
a string or unicode object in sre.compile() when a different type
pattern with the same value exists.
2004-04-20 21:11:11 +00:00
Just van Rossum 74902508dc Addendum to #764548: restore 2.1 compatibility. 2003-07-02 21:37:16 +00:00
Just van Rossum 12723bacea Fix and test for bug #764548:
Use isinstance() instead of comparing types directly, to enable
subclasses of str and unicode to be used as patterns.
Blessed by /F.
2003-07-02 20:03:04 +00:00
Guido van Rossum 0d976551fb Add finditer to __all__ (when defining it at all).
SF bug 585882.  Will forward-port.
2002-10-14 12:22:17 +00:00
Fredrik Lundh b7747e2a2d added finditer sanity check 2001-10-28 20:15:40 +00:00
Fredrik Lundh 703ce8122c (experimental) "finditer" method/function. this works pretty much
like findall, but returns an iterator (which returns match objects)
instead of a list of strings/tuples.
2001-10-24 22:16:30 +00:00
Fredrik Lundh dac58492aa fixed character set description in docstring (SRE uses Python
strings, not C strings)

removed USE_PYTHON defines, and related sre.py helpers

skip calling the subx helper if the template is callable.
interestingly enough, this means that

	def callback(m):
	    return literal
	result = pattern.sub(callback, string)

is much faster than

	result = pattern.sub(literal, string)
2001-10-21 21:48:30 +00:00
Fredrik Lundh 1296a8d77e sre.Scanner fixes (from Greg Chapman). also added a Scanner sanity
check to the test suite.

added a few missing exception checks in the _sre module
2001-10-21 18:04:11 +00:00
Fredrik Lundh bec95b9d88 rewrote the pattern.sub and pattern.subn methods in C
removed (conceptually flawed) getliteral helper; the new sub/subn code
uses a faster code path for literal replacement strings, but doesn't
(yet) look for literal patterns.

added STATE_OFFSET macro, and use it to convert state.start/ptr to
char indexes
2001-10-21 16:47:57 +00:00
Fredrik Lundh 397a654791 SRE bug #441409:
compile should raise error for non-strings
SRE bug #432570, 448951:
    reset group after failed match

also bumped version number to 2.2.0
2001-10-18 19:30:16 +00:00
Fredrik Lundh 59b68656f8 fixed #449964: sre.sub raises an exception if the template contains a
\g<x> group reference followed by a character escape

(also restructured a few things on the way to fixing #449000)
2001-09-18 20:55:24 +00:00
Fredrik Lundh 21009b9c6f an SRE bugfix a day keeps Guido away...
#462270: sub-tle difference between pre.sub and sre.sub.  PRE ignored
an empty match at the previous location, SRE didn't.

also synced with Secret Labs "sreopen" codebase.
2001-09-18 18:47:09 +00:00
Fred Drake 9f5b822fb3 Convert docstring to "raw" string. 2001-09-04 19:20:06 +00:00
Fred Drake b8f2274985 Added docstrings by Neal Norwitz. This closes SF bug #450980. 2001-09-04 19:10:20 +00:00
Guido van Rossum 315cd29ecf Disable the sub() optimization until Fredrik has time to look into SF
bug #449000, "re.sub(r'\n', ...) broke".  This was Fredrik's
suggestion -- he's on vacation and said he wouldn't be able to work on
this until next week.
2001-08-10 14:56:54 +00:00
Fredrik Lundh 2d96f11d07 map re.sub() to string.replace(), when possible 2001-07-08 13:26:57 +00:00
Fredrik Lundh e06cbb8c56 bug #436596
re.findall doesn't take a maxsplit argument
2001-07-06 20:56:10 +00:00
Fredrik Lundh b25e1ad253 sre 2.1b2 update:
- take locale into account for word boundary anchors (#410271)
- restored 2.0's *? behaviour (#233283, #408936 and others)
- speed up re.sub/re.subn
2001-03-22 15:50:10 +00:00
Fredrik Lundh f2989b22ff - restored 1.5.2 compatibility (sorry, eric)
- removed __all__ cruft from internal modules (sorry, skip)
- don't assume ASCII for string escapes (sorry, per)
2001-02-18 12:05:16 +00:00
Skip Montanaro 0de65807e6 bunch more __all__ lists
also modified check_all function to suppress all warnings since they aren't
relevant to what this test is doing (allows quiet checking of regsub, for
instance)
2001-02-15 22:15:14 +00:00
Eric S. Raymond b08b2d3166 String method conversion. 2001-02-09 11:10:16 +00:00
Fredrik Lundh 1c5aa6901f bumped SRE version number to 2.1. cleaned up and added 1.5.2
compatibility patches.
2001-01-16 07:37:30 +00:00
Fredrik Lundh 770617b23e SRE fixes for 2.1 alpha:
-- added some more docstrings
-- fixed typo in scanner class (#125531)
-- the multiline flag (?m) should't affect the \Z operator (#127259)
-- fixed non-greedy backtracking bug (#123769, #127259)
-- added sre.DEBUG flag (currently dumps the parsed pattern structure)
-- fixed a couple of glitches in groupdict (the #126587 memory leak
   had already been fixed by AMK)
2001-01-14 15:06:11 +00:00
Fredrik Lundh 5644b7fad1 - fixed yet another gcc -pedantic warning
- added experimental "expand" method to match objects
- don't use the buffer interface on unicode strings
2000-09-21 17:03:25 +00:00
Fredrik Lundh 7898c3e685 -- reset marks if repeat_one tail doesn't match
(this should fix Sjoerd's xmllib problem)
-- added skip field to INFO header
-- changed compiler to generate charset INFO header
-- changed trace messages to support post-mortem analysis
2000-08-07 20:59:04 +00:00
Fredrik Lundh e186983842 final 0.9.8 updates:
-- added REPEAT_ONE operator
-- added ANY_ALL operator (used to represent "(?s).")
2000-08-01 22:47:49 +00:00
Fredrik Lundh 29c4ba9ada SRE 0.9.8: passes the entire test suite
-- reverted REPEAT operator to use "repeat context" strategy
   (from 0.8.X), but done right this time.
-- got rid of backtracking stack; use nested SRE_MATCH calls
   instead (should probably put it back again in 0.9.9 ;-)
-- properly reset state in scanner mode
-- don't use aggressive inlining by default
2000-08-01 18:20:07 +00:00
Fredrik Lundh 8a3ebf8ca8 -- SRE 0.9.6 sync. this includes:
+ added "regs" attribute
 + fixed "pos" and "endpos" attributes
 + reset "lastindex" and "lastgroup" in scanner methods
 + removed (?P#id) syntax; the "lastindex" and "lastgroup"
   attributes are now always set
 + removed string module dependencies in sre_parse
 + better debugging support in sre_parse
 + various tweaks to build under 1.5.2
2000-07-23 21:46:17 +00:00
Fredrik Lundh 019bcb598d - changed sre.Scanner to use lastindex instead of index. 2000-07-02 22:59:57 +00:00
Fredrik Lundh 7cafe4d7e4 - actually enabled charset anchors in the engine (still not
used by the code generator)

- changed max repeat value in engine (to match earlier array fix)

- added experimental "which part matched?" mechanism to sre; see
  http://hem.passagen.se/eff/2000_07_01_bot-archive.htm#416954
  or python-dev for details.
2000-07-02 17:33:27 +00:00
Fredrik Lundh 22d2546520 today's SRE update:
-- changed 1.6 to 2.0 in the file headers

-- fixed ISALNUM macro for the unicode locale.  this
   solution isn't perfect, but the best I can do with
   Python's current unicode database.
2000-07-01 17:50:59 +00:00
Fredrik Lundh 0640e1161f the mad patcher strikes again:
-- added pickling support (only works if sre is imported)

-- fixed wordsize problems in engine
   (instead of casting literals down to the character size,
   cast characters up to the literal size (same as the code
   word size).  this prevents false hits when you're matching
   a unicode pattern against an 8-bit string. (unfortunately,
   this broke another test, but I think the test should be
   changed in this case; more on that on python-dev)

-- added sre.purge function
   (unofficial, clears the cache)
2000-06-30 13:55:15 +00:00
Fredrik Lundh 90a0791322 - pedantic: make sure "python -t" doesn't complain... 2000-06-30 07:50:59 +00:00
Fredrik Lundh df02d0b3f0 - fixed default value handling in group/groupdict
- added test suite
2000-06-30 07:08:20 +00:00
Fredrik Lundh 01016fe972 - fixed split behaviour on empty matches
- fixed compiler problems when using locale/unicode flags

- fixed group/octal code parsing in sub/subn templates
2000-06-30 00:27:46 +00:00
Fredrik Lundh 8094611eb8 - fixed another split problem
(those semantics are weird...)

- got rid of $Id$'s (for the moment, at least).  in other
  words, there should be no more "empty" checkins.

- internal: some minor cleanups.
2000-06-29 18:03:25 +00:00
Fredrik Lundh be2211e940 - fixed split
(test_sre still complains about split, but that's caused by
  the group reset bug, not split itself)

- added more mark slots
  (should be dynamically allocated, but 100 is better than 32.
  and checking for the upper limit is better than overwriting
  the memory ;-)

- internal: renamed the cursor helper class

- internal: removed some bloat from sre_compile
2000-06-29 16:57:40 +00:00
Fredrik Lundh 436c3d58a2 towards 1.6b1 2000-06-29 08:58:44 +00:00
Andrew M. Kuchling e8d52af54b Fix bug when the replacement template is a callable object 2000-06-18 20:27:10 +00:00
Jeremy Hylton b1aa19515f Fredrik Lundh: here's the 96.6% version of SRE 2000-06-01 17:39:12 +00:00
Guido van Rossum 1b6aecb08c I know this is only a temporary stop-gap measure, but the match() and
search() functions didn't even work because _fixflags() isn't
idempotent.  I'm adding another stop-gap measure so that you can at
least use sre.search() and sre.match() with a zero flags arg.
2000-05-02 15:52:33 +00:00
Guido van Rossum 7627c0de69 Added Fredrik Lundh's sre module and its supporting cast.
NOTE: THIS IS VERY ROUGH ALPHA CODE!
2000-03-31 14:58:54 +00:00