333 Commits

Author SHA1 Message Date
Eric Smith a9f7d62480 Backport of PEP 3101, Advanced String Formatting, from py3k.
Highlights:
 - Adding PyObject_Format.
 - Adding string.Formatter class.
 - Adding __format__ for str, unicode, int, long, float, datetime.
 - Adding builtin format.
 - Adding ''.format and u''.format.
 - str/unicode fixups for formatters.

The files in Objects/stringlib that implement PEP 3101 (stringdefs.h,
unicodedefs.h, formatter.h, string_format.h) are identical in trunk
and py3k.  Any changes from here on should be made to trunk, and
changes will propagate to py3k.
2008-02-17 19:46:49 +00:00
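As a rough illustration of the pieces this backport lists (the builtin format, the str.format method, and the __format__ hook), here is a minimal sketch. The Temperature class is hypothetical and exists only to show a user-defined __format__; the snippet is written for a current Python and is not taken from the commit.

```python
from datetime import date


class Temperature:
    """Hypothetical type, used only to show the __format__ hook."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # Delegate the format spec to float, then append a unit.
        return format(self.celsius, spec or ".1f") + " C"


greeting = "{0}, {1}!".format("Hello", "world")   # str.format method
padded = format(42, ">6d")                        # builtin format()
day = "{0:%Y-%m-%d}".format(date(2008, 2, 17))    # datetime's __format__
reading = "{0:.2f}".format(Temperature(21.456))   # user-defined __format__
```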
Christian Heimes e93237dfcc #1629: Renamed Py_Size, Py_Type and Py_Refcnt to Py_SIZE, Py_TYPE and Py_REFCNT. Macros for b/w compatibility are available. 2007-12-19 02:37:44 +00:00
Skip Montanaro 26015494f2 When splitting, avoid making a copy of the string if the split doesn't find
anything (issue 1538).
2007-12-08 15:33:24 +00:00
Facundo Batista 57d5669f4b Now in find, rfind, index, and rindex, you can use None as defaults,
as usual with slicing (both with str and unicode strings).  This
fixes issue 1259.

For str only the stringobject.c file was modified.  But for unicode,
I needed to repeat in the four functions a lot of code, so created
a new function that does part of the job for them (and placed it in
find.h, following a suggestion of Barry).

Also added tests for this behaviour.
2007-11-16 18:04:14 +00:00
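The behaviour described above can be sketched as follows: None in the start or end position acts like an omitted bound, the same way it does in slice syntax. The sample string is made up for illustration.

```python
text = "spam and spam"

# None behaves like an omitted bound, mirroring text[None:None].
first = text.find("spam", None, None)
last = text.rfind("spam", None, None)
after_start = text.index("spam", 1, None)
before_end = text.rindex("spam", None, 8)
```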
Guido van Rossum 9b847b432c Add missing "return NULL" in overflow check in PyObject_Repr(). 2007-11-06 23:32:56 +00:00
Guido van Rossum 1c1ac38157 Backport fixes for the code that decodes octal escapes (and for PyString
also hex escapes) -- this was reaching beyond the end of the input string
buffer, even though it is not supposed to be \0-terminated.
This has no visible effect but is clearly the correct thing to do.
(In 3.0 it had a visible effect after removing ob_sstate from PyString.)
2007-10-29 22:15:05 +00:00
Brett Cannon 0153159e67 Add a bunch of GIL release/acquire points in tp_print implementations and for
PyObject_Print().

Closes issue #1164.
2007-09-17 03:28:34 +00:00
Thomas Wouters 3ccec68a05 Improve extended slicing support in builtin types and classes. Specifically:
 - Specialcase extended slices that amount to a shallow copy the same way as
   is done for simple slices, in the tuple, string and unicode case.

 - Specialcase step-1 extended slices to optimize the common case for all
   involved types.

 - For lists, allow extended slice assignment of differing lengths as long
   as the step is 1. (Previously, 'l[:2:1] = []' failed even though
   'l[:2] = []' and 'l[:2:None] = []' do not.)

 - Implement extended slicing for buffer, array, structseq, mmap and
   UserString.UserString.

 - Implement slice-object support (but not non-step-1 slice assignment) for
   UserString.MutableString.

 - Add tests for all new functionality.
2007-08-28 15:28:19 +00:00
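A short sketch of the list behaviour described in this commit, using a made-up list: step-1 extended slice assignment may change the length, while any other step still needs a replacement of matching length.

```python
items = [0, 1, 2, 3, 4]

# A step of 1 behaves like a simple slice, so lengths may differ.
items[:2:1] = []
after_delete = list(items)

items[1:2:1] = ["a", "b", "c"]
after_grow = list(items)

# Other steps still require a replacement of matching length.
try:
    items[::2] = []
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

# A full step-1 slice of an immutable sequence amounts to a shallow copy.
word = "slicing"
same_text = word[::1] == word
```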
Georg Brandl 9efd9b6fa4 Bug #1763149: use proper slice syntax in docstring.
(backport)
2007-07-29 17:38:35 +00:00
Martin v. Löwis 6819210b9e PEP 3123: Provide forward compatibility with Python 3.0, while keeping
backwards compatibility. Add Py_Refcnt, Py_Type, Py_Size, and
PyVarObject_HEAD_INIT.
2007-07-21 06:55:02 +00:00
Georg Brandl 7c3b50db66 Patch #1673759: add a missing overflow check when formatting floats
with %G.
2007-07-12 08:38:00 +00:00
Neal Norwitz 5c9a81a3d8 Fix a bug when there was a newline in the string expandtabs was called on.
This also catches another condition that can overflow.

Will backport.
2007-06-11 02:16:10 +00:00
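For context, a newline matters to expandtabs because it resets the column that tab stops are measured from. A minimal example with a made-up string (it shows the intended behaviour, not the overflow itself):

```python
text = "a\tb\n\tc"

# The tab after "a" pads to column 4; after the newline the column
# restarts at 0, so the second tab expands to a full four spaces.
expanded = text.expandtabs(4)
```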
Neal Norwitz 7dbd2a3720 Prevent expandtabs() on string and unicode objects from causing a segfault when
a large width is passed on 32-bit platforms.  Found by Google.

It would be good for people to review this especially carefully and verify
I don't have an off by one error and there is no other way to cause overflow.
2007-06-09 03:36:34 +00:00
Raymond Hettinger 4db5fe970c SF 1193128: Let str.translate(None) be an identity transformation 2007-04-12 04:10:00 +00:00
Georg Brandl 10a4b0e6df Backport from Py3k branch: fix refleak in PyString_Format. 2007-02-26 13:51:29 +00:00
Neal Norwitz ee3a1b5244 Variation of patch # 1624059 to speed up checking if an object is a subclass
of some of the common builtin types.

Use a bit in tp_flags for each common builtin type.  Check the bit
to determine if any instance is a subclass of these common types.
The check avoids a function call and O(n) search of the base classes.
The check is done in the various Py*_Check macros rather than calling
PyType_IsSubtype().

All the bits are set in tp_flags when the type is declared
in the Objects/*object.c files because PyType_Ready() is not called
for all the types.  Should PyType_Ready() be called for all types?
If so and the change is made, the changes to the Objects/*object.c files
can be reverted (remove setting the tp_flags).  Objects/typeobject.c
would also have to be modified to add conditions
for Py*_CheckExact() in addition to each PyType_IsSubtype check.
2007-02-25 19:44:48 +00:00
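The flag-bit idea can be observed from Python through the type's __flags__ attribute. This is a sketch under assumptions: the bit position (1 << 25 for list subclasses) is taken from CPython's Include/object.h and may not hold on other implementations, and is_list_by_flag and Stack are hypothetical names for illustration.

```python
# Bit position assumed from CPython's Include/object.h
# (Py_TPFLAGS_LIST_SUBCLASS); not guaranteed elsewhere.
LIST_SUBCLASS_BIT = 1 << 25


def is_list_by_flag(obj):
    """Mimic a PyList_Check-style test: one bit test, no walk over bases."""
    return bool(type(obj).__flags__ & LIST_SUBCLASS_BIT)


class Stack(list):
    """Hypothetical list subclass; it inherits the flag bit."""


checks = (is_list_by_flag([]), is_list_by_flag(Stack()), is_list_by_flag(()))
```

The point of the design is that the check costs a single mask operation regardless of how deep the class hierarchy is.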
Neal Norwitz 7218c2d2f4 Whitespace only changes 2007-02-25 15:53:36 +00:00
Neal Norwitz 1c1a1c5aa1 Add more details when releasing interned strings 2007-02-25 15:52:27 +00:00
Georg Brandl 283a1353a0 Patch [ 1586791 ] better error msgs for some TypeErrors 2006-11-19 08:48:30 +00:00
Armin Rigo 7ccbca93a2 Forward-port of r52136,52138: a review of overflow-detecting code.
* unified the way intobject, longobject and mystrtoul handle
  values around -sys.maxint-1.

* in general, trying to entirely avoid overflows in any computation
  involving signed ints or longs is extremely involved.  Fixed a few
  simple cases where a compiler might be too clever (but that's all
  guesswork).

* more overflow checks against bad data in marshal.c.

* 2.5 specific: fixed a number of places that were still confusing int
  and Py_ssize_t.  Some of them could potentially have caused
  "real-world" breakage.

* list.pop(x): fixing overflow issues on x was messy.  I just reverted
  to PyArg_ParseTuple("n"), which does the right thing.  (An obscure
  test was trying to give a Decimal to list.pop()... doesn't make
  sense any more IMHO)

* trying to write a few tests...
2006-10-04 12:17:45 +00:00
Raymond Hettinger a0c95fa4d8 Fix endcase for str.rpartition() 2006-09-04 15:32:48 +00:00
Georg Brandl 26a07b5198 Fix refleak introduced in rev. 51248. 2006-08-14 20:25:39 +00:00
Neal Norwitz 56423e5762 Fix segfault when doing string formatting on subclasses of long if
__oct__, __hex__ don't return a string.

Klocwork 308
2006-08-13 18:11:08 +00:00
Neal Norwitz 8a87f5d37e Patch #1538606, Patch to fix __index__() clipping.
I modified this patch some by fixing style, some error checking, and adding
XXX comments.  This patch requires review and some changes are to be expected.
I'm checking in now to get the greatest possible review and establish a
baseline for moving forward.  I don't want this to hold up release if possible.
2006-08-12 17:03:09 +00:00
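As I understand the clipping question this patch addresses, the rule in current Python is that slice bounds obtained through __index__ clip to the sequence length, while a plain index that does not fit raises instead of clipping silently. A sketch with a hypothetical Huge class:

```python
import operator


class Huge:
    """Hypothetical object whose __index__ exceeds any real sequence length."""

    def __index__(self):
        return 10 ** 30


letters = "abc"

exact = operator.index(Huge())   # the full value, not clipped
clipped = letters[:Huge()]       # slice bounds clip to the length

try:
    letters[Huge()]              # a plain index must not clip silently
    raised = False
except IndexError:
    raised = True
```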
Neal Norwitz a7edb11122 Whitespace normalization 2006-07-30 06:59:13 +00:00
Neal Norwitz f71ec5a0ac Bug #1515471: string.replace() accepts character buffers again.
Pass the char* and size around rather than PyObject's.
2006-07-30 06:57:04 +00:00
Neal Norwitz 8e6675a7dc Update doc to make it agree with code.
Bottom factor out some common code.
2006-06-11 05:47:14 +00:00
Georg Brandl 90e27d38f5 Apply perky's fix for #1503157: "/".join([u"", u""]) raising OverflowError.
Also improve error message on overflow.
2006-06-10 06:40:50 +00:00
Georg Brandl 242508160e RFE #1491485: str/unicode.endswith()/startswith() now accept a tuple as first argument. 2006-06-09 18:45:48 +00:00
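A brief illustration of the tuple form, with made-up file names: the call returns True if any element of the tuple matches.

```python
filename = "report.tar.gz"

# True if the string ends (or starts) with any of the candidates.
is_archive = filename.endswith((".zip", ".tar.gz", ".tgz"))
is_note = filename.startswith(("note", "memo"))
```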
Neal Norwitz b16e4e7860 Remove ; at end of macro. There was a compiler recently that warned
about extra semi-colons.  It may have been the HP C compiler.
This file will trigger a bunch of those warnings now.
2006-06-01 05:32:49 +00:00
Fredrik Lundh 80f8e80c15 needforspeed: added Py_MEMCPY macro (currently tuned for Visual C only),
and use it for string copy operations.  this gives a 20% speedup on some
string benchmarks.
2006-05-28 12:06:46 +00:00
Fredrik Lundh 0b7ef46950 needforspeed: stringlib refactoring: use find_slice for stringobject 2006-05-27 15:26:19 +00:00
Fredrik Lundh c2d29c5a6d needforspeed: replace improvements, changed to Py_LOCAL_INLINE
where appropriate
2006-05-27 14:58:20 +00:00
Andrew Dalke d49d5c49ba cleanup - removed trailing whitespace 2006-05-27 14:16:40 +00:00
Fredrik Lundh 2d23d5bf2e needforspeed: more stringlib refactoring 2006-05-27 10:05:10 +00:00
Andrew Dalke 7e0a62ea90 Added description of why splitlines doesn't use the prealloc strategy 2006-05-26 22:49:03 +00:00
Andrew Dalke 5132407868 Added limits to the replace code so it does not count all of the matching
patterns in a string, only the number needed by the max limit.
2006-05-26 20:25:22 +00:00
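The max limit referred to here is the optional third argument of replace. A small example with a made-up string shows why only that many matches need to be counted:

```python
text = "aaaa"

# Only the first two matches are replaced, so nothing past them
# needs to be counted.
limited = text.replace("a", "b", 2)
unlimited = text.replace("a", "b")
```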
Fredrik Lundh e6e43c867d needforspeed: stringlib refactoring: use stringlib/find for string find 2006-05-26 19:48:07 +00:00
Fredrik Lundh 58b5e84d52 needforspeed: stringlib refactoring, continued. added count and
find helpers; updated unicodeobject to use stringlib_count
2006-05-26 19:24:53 +00:00
Andrew Dalke c5da53ba78 substring split now uses /F's fast string matching algorithm.
(If compiled without FAST search support, changed the pre-memcmp test
   to check the last character as well as the first.  This gave a 25%
   speedup for my test case.)

Rewrote the split algorithms so they stop when maxsplit gets to 0.
Previously they did a string match first then checked if the maxsplit
was reached.  The new way prevents a needless string search.
2006-05-26 19:02:09 +00:00
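The reordering described above (check the split budget first, search second) can be sketched in Python. This is a toy model, not the C code from the commit; split_on is a hypothetical helper and assumes a non-empty separator.

```python
def split_on(text, sep, maxsplit=-1):
    """Toy splitter that checks the split budget before each search."""
    if not sep:
        raise ValueError("empty separator")
    parts = []
    start = 0
    # Testing maxsplit first avoids a search whose result would be unused.
    while maxsplit != 0:
        hit = text.find(sep, start)
        if hit < 0:
            break
        parts.append(text[start:hit])
        start = hit + len(sep)
        maxsplit -= 1
    parts.append(text[start:])
    return parts


two = split_on("a,b,c,d", ",", 2)
none = split_on("a,b,c,d", ",", 0)
```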
Fredrik Lundh b3167cbcd7 needforspeed: added rpartition implementation 2006-05-26 18:15:38 +00:00
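For reference, rpartition splits on the last occurrence of the separator and always returns a 3-tuple. The example below shows the behaviour in current Python with made-up strings, including the case where the separator is absent.

```python
head, sep, tail = "usr/local/bin".rpartition("/")

# With no separator present, the whole string ends up in the last slot.
missing = "filename".rpartition("/")
```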
Fredrik Lundh 3a65d87e8c needforspeed: remove remaining USE_FAST macros; if fastsearch was
broken, someone would have noticed by now ;-)
2006-05-26 17:31:41 +00:00
Fredrik Lundh c2032fb86a needforspeed: cleanup 2006-05-26 17:26:39 +00:00
Fredrik Lundh b947948c61 needforspeed: stringlib refactoring (in progress) 2006-05-26 17:22:38 +00:00
Fredrik Lundh a50d201bd9 needforspeed: stringlib refactoring (in progress) 2006-05-26 17:04:58 +00:00
Fredrik Lundh 7c940d1d68 needforspeed: use Py_LOCAL on a few more locals in stringobject.c 2006-05-26 16:32:42 +00:00
Andrew Dalke 02758d66ce Eked out another 3% or so performance in split whitespace by cleaning up the algorithm. 2006-05-26 15:21:01 +00:00
Andrew Dalke 525eab3712 Changes to string.split/rsplit on whitespace to preallocate space in the
results list.

Originally it allocated 0 items and used the list growth during append.  Now
it preallocates 12 items so the first few appends don't need list reallocs.

("Here are some words ."*2).split(None, 1) is 7% faster
("Here are some words ."*2).split() is 15% faster

  (Your mileage may vary, see dealership for details.)

File parsing like this

    for line in f:
        count += len(line.split())

is also about 15% faster.  There is a slowdown of about 3% for large
strings because of the additional overhead of checking if the append is
to a preallocated region of the list or not.  This will be the rare case.
It could be improved with special case code but we decided it was not
useful enough.

There is a cost of 12*sizeof(PyObject *) bytes per list.  For the normal
case of file parsing this is not a problem because the lists have
a short lifetime.  We have not come up with cases where this is a problem
in real life.

I chose 12 because human text averages about 11 words per line in books,
one of my data sets averages 6.2 words with a final peak at 11 words per
line, and I work with a tab delimited data set with 8 tabs per line (or
9 words per line).  12 encompasses all of these.

Also changed the last rsplit code to append then reverse, rather than
doing insert(0).  The split() and rsplit() times are now comparable.
2006-05-26 14:00:45 +00:00
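The append-then-reverse idea for splitting from the right can be sketched in Python. This is a toy model rather than the C implementation; rsplit_whitespace is a hypothetical name, and the preallocation of 12 slots has no direct Python equivalent, so it is not modelled.

```python
def rsplit_whitespace(text, maxsplit=-1):
    """Toy right-to-left whitespace split: append pieces, reverse once."""
    pieces = []
    end = len(text)
    while True:
        # Skip whitespace to the left of the current position.
        while end > 0 and text[end - 1].isspace():
            end -= 1
        if end == 0:
            break
        if maxsplit == 0:
            # Budget used up: everything left is one final piece.
            pieces.append(text[:end])
            break
        # Walk left to the start of the current word.
        start = end
        while start > 0 and not text[start - 1].isspace():
            start -= 1
        pieces.append(text[start:end])   # cheap append at the tail
        end = start
        maxsplit -= 1
    pieces.reverse()                     # one reversal instead of insert(0)
    return pieces


words = rsplit_whitespace("Here are some words .")
last_only = rsplit_whitespace("  a b c  ", 1)
```

Each insert(0) shifts every element already in the list, so building in reverse order and reversing once keeps the work linear in the number of pieces.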
Fredrik Lundh 95e2a91615 use Py_LOCAL also for string and unicode objects 2006-05-26 11:38:15 +00:00
Fredrik Lundh f2c0dfdb13 needforspeed: use Py_ssize_t for the fastsearch counter and skip
length (thanks, neal!).  and yes, I've verified that this doesn't
slow things down ;-)
2006-05-26 10:27:17 +00:00