cpython

Commit Graph

Author	SHA1	Message	Date
Fredrik Lundh	0c71f88fc9	needforspeed: check for overflow in replace (from Andrew Dalke)	2006-05-25 16:46:54 +00:00
Fredrik Lundh	347ee277aa	needforspeed: refactored the replace code slightly; special-case constant-length changes; use fastsearch to locate the first match.	2006-05-24 16:35:18 +00:00
Fredrik Lundh	d5e0dc51cf	needforspeedindeed: use fastsearch also for __contains__	2006-05-24 15:11:01 +00:00
Fredrik Lundh	6471ee4f18	needforspeed: use "fastsearch" for count and findstring helpers. this results in a 2.5x speedup on the stringbench count tests, and a 20x (!) speedup on the stringbench search/find/contains test, compared to 2.5a2. for more on the algorithm, see: http://effbot.org/zone/stringlib.htm if you get weird results, you can disable the new algoritm by undefining USE_FAST in Objects/unicodeobject.c. enjoy /F	2006-05-24 14:28:11 +00:00
Fredrik Lundh	240bf2a8e4	use Py_ssize_t for string indexes (thanks, neal!)	2006-05-24 10:20:36 +00:00
Fredrik Lundh	7763351808	return 0 on misses, not -1.	2006-05-23 19:47:35 +00:00
Fredrik Lundh	b63588c188	needforspeed: use append+reverse for rsplit, use "bloom filters" to speed up splitlines and strip with charsets; etc. rsplit is now as fast as split in all our tests (reverse takes no time at all), and splitlines() is nearly as fast as a plain split("\n") in our tests. and we're not done yet... ;-)	2006-05-23 18:44:25 +00:00
Fredrik Lundh	833bf9422e	needforspeed: fixed unicode "in" operator to use same implementation approach as find/index	2006-05-23 10:12:21 +00:00
Tim Peters	1bacc641a0	unicode_repeat(): Change type of local to Py_ssize_t, since that's what it should be.	2006-05-23 05:47:16 +00:00
Tim Peters	286085c781	PyUnicode_Join(): Recent code changes introduced new compiler warnings on Windows (signed vs unsigned mismatch in comparisons). Cleaned that up by switching more locals to Py_ssize_t. Simplified overflow checking (it can _be_ simpler because while these things are declared as Py_ssize_t, then should in fact never be negative).	2006-05-22 19:17:04 +00:00
Fredrik Lundh	8a8e05a2b9	needforspeed: use memcpy for "long" strings; use a better algorithm for long repeats.	2006-05-22 17:12:58 +00:00
Fredrik Lundh	f1d60a5384	needforspeed: speed up unicode repeat, unicode string copy	2006-05-22 16:29:30 +00:00
Fredrik Lundh	763b50f9d9	docstring tweaks: count counts non-overlapping substrings, not total number of occurences	2006-05-22 15:35:12 +00:00
Neal Norwitz	1004a5339a	Patch #1488312 , Fix memory alignment problem on SPARC in unicode. Will backport	2006-05-15 07:17:23 +00:00
Thomas Wouters	715a4cdea2	Use %zd instead of %i as format character (in call to PyErr_Format) for Py_ssize_t argument.	2006-04-16 22:04:49 +00:00
Martin v. Löwis	5cb6936672	Make Py_BuildValue, PyObject_CallFunction and PyObject_CallMethod aware of PY_SSIZE_T_CLEAN.	2006-04-14 09:08:42 +00:00
Martin v. Löwis	f15da6995b	Remove another INT_MAX limitation	2006-04-13 07:24:50 +00:00
Martin v. Löwis	412fb67368	Change more ints to Py_ssize_t.	2006-04-13 06:34:32 +00:00
Martin v. Löwis	80d2e591d5	Revert 34153: Py_UNICODE should not be signed.	2006-04-13 06:06:08 +00:00
Anthony Baxter	ac6bd46d5c	spread the extern "C" { } magic pixie dust around. Python itself builds now using a C++ compiler. Still lots and lots of errors in the modules built by setup.py, and a bunch of warnings from g++ in the core.	2006-04-13 02:06:09 +00:00
Anthony Baxter	a62862120d	More low-hanging fruit. Still need to re-arrange some code (or find a better solution) in the same way as listobject.c got changed. Hoping for a better solution.	2006-04-11 07:42:36 +00:00
Georg Brandl	ecdc0a9f46	That one was a mistake.	2006-03-30 12:19:07 +00:00
Georg Brandl	347b30042b	Remove unnecessary casts in type object initializers.	2006-03-30 11:57:00 +00:00
Thomas Wouters	a96affe1fc	- Reindent a confusingly indented piece of code (no intended code changes there) - Add missing DECREFs of inner-scope 'temp' variable - Add various missing DECREFs by changing 'return NULL' into 'goto onError' - Avoid double DECREF when last _PyUnicode_Resize() fails Coverity found one of the missing DECREFs, but oddly enough not the others.	2006-03-12 00:29:36 +00:00
Martin v. Löwis	480f1bb67b	Update Unicode database to Unicode 4.1.	2006-03-09 23:38:20 +00:00
Guido van Rossum	38fff8c4e4	Checking in the code for PEP 357. This was mostly written by Travis Oliphant. I've inspected it all; Neal Norwitz and MvL have also looked at it (in an earlier incarnation).	2006-03-07 18:50:55 +00:00
Hye-Shik Chang	4af5c8cee4	SF #1444030 : Fix several potential defects found by Coverity. (reviewed by Neal Norwitz)	2006-03-07 15:39:21 +00:00
Martin v. Löwis	15e62742fa	Revert backwards-incompatible const changes.	2006-02-27 16:46:16 +00:00
Thomas Wouters	de01774dae	Use correct PyArg_Parse format char for Py_ssize_t in unicode.center(). Fixes: >>> u"".center(10) Traceback (most recent call last): File "<stdin>", line 1, in <module> MemoryError on 64-bit systems.	2006-02-16 19:34:37 +00:00
Martin v. Löwis	eb079f1c25	Use Py_ssize_t for counts and sizes. Convert Py_ssize_t using PyInt_FromSsize_t	2006-02-16 14:32:27 +00:00
Martin v. Löwis	2c95cc6d72	Support %zd in PyErr_Format and PyString_FromFormat.	2006-02-16 06:54:25 +00:00
Tim Peters	15231548d2	doubletounicode(), longtounicode(): Py_SAFE_DOWNCAST can evaluate its first argument multiple times in a debug build. This caused two distinct assert- failures in test_unicode run under a debug build. Rewrote the code in trivial ways so that multiple evaluation of the first argument doesn't hurt.	2006-02-16 01:08:01 +00:00
Thomas Wouters	4701af5bf5	Remove two unused Py_ssize_t variables (merge glitches, looks like.)	2006-02-15 23:10:32 +00:00
Martin v. Löwis	18e165558b	Merge ssize_t branch.	2006-02-15 17:27:45 +00:00
Neal Norwitz	fc76d633e8	- Patch #1400181 , fix unicode string formatting to not use the locale. This is how string objects work. u'%f' could use , instead of . for the decimal point. Now both strings and unicode always use periods. This is the code that would break: import locale locale.setlocale(locale.LC_NUMERIC, 'de_DE') u'%.1f' % 1.0 assert '1.0' == u'%.1f' % 1.0 I couldn't create a test case which fails, but this fixes the problem. Will backport.	2006-01-10 06:03:13 +00:00
Neal Norwitz	d43069ce95	Fix icc warnings: remove (sometimes) unused variable conditionally	2006-01-08 01:12:10 +00:00
Martin v. Löwis	dea59e5755	Stop maintaining the buildno file. Also, stop determining Unicode sizes with PyString_GET_SIZE.	2006-01-05 10:00:36 +00:00
Hye-Shik Chang	835b243c71	Bug #1379994 : Fix *unicode_escape codecs to encode r'\' as r'\\' just like string codecs.	2005-12-17 04:38:31 +00:00
Jeremy Hylton	af68c874a6	Add const to several API functions that take char *. In C++, it's an error to pass a string literal to a char * function without a const_cast(). Rather than require every C++ extension module to put a cast around string literals, fix the API to state the const-ness. I focused on parts of the API where people usually pass literals: PyArg_ParseTuple() and friends, Py_BuildValue(), PyMethodDef, the type slots, etc. Predictably, there were a large set of functions that needed to be fixed as a result of these changes. The most pervasive change was to make the keyword args list passed to PyArg_ParseTupleAndKewords() to be a const char *kwlist[]. One cast was required as a result of the changes: A type object mallocs the memory for its tp_doc slot and later frees it. PyTypeObject says that tp_doc is const char *; but if the type was created by type_new(), we know it is safe to cast to char *.	2005-12-10 18:50:16 +00:00
Walter Dörwald	d4fff1731c	Fix leaked reference to None.	2005-11-28 22:15:56 +00:00
Andrew M. Kuchling	8294de5673	Another comment typo fix	2005-11-02 16:36:12 +00:00
Walter Dörwald	2e2c02fedb	Fix typo in comment.	2005-11-02 08:57:11 +00:00
Fred Drake	db390c1ad8	fix typos, mostly in comments	2005-10-28 14:39:47 +00:00
Michael W. Hudson	b2308bb9be	Fix bug: [ 1327110 ] wrong TypeError traceback in generator expressions by removing the code that can stomp on the users' TypeError raised by the iterable argument to ''.join() -- PySequence_Fast (now?) gives a perfectly reasonable message itself. Also, a couple of tests.	2005-10-21 11:45:01 +00:00
Marc-André Lemburg	5c4a9d6591	Whitespace corrections.	2005-10-19 22:39:02 +00:00
Marc-André Lemburg	e115ec832c	Bug fix for [ 1331062 ] utf 7 codec broken. Backport candidate.	2005-10-19 22:33:31 +00:00
Walter Dörwald	d1c1e10f70	Part of SF patch #1313939 : Speedup charmap decoding by extending PyUnicode_DecodeCharmap() the accept a unicode string as the mapping argument which is used as a mapping table. This code isn't used by any of the codecs yet.	2005-10-06 20:29:57 +00:00
Walter Dörwald	a47d1c08d0	SF bug #1251300 : On UCS-4 builds the "unicode-internal" codec will now complain about illegal code points. The codec now supports PEP 293 style error handlers. (This is a variant of the Nik Haldimann's patch that detects truncated data)	2005-08-30 10:23:14 +00:00
Marc-André Lemburg	a9cadcd41b	Correct the handling of 0-termination of PyUnicode_AsWideChar() and its usage in PyLocale_strcoll(). Clarify the documentation on this. Thanks to Andreas Degert for pointing this out.	2004-11-22 13:02:31 +00:00
Marc-André Lemburg	204bd6d9d2	Applied patch for [ 1047269 ] Buffer overwrite in PyUnicode_AsWideChar. Python 2.3.x candidate.	2004-10-15 07:45:05 +00:00
Skip Montanaro	6543b45b0c	Initialize sep and seplen to suppress warning from gcc.	2004-09-16 03:28:13 +00:00
Thomas Heller	ca0d2cb66e	Add a missing line continuation character.	2004-09-15 11:41:32 +00:00
Walter Dörwald	065a32f550	Make the hint about the None default less ambiguous.	2004-09-14 09:45:10 +00:00
Walter Dörwald	782afc5927	Enhance the docstrings for unicode.split() and string.split() to make it clear that it is possible to pass None as the separator argument to get the default "any whitespace" separator.	2004-09-14 09:40:45 +00:00
Walter Dörwald	69652035bc	SF patch #998993 : The UTF-8 and the UTF-16 stateful decoders now support decoding incomplete input (when the input stream is temporarily exhausted). codecs.StreamReader now implements buffering, which enables proper readline support for the UTF-16 decoders. codecs.StreamReader.read() has a new argument chars which specifies the number of characters to return. codecs.StreamReader.readline() and codecs.StreamReader.readlines() have a new argument keepends. Trailing "\n"s will be stripped from the lines if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and PyUnicode_DecodeUTF16Stateful.	2004-09-07 20:24:22 +00:00
Tim Peters	91879ab8ea	PyUnicode_Join(): Bozo Alert. While this is chugging along, it may need to convert str objects from the iterable to unicode. So, if someone set the system default encoding to something nasty enough, the conversion process could mutate the input iterable as a side effect, and PySequence_Fast doesn't hide that from us if the input was a list. IOW, can't assume the size of PySequence_Fast's result is invariant across PyUnicode_FromObject() calls.	2004-08-27 22:35:44 +00:00
Tim Peters	05eba1fdc8	PyUnicode_Join(): Rewrote to use PySequence_Fast(). This doesn't do much to reduce the size of the code, but greatly improves its clarity. It's also quicker in what's probably the most common case (the argument iterable is a list). Against it, if the iterable isn't a list or a tuple, a temp tuple is materialized containing the entire input sequence, and that's a bigger temp memory burden. Yawn.	2004-08-27 21:32:02 +00:00
Tim Peters	894c512c2f	PyUnicode_Join(): Missed a spot where I intended a cast from size_t to int. I sure wish MS would gripe about that! Whatever, note that the statement above it guarantees that the cast loses no info.	2004-08-27 05:08:36 +00:00
Tim Peters	8ce9f16259	PyUnicode_Join(): Two primary aims: 1. u1.join([u2]) is u2 2. Be more careful about C-level int overflow. Since PySequence_Fast() isn't needed to achieve #1, it's not used -- but the code could sure be simpler if it were.	2004-08-27 01:49:32 +00:00
Hye-Shik Chang	e9ddfbb412	SF #989185 : Drop unicode.iswide() and unicode.width() and add unicodedata.east_asian_width(). You can still implement your own simple width() function using it like this: def width(u): w = 0 for c in unicodedata.normalize('NFC', u): cwidth = unicodedata.east_asian_width(c) if cwidth in ('W', 'F'): w += 2 else: w += 1 return w	2004-08-04 07:38:35 +00:00
Marc-André Lemburg	d25c650461	Let u'%s' % obj try obj.__unicode__() first and fallback to obj.__str__().	2004-07-23 16:13:25 +00:00
Nicholas Bastin	9ba301e589	Moved SunPro warning suppression into pyport.h and out of individual modules and objects.	2004-07-15 15:54:05 +00:00
Marc-André Lemburg	126b44cd41	Fix a copy&paste typo.	2004-07-10 12:04:20 +00:00
Marc-André Lemburg	1dffb120b7	.encode()/.decode() patch part 2.	2004-07-08 19:13:55 +00:00
Marc-André Lemburg	d2d4598ec2	Allow string and unicode return types from .encode()/.decode() methods on string and unicode objects. Added unicode.decode() which was missing for no apparent reason.	2004-07-08 17:57:32 +00:00
Nicholas Bastin	1ce9e4cfc1	Fixed end-of-loop code not reached warning when using SunPro C	2004-06-17 18:27:18 +00:00
Hye-Shik Chang	974ed7cfa5	- SF #962502 : Add two more methods for unicode type; width() and iswide() for east asian width manipulation. (Inspired by David Goodger, Reviewed by Martin v. Loewis) - Move _PyUnicode_TypeRecord.flags to the end of the struct so that no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)	2004-06-02 16:49:17 +00:00
Hye-Shik Chang	4057483164	SF Patch #926375 : Remove a useless UTF-16 support code that is never been used. (Suggested by Martin v. Loewis)	2004-04-06 07:24:51 +00:00
Walter Dörwald	cd736e71a3	Fix reallocation bug in unicode.translate(): The code was comparing characters instead of character pointers to determine space requirements.	2004-02-05 17:36:00 +00:00
Hye-Shik Chang	1bc09b7c2a	Cosmetic fix for wrongly indented tabs with ts=4.	2004-01-03 19:35:43 +00:00
Hye-Shik Chang	7fc4cf57b8	Fix unicode.rsplit()'s bug that ignores separater on the end of string when using specialized splitter for 1 char sep.	2003-12-23 09:10:16 +00:00
Hye-Shik Chang	40e9509dc7	Fix broken xmlcharrefreplace by rev 2.204. (Pointy hat goes to perky)	2003-12-22 01:31:13 +00:00
Hye-Shik Chang	4a264fb054	SF #859573 : Reduce compiler warnings on gcc 3.2 and above.	2003-12-19 01:59:56 +00:00
Hye-Shik Chang	3ae811b57d	Add rsplit method for str and unicode builtin types. SF feature request #801847. Original patch is written by Sean Reifschneider.	2003-12-15 18:49:53 +00:00
Guido van Rossum	6c9e130524	- Removed FutureWarnings related to hex/oct literals and conversions and left shifts. (Thanks to Kalle Svensson for SF patch 849227.) This addresses most of the remaining semantic changes promised by PEP 237, except for repr() of a long, which still shows the trailing 'L'. The PEP appears to promise warnings for operations that changed semantics compared to Python 2.3, but this is not implemented; we've suffered through enough warnings related to hex/oct literals and I think it's best to be silent now.	2003-11-29 23:52:13 +00:00
Raymond Hettinger	4f8f976576	Add optional fillchar argument to ljust(), rjust(), and center() string methods.	2003-11-26 08:21:35 +00:00
Walter Dörwald	4894c30626	Fix a bug in the memory reallocation code of PyUnicode_TranslateCharmap(). charmaptranslate_makespace() allocated more memory than required for the next replacement but didn't remember that fact, so memory size was growing exponentially every time a replacement string is longer that one character. This fixes SF bug #828737.	2003-10-24 14:25:28 +00:00
Martin v. Löwis	6828e18a6a	Patch #825679 : Clarify semantics of .isfoo on empty strings. Backported to 2.3.	2003-10-18 09:55:08 +00:00
Jeremy Hylton	504de6bd2c	Fix for SF bug [ 817156 ] invalid \U escape gives 0=length unistr.	2003-10-06 05:08:26 +00:00
Tim Peters	ced69f8a20	On c.l.py, Martin v. Löwis said that Py_UNICODE could be of a signed type, so fiddle Jeremy's fix to live with that. Also added more comments. Bugfix candidate (this bug is in all versions of Python, at least since 2.1).	2003-09-16 20:30:58 +00:00
Jeremy Hylton	d808279be3	Double-fix of crash in Unicode freelist handling. If a length-1 Unicode string was in the freelist and it was uninitialized or pointed to a very large (magnitude) negative number, the check unicode_latin1[unicode->str[0]] == unicode could cause a segmentation violation, e.g. unicode->str[0] is 0xcbcbcbcb. Fix this in two ways: 1. Change guard befor unicode_latin1[] to test against 256U. If I understand correctly, the unsigned long used to store UCS4 on my box was getting converted to a signed long to compare with the signed constant 256. 2. Change _PyUnicode_New() to make sure the first element of str is always initialized to zero. There are several places in the code where the caller can exit with an error before initializing any of str, which would leave junk in str[0]. Also, silence a compiler warning on pointer vs. int arithmetic. Bug fix candidate.	2003-09-16 19:41:39 +00:00
Jeremy Hylton	deb2dc6658	Change checks of PyUnicode_Resize() return value for clarity. The unicode_resize() family only returns -1 or 0 so simply checking for != 0 is sufficient, but somewhat unclear. Many Python API functions return < 0 on error, reserving the right to return 0 or 1 on success. Change the call sites for consistency with these calls.	2003-09-16 03:41:45 +00:00
Raymond Hettinger	9bfe533c69	SF bug #795506 : Wrong handling of string format code for float values. Adding missing support for '%F'. Will backport to 2.3.1.	2003-08-27 04:55:52 +00:00
Walter Dörwald	150523efa5	Fix refcounting leak in charmaptranslate_lookup()	2003-08-15 16:52:19 +00:00
Walter Dörwald	9b30f206ee	Fix another refcounting leak in PyUnicode_EncodeCharmap().	2003-08-15 16:26:34 +00:00
Walter Dörwald	d4ade0885c	Fix another refcounting leak (in PyUnicode_DecodeUnicodeEscape()).	2003-08-15 15:00:26 +00:00
Walter Dörwald	e5402fb340	Fix refcount leak in PyUnicode_EncodeCharmap(). The bug surfaces when an encoding error occurs and the callback name is unknown, i.e. when the callback has to be called. The problem was that the fact that the callback has already been looked up was only recorded in a local variable in charmap_encoding_error(), because charmap_encoding_error() got it's own copy of the errorHandler pointer instead of a pointer to the pointer in PyUnicode_EncodeCharmap().	2003-08-14 20:25:29 +00:00
Mark Hammond	0ccda1ee10	Support 'mbcs' as a 'built-in' encoding, so the C API can use it without defering to the encodings package. As described in [ 763111 ] mbcs encoding should skip encodings package	2003-07-01 00:13:27 +00:00
Raymond Hettinger	f466793fcc	SF patch 703666: Several objects don't decref tmp on failure in subtype_new Submitted By: Christopher A. Craig Fillin some missing decrefs.	2003-06-28 20:04:25 +00:00
Martin v. Löwis	9a3a9f7791	Consider \U-escapes in raw-unicode-escape. Fixes #444514 .	2003-05-18 12:31:09 +00:00
Neal Norwitz	ffe33b7f24	Attempt to make all the various string strip methods the same. Doc - add doc for when functions were added * UserString * string object methods * string module functions 'chars' is used for the last parameter everywhere. These changes will be backported, since part of the changes have already been made, but they were inconsistent.	2003-04-10 22:35:32 +00:00
Guido van Rossum	a7132189d2	Reformat a few docstrings that caused line wraps in help() output.	2003-04-09 19:32:45 +00:00
Walter Dörwald	44f527fea4	Change formatchar(), so that u"%c" % 0xffffffff now raises an OverflowError instead of a TypeError to be consistent with "%c" % 256. See SF patch #710127.	2003-04-02 16:37:24 +00:00
Raymond Hettinger	c8df5780e1	Sf patch #700047 : unicode object leaks refcount on resizing Contributed by Hye-Shik Chang.	2003-03-09 07:30:43 +00:00
Neal Norwitz	ec74f2fda7	Add more missing PyErr_NoMemory() after failled memory allocs	2003-02-11 23:05:40 +00:00
Walter Dörwald	f6b56aecad	Fix two refcounting bugs	2003-02-09 23:42:56 +00:00
Walter Dörwald	2e0b18af30	Change the treatment of positions returned by PEP293 error handers in the Unicode codecs: Negative positions are treated as being relative to the end of the input and out of bounds positions result in an IndexError. Also update the PEP and include an explanation of this in the documentation for codecs.register_error. Fixes a small bug in iconv_codecs: if the position from the callback is negative add it to the size instead of substracting it. From SF patch #677429.	2003-01-31 17:19:08 +00:00
Guido van Rossum	5d9113d8be	Implement appropriate __getnewargs__ for all immutable subclassable builtin types. The special handling for these can now be removed from save_newobj(). Add some testing for this. Also add support for setting the 'fast' flag on the Python Pickler class, which suppresses use of the memo.	2003-01-29 17:58:45 +00:00
Walter Dörwald	adc727490b	Fix charmapencode_lookup(), so that a None value in the mapping is treated as "character maps to <undefined>" and not as "character mapping must return integer, None or str".	2003-01-08 22:01:33 +00:00
Walter Dörwald	034d97605d	Remove variable owned from PyUnicode_FromEncodedObject, which is unused (except for Py_DECREF calls) since the introduction of __unicode__.	2003-01-08 20:38:39 +00:00
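
The message for commit e9ddfbb412 carries a small width() helper whose line breaks were lost in this listing. A reconstruction is sketched below, with the indentation inferred from the flattened text. It is written against Python 3 str, which is an assumption, since the original targeted Python 2 unicode objects; the docstring and comments are added for illustration.

```python
import unicodedata


def width(u):
    """Approximate column width of a string on an East Asian terminal.

    Wide ('W') and fullwidth ('F') characters count as two columns,
    everything else as one, as in the commit message.
    """
    w = 0
    # Normalize first so that composable sequences become single code points.
    for c in unicodedata.normalize('NFC', u):
        cwidth = unicodedata.east_asian_width(c)
        if cwidth in ('W', 'F'):
            w += 2
        else:
            w += 1
    return w
```

The helper counts code points after NFC normalization, so combining marks that have no precomposed form still count as one column each.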

326 Commits
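
Commit b63588c188 mentions using "append+reverse" for rsplit. The committed change is C code and is not reproduced here. The following is only a Python sketch of the general idea, under the assumption that it means scanning from the right, appending each piece as it is found, and reversing the list once at the end. The function name rsplit_append_reverse is hypothetical.

```python
def rsplit_append_reverse(s, sep, maxsplit=-1):
    """Illustrative rsplit: collect pieces right to left, then reverse once."""
    if not sep:
        raise ValueError("empty separator")
    parts = []
    end = len(s)
    # A negative maxsplit never reaches 0, so it means "no limit".
    while maxsplit != 0:
        idx = s.rfind(sep, 0, end)
        if idx < 0:
            break
        # Appending is cheap; inserting at the front would shift the list.
        parts.append(s[idx + len(sep):end])
        end = idx
        maxsplit -= 1
    parts.append(s[:end])
    # The pieces were collected in right-to-left order; restore the order.
    parts.reverse()
    return parts
```

Appending and reversing once avoids repeated insertion at the front of the list, which is one plausible reading of the message's remark that the reverse "takes no time at all".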