cpython

Commit Graph

Author	SHA1	Message	Date
Fredrik Lundh	0fdb90cafe	refactored the unicodeobject/ucnhash interface, to hide the implementation details inside the ucnhash module. also cleaned up the unicode copyright blurb a little; Secret Labs' internal revision history isn't that interesting...	2001-01-19 09:45:02 +00:00
Marc-André Lemburg	ad7c98e264	This patch adds a new builtin unistr() which behaves like str() except that it always returns Unicode objects. A new C API PyObject_Unicode() is also provided. This closes patch #101664. Written by Marc-Andre Lemburg. Copyright assigned to Guido van Rossum.	2001-01-17 17:09:53 +00:00
Marc-André Lemburg	3a645e4dd4	Added checks to prevent PyUnicode_Count() from dumping core in case the parameters are out of bounds and fixes error handling for .count(), .startswith() and .endswith() for the case of mixed string/Unicode objects. This patch adds Python style index semantics to PyUnicode_Count() indices (including the special handling of negative indices). The patch is an extended version of patch #103249 submitted by Michael Hudson (mwh) on SF. It also includes new test cases.	2001-01-16 11:54:12 +00:00
Marc-André Lemburg	ec233e5803	This patch adds a new feature to the builtin charmap codec: The mapping dictionaries can now contain 1-n mappings, meaning that character ordinals may be mapped to strings or Unicode objects, e.g. 0x0078 ('x') -> u"abc", causing the ordinal to be replaced by the complete string or Unicode object instead of just one character. Another feature introduced by the patch is that of mapping ordinals to the empty string. This allows removing characters. The patch is different from patch #103100 in that it does not cause a performance hit for the normal use case of 1-1 mappings. Written by Marc-Andre Lemburg, copyright assigned to Guido van Rossum.	2001-01-06 14:59:58 +00:00
Marc-André Lemburg	a866df806d	This patch changes the default behaviour of the builtin charmap codec to not apply Latin-1 mappings for keys which are not found in the mapping dictionaries, but instead treat them as undefined mappings. The patch was originally written by Martin v. Loewis with some additional (cosmetic) changes and an updated test script by Marc-Andre Lemburg. The standard codecs were recreated from the most current files available at the Unicode.org site using the Tools/scripts/gencodec.py tool. This patch closes the bugs #116285 and #119960.	2001-01-03 21:29:14 +00:00
Andrew M. Kuchling	f947ffe951	Patch #102940 : use only printable Unicode chars in reporting incorrect % characters; characters outside the printable range are replaced with '?'	2000-12-19 22:49:06 +00:00
Guido van Rossum	cda4f9a8dc	Fix off-by-one error in split_substring(). Fixes SF bug #122162 .	2000-12-19 02:23:19 +00:00
Andrew M. Kuchling	6ca8917758	[ Patch #102852 ] Make % error a bit more informative by indicating the index at which an unknown %-escape was found	2000-12-15 13:07:46 +00:00
Tim Peters	a3a3a030af	Fix for SF bug #123859 : %[duxXo] long formats inconsistent.	2000-11-30 05:22:44 +00:00
Barry Warsaw	5b4c22806f	_PyUnicode_Fini(): Initialize the local freelist walking variable `u' after unicode_empty has been freed, otherwise it might not point to the real start of the unicode_freelist. Final closure for SF bug #110681, Jitterbug PR#398.	2000-10-03 20:45:26 +00:00
Guido van Rossum	4ae8ef84da	In _PyUnicode_Fini(), decref unicode_empty before tearing down the free list. Discovered by Barry, fix approved by MAL.	2000-10-03 18:09:04 +00:00
Fred Drake	d5fadf75e4	Rationalize use of limits.h, moving the inclusion to Python.h. Add definitions of INT_MAX and LONG_MAX to pyport.h. Remove includes of limits.h and conditional definitions of INT_MAX and LONG_MAX elsewhere. This closes SourceForge patch #101659 and bug #115323.	2000-09-26 05:46:01 +00:00
Tim Peters	38fd5b6413	Derived from Martin's SF patch 110609: support unbounded ints in %d,i,u,x,X,o formats. Note a curious extension to the std C rules: x, X and o formatting can never produce a sign character in C, so the '+' and ' ' flags are meaningless for them. But unbounded ints can produce a sign character under these conversions (no fixed- width bitstring is wide enough to hold all negative values in 2's-comp form). So these flags become meaningful in Python when formatting a Python long which is too big to fit in a C long. This required shuffling around existing code, which hacked x and X conversions to death when both the '#' and '0' flags were specified: the hacks weren't strong enough to deal with the simultaneous possibility of the ' ' or '+' flags too, since signs were always meaningless before for x and X conversions. Isomorphic shuffling was required in unicodeobject.c. Also added dozens of non-trivial new unbounded-int test cases to test_format.py.	2000-09-21 05:43:11 +00:00
Tim Peters	8f422461b4	Fix for bug 113934. stringn and unicoden did no overflow checking at all, either to see whether the # of chars fit in an int, or that the amount of memory needed fit in a size_t. Checking these is expensive, but the alternative is silently wrong answers (as in the bug report) or core dumps (which were easy to provoke using Unicode strings).	2000-09-09 06:13:41 +00:00
Fredrik Lundh	df84675f93	changed \x to consume exactly two hex digits, also for unicode strings. closes PEP-223. also added \U escape (eight hex digits).	2000-09-03 11:29:49 +00:00
Barry Warsaw	ce4dc41b1a	PyUnicode_AsUTF8String(): /F picks up what I missed: the local var `str' is no longer necessary. Gotta turn on -Wall!	2000-08-18 19:30:40 +00:00
Barry Warsaw	2dd4abf277	PyUnicode_AsUTF8String(): Don't need to explicitly incref str since PyUnicode_EncodeUTF8() already returns the created object with the proper reference count. This fixes an Insure reported memory leak.	2000-08-18 06:58:15 +00:00
Marc-André Lemburg	b7520774e2	Fixed a couple of instances where a 0-length string was being resized after creation. 0-length strings are usually shared and _PyString_Resize() fails on these shared strings. Fixes [ Bug #111667 ] unicode core dump.	2000-08-14 11:29:19 +00:00
Trent Mick	20abf573ef	Clean up warning from Monterey compiler. Properly end a comment block. It was terminated fine later but by a subsequent block and. It was also in #if 0. This patch is so trivial I can't believe I am talking about it. :)	2000-08-12 22:14:34 +00:00
Marc-André Lemburg	e5034378cc	Removing UTF-16 aware Unicode comparison code. This kind of compare function (together with other locale aware ones) should go into a new collation support module. See python-dev for a discussion of this removal. Note: This patch should also be applied to the 1.6 branch.	2000-08-08 08:04:29 +00:00
Marc-André Lemburg	bff879cabb	This patch finalizes the move from UTF-8 to a default encoding in the Python Unicode implementation. The internal buffer used for implementing the buffer protocol is renamed to defenc to make this change visible. It now holds the default encoded version of the Unicode object and is calculated on demand (NULL otherwise). Since the default encoding defaults to ASCII, this will mean that Unicode objects which hold non-ASCII characters will no longer work on C APIs using the "s" or "t" parser markers. C APIs must now explicitly provide Unicode support via the "u", "U" or "es"/"es#" parser markers in order to work with non-ASCII Unicode strings. (Note: this patch will also have to be applied to the 1.6 branch of the CVS tree.)	2000-08-03 18:46:08 +00:00
Guido van Rossum	16b1ad9c7d	Changing the CNRI copyright notice according to CNRI's instructions. This is a notice without a date, which apparently is not a claim to copyright but only advice to the reader. IANAL. :-)	2000-08-03 16:24:25 +00:00
Peter Schneider-Kamp	7e01890986	merge Include/my.h into Include/pyport.h marked my.h as obsolete	2000-07-31 15:28:04 +00:00
Thomas Wouters	7889010731	Miscellaneous ANSIfications. I'm assuming here 'main' should take (int, char**) and return an int even on PC platforms. If not, please fix PC/utils/makesrc.c ;-P	2000-07-22 19:25:51 +00:00
Marc-André Lemburg	9542f48fd5	Fixed problems with UTF error reporting macros and some formatting bugs.	2000-07-17 18:23:13 +00:00
Greg Stein	af36a3aa20	gcc is being stupid with if/else constructs clean out some other warnings	2000-07-17 09:04:43 +00:00
Greg Stein	ff975003cf	stop messing around with goto and just write the macro correctly.	2000-07-16 21:39:49 +00:00
Fredrik Lundh	0e19e76aba	- change \x to mean "byte" also in unicode literals (patch #100912)	2000-07-16 18:47:43 +00:00
Tim Peters	855ffac224	Fix fatal compiler (MSVC6) error: unicodeobject.c(735) : error C2143: syntax error : missing ';' before '}'	2000-07-16 17:10:50 +00:00
Marc-André Lemburg	fb625847bf	Fix to a bug found by Florian Weimer: The UTF-8 decoder is still buggy (i.e. it doesn't pass Markus Kuhn's stress test), mainly due to the following construct: #define UTF8_ERROR(details) do { \ if (utf8_decoding_error(&s, &p, errors, details)) \ goto onError; \ continue; \ } while (0) (The "continue" statement is supposed to exit from the outer loop, but of course, it doesn't. Indeed, this is a marvelous example of the dangers of the C programming language and especially of the C preprocessor.)	2000-07-16 13:29:13 +00:00
Thomas Wouters	7e47402264	Spelling fixes supplied by Rob W. W. Hooft. All these are fixes in either comments, docstrings or error messages. I fixed two minor things in test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't"). There is a minor style issue involved: Guido seems to have preferred English grammar (behaviour, honour) in a couple places. This patch changes that to American, which is the more prominent style in the source. I prefer English myself, so if English is preferred, I'd be happy to supply a patch myself ;)	2000-07-16 12:04:32 +00:00
Jeremy Hylton	03657cfdb0	replace PyXXX_Length calls with PyXXX_Size calls	2000-07-12 13:05:33 +00:00
Marc-André Lemburg	566d8a64eb	Jeremy Hylton: better error message for unicode coercion failure	2000-07-11 09:47:04 +00:00
Fredrik Lundh	dde6164402	- changed hash calculation for unicode strings. the new value is calculated from the character values, in a way that makes sure an 8-bit ASCII string and a unicode string with the same contents get the same hash value. (as a side effect, this also works for ISO Latin 1 strings). for more details, see the python-dev discussion.	2000-07-10 18:27:47 +00:00
Marc-André Lemburg	e12896ec98	New surrogate support in the UTF-8 codec. By Bill Tutt.	2000-07-07 17:51:08 +00:00
Marc-André Lemburg	5a5c81a0e9	Added new API PyUnicode_FromEncodedObject() which supports decoding objects including instance objects. The old API PyUnicode_FromObject() is still available as shortcut.	2000-07-07 13:46:42 +00:00
Marc-André Lemburg	063e0cb4c6	Fix to bug #393 (UTF16 codec didn't like empty strings) and corrected some usage of 'unsigned long' where Py_UNICODE should have been used.	2000-07-07 11:27:45 +00:00
Sjoerd Mullender	2629bd5a33	Two more places where long should be used instead of int. Especially true after revision 2.36 was checked in...	2000-07-07 09:47:24 +00:00
Marc-André Lemburg	449c325303	Fixed some code that used 'short' to use 'long' instead.	2000-07-06 20:13:23 +00:00
Marc-André Lemburg	85cc4d8940	Fixed a couple of places where 'int' was used where 'long' should have been used.	2000-07-06 19:43:31 +00:00
Marc-André Lemburg	a7acf425f6	Added new .isalpha() and .isalnum() methods which provide interfaces to the new alphabetic lookup APIs in unicodectype.c.	2000-07-05 09:49:44 +00:00
Marc-André Lemburg	1e7205a62a	Bill Tutt: Make unicode_compare a true UTF-16 compare function (includes support for surrogates).	2000-07-04 09:51:07 +00:00
Marc-André Lemburg	d49e5b4667	Marc-Andre Lemburg <mal@lemburg.com>: A previous patch by Jack Jansen was accidentally reverted.	2000-06-30 14:58:20 +00:00
Marc-André Lemburg	f28dd83b86	Marc-Andre Lemburg <mal@lemburg.com>: New buffer overflow checks for formatting strings. By Trent Mick.	2000-06-30 10:29:57 +00:00
Guido van Rossum	4f4b799b33	Jack Jansen: Use include "" instead of <>; and staticforward declarations	2000-06-29 00:06:39 +00:00
Marc-André Lemburg	0f774e3987	Marc-Andre Lemburg <mal@lemburg.com>: Patch to the standard unicode-escape codec which dynamically loads the Unicode name to ordinal mapping from the module ucnhash. By Bill Tutt.	2000-06-28 16:43:35 +00:00
Marc-André Lemburg	7c014684c2	Marc-Andre Lemburg <mal@lemburg.com>: Better error message for "1 in unicodestring". Submitted by Andrew Kuchling.	2000-06-28 08:11:47 +00:00
Marc-André Lemburg	49ef6dc1f4	Marc-Andre Lemburg <mal@lemburg.com>: Fixed a bug in PyUnicode_Count() which would have caused a core dump in case of substring coercion failure. Synchronized .count() with the string method of the same name to return len(s)+1 for s.count('').	2000-06-18 22:25:22 +00:00
Marc-André Lemburg	bea47e768d	Vladimir MARANGOZOV <Vladimir.Marangozov@inrialpes.fr>: This patch fixes an optimisation mystery in _PyUnicodeNew causing segfaults on AIX when the interpreter is compiled with -O.	2000-06-17 20:31:17 +00:00
Marc-André Lemburg	60bc809d9a	Marc-Andre Lemburg <mal@lemburg.com>: Added code so that .isXXX() testing returns 0 for empty strings.	2000-06-14 09:18:32 +00:00
Marc-André Lemburg	07ceb67d9c	Marc-Andre Lemburg <mal@lemburg.com>: Fixed a typo and removed a debug printf(). Thanks to Finn Bock for finding these.	2000-06-10 09:32:51 +00:00
Andrew M. Kuchling	cb95a1470a	Patch from Michael Hudson: improve unclear error message	2000-06-09 14:04:53 +00:00
Marc-André Lemburg	d4ab4a5905	Marc-Andre Lemburg <mal@lemburg.com>: Fixed %c formatting to check for one character arguments. Thanks to Finn Bock for finding this bug. Added a fix for bug PR#348 which originated from not resetting the globals correctly in _PyUnicode_Fini().	2000-06-08 17:54:00 +00:00
Marc-André Lemburg	90e8147118	Marc-Andre Lemburg <mal@lemburg.com>: Change the default encoding to 'ascii' (it was previously defined as UTF-8). Note: The implementation still uses UTF-8 to implement the buffer protocol, so C APIs will still see UTF-8. This is on purpose: rather than fixing the Unicode implementation, the C APIs should be made Unicode aware.	2000-06-07 09:13:21 +00:00
Fred Drake	785d14f965	Minimal change so I can add the rest of MAL's checkin message: M.-A. Lemburg <mal@lemburg.com>: Fixed a core dump in PyUnicode_Format().	2000-05-09 19:54:43 +00:00
Fred Drake	e4315f58d2	M.-A. Lemburg <mal@lemburg.com>: Added support for user settable default encodings. The current implementation uses a per-process global which defines the value of the encoding parameter in case it is set to NULL (meaning: use the default encoding).	2000-05-09 19:53:39 +00:00
Guido van Rossum	b8872e61c6	Trent Mick: Fix the string methods that implement slice-like semantics with optional args (count, find, endswith, etc.) to properly handle indices outside [INT_MIN, INT_MAX]. Previously the "i" formatter for PyArg_ParseTuple was used to get the indices. These could overflow. This patch changes the string methods to use the "O&" formatter with the slice_index() function from ceval.c which is used to do the same job for Python code slices (e.g. 'abcabcabc'[0:1000000000L]).	2000-05-09 14:14:27 +00:00
Guido van Rossum	03e29f1ae9	Mark Hammond should get his act into gear (his words :-). Zero length strings _are_ valid!	2000-05-04 15:52:20 +00:00
Guido van Rossum	42c29aaeb5	Fix warning detected by VC++ on assignment of Py_UNICODE to char.	2000-05-03 23:58:29 +00:00
Guido van Rossum	b18618dab7	Vladimir Marangozov's long-awaited malloc restructuring. For more comments, read the patches@python.org archives. For documentation read the comments in mymalloc.h and objimpl.h. (This is not exactly what Vladimir posted to the patches list; I've made a few changes, and Vladimir sent me a fix in private email for a problem that only occurs in debug mode. I'm also holding back on his change to main.c, which seems unnecessary to me.)	2000-05-03 23:44:39 +00:00
Guido van Rossum	4e751c3d12	Mark Hammond withdraws his fix -- the size includes the trailing 0 so a size of 0 is illegal.	2000-05-03 12:27:22 +00:00
Guido van Rossum	a6edfd9737	Mark Hammond: Fixes the MBCS codec to work correctly with zero length strings.	2000-05-03 11:03:24 +00:00
Guido van Rossum	0e4f657a50	Marc-Andre Lemburg: Fixed \OOO interpretation for Unicode objects. \777 now correctly produces the Unicode character with ordinal 511.	2000-05-01 21:27:20 +00:00
Guido van Rossum	3c1bb8043f	Marc-Andre Lemburg: Fixed a reference leak in the allocator. Renamed utf8_string to _PyUnicode_AsUTF8String() and made it external for use by other parts of the interpreter.	2000-04-27 20:13:50 +00:00
Guido van Rossum	86662914be	Marc-Andre Lemburg: The maxsplit functionality in .splitlines() was replaced by the keepends functionality which allows keeping the line end markers together with the string.	2000-04-11 15:38:46 +00:00
Guido van Rossum	fd4b957b06	Marc-Andre Lemburg: * New exported API PyUnicode_Resize() * The experimental Keep-Alive optimization was turned back on after some tweaks to the implementation. It should now work without causing core dumps... this has yet to be tested though (switching it off is easy: see the unicodeobject.c file for details). * Fixed a memory leak in the Unicode freelist cleanup code. * Added tests to correctly process the return code from _PyUnicode_Resize(). * Fixed a bug in the 'ignore' error handling routines of some builtin codecs. Added test cases for these to test_unicode.py.	2000-04-10 13:51:10 +00:00
Guido van Rossum	5db862dd0c	Skip Montanaro: add string precisions to calls to PyErr_Format to prevent possible buffer overruns.	2000-04-10 12:46:51 +00:00
Guido van Rossum	ba47704943	Conrad Huang points out that "if (0 < ch < 256)", while legal C, doesn't mean what the Python programmer thought...	2000-04-06 18:18:10 +00:00
Guido van Rossum	34888ed689	Fredrik Lundh: eliminate a MSVC compiler warning.	2000-04-05 21:29:50 +00:00
Guido van Rossum	9e896b37c7	Marc-Andre's third try at this bulk patch seems to work (except that his copy of test_contains.py seems to be broken -- the lines he deleted were already absent). Checkin messages: New Unicode support for int(), float(), complex() and long(). - new APIs PyInt_FromUnicode() and PyLong_FromUnicode() - added support for Unicode to PyFloat_FromString() - new encoding API PyUnicode_EncodeDecimal() which converts Unicode to a decimal char* string (used in the above new APIs) - shortcuts for calls like int(<int object>) and float(<float obj>) - tests for all of the above Unicode compares and contains checks: - comparing Unicode and non-string types now works; TypeErrors are masked, all other errors such as ValueError during Unicode coercion are passed through (note that PyUnicode_Compare does not implement the masking -- PyObject_Compare does this) - contains now works for non-string types too; TypeErrors are masked and 0 returned; all other errors are passed through Better testing support for the standard codecs. Misc minor enhancements, such as an alias dbcs for the mbcs codec. Changes: - PyLong_FromString() now applies the same error checks as does PyInt_FromString(): trailing garbage is reported as error and not longer silently ignored. The only characters which may be trailing the digits are 'L' and 'l' -- these are still silently ignored. - string.ato?() now directly interface to int(), long() and float(). The error strings are now a little different, but the type still remains the same. These functions are now ready to get declared obsolete ;-) - PyNumber_Int() now also does a check for embedded NULL chars in the input string; PyNumber_Long() already did this (and still does) Followed by: Looks like I've gone a step too far there... (and test_contains.py seem to have a bug too). I've changed back to reporting all errors in PyUnicode_Contains() and added a few more test cases to test_contains.py (plus corrected the join() NameError).	2000-04-05 20:11:21 +00:00
Guido van Rossum	2ea3e143f0	Some blank lines.	2000-03-31 17:24:09 +00:00
Guido van Rossum	b7a40ba8d3	MBCS codecs. (Win32 only.) By Mark Hammond.	2000-03-28 02:01:52 +00:00
Barry Warsaw	51ac58039f	On 17-Mar-2000, Marc-Andre Lemburg said: Attached you find an update of the Unicode implementation. The patch is against the current CVS version. I would appreciate if someone with CVS checkin permissions could check the changes in. The patch contains all bugs and patches sent this week and also fixes a leak in the codecs code and a bug in the free list code for Unicode objects (which only shows up when compiling Python with Py_DEBUG; thanks to MarkH for spotting this one).	2000-03-20 16:36:48 +00:00
Guido van Rossum	403d68b484	Add sq_contains implementation.	2000-03-13 15:55:09 +00:00
Guido van Rossum	d57fd91488	Unicode implementation by Marc-Andre Lemburg based on original code by Fredrik Lundh.	2000-03-10 22:53:23 +00:00
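
A few of the string-method behaviours described in the messages above (.count() of the empty string, Python-style negative and out-of-range indices, and the keepends flag of .splitlines()) can still be observed today. The sketch below is not taken from any of these commits; it assumes a current Python 3 interpreter, where str plays the role of the Unicode object discussed in the log, and the variable names are illustrative only.

```python
# Sketch only: assumes Python 3, where `str` is the Unicode string type.
text = "abcabc"

# Counting the empty string yields len(s) + 1.
empty_count = text.count("")

# A negative start index follows normal slice semantics: text[-3:] is "abc".
count_tail = text.count("c", -3)

# An end index far beyond the string length is clamped, not rejected.
count_clamped = text.count("a", 0, 10**12)

# Passing a true keepends flag keeps the line-end markers with each line.
lines = "one\ntwo\r\nthree".splitlines(True)

print(empty_count, count_tail, count_clamped, lines)
```

The out-of-range index case is the kind of input the index-handling commits were concerned with; here it simply behaves like an oversized slice bound.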


175 Commits