cpython

Commit Graph

Author	SHA1	Message	Date
Victor Stinner	fdfbf78114	Issue #25318: Add _PyBytesWriter API Add a new private API to optimize Unicode encoders. It uses a small buffer allocated on the stack and supports overallocation. Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable overallocation for the UTF-8 encoder with error handlers. unicode_encode_ucs1(): initialize collend to collstart+1 to not check the current character twice, we already know that it is not ASCII.	2015-10-09 00:33:49 +02:00
Victor Stinner	ca9381ea01	Issue #24870: Add _PyUnicodeWriter_PrepareKind() macro Add a macro which ensures that the writer has at least the requested kind.	2015-09-22 00:58:32 +02:00
Raymond Hettinger	ac2ef65c32	Make the unicode equality test an external function rather than in-lining it. The real benefit of the unicode specialized function comes from bypassing the overhead of PyObject_RichCompareBool() and not from being in-lined (especially since there was almost no shared data between the caller and callee). Also, the in-lining was having a negative effect on code generation for the callee.	2015-07-04 16:04:44 -07:00
Serhiy Storchaka	7e9d1d1a1b	Issue #23908: os functions now reject paths with embedded null character on Windows instead of silently truncate them. Removed no longer used _PyUnicode_HasNULChars().	2015-04-20 10:12:28 +03:00
Victor Stinner	ce2c584ea5	Merge 3.4 (typo)	2015-02-11 18:18:10 +01:00
Victor Stinner	22fabe218d	Fix typo: PyMem_Alloc => PyMem_Malloc	2015-02-11 18:17:56 +01:00
Ethan Furman	b95b56150f	Issue20284: Implement PEP461	2015-01-23 20:05:18 -08:00
Benjamin Peterson	82f34ada45	fix instances of consecutive articles (closes #23221) Patch by Karan Goel.	2015-01-13 09:17:24 -05:00
Serhiy Storchaka	b757c83ec6	Issue #22581: Use more "bytes-like object" throughout the docs and comments.	2014-12-05 22:25:22 +02:00
Antoine Pitrou	8c6f8dc527	Issue #19537: Fix PyUnicode_DATA() alignment under m68k. Patch by Andreas Schwab.	2014-03-23 22:55:03 +01:00
Martin v. Löwis	1c0689c613	Issue #19526: Exclude all new API from the stable ABI.	2014-01-03 21:36:49 +01:00
Victor Stinner	a726192181	oops, remove _PyObject_ReprWriter() definition (unwanted change)	2013-11-19 13:18:45 +01:00
Victor Stinner	4a58707a34	Add _PyUnicodeWriter_WriteASCIIString() function	2013-11-19 12:54:53 +01:00
Victor Stinner	ad14ccd047	Issue #19512: add _PyUnicode_CompareWithId() function _PyUnicode_CompareWithId() is faster than PyUnicode_CompareWithASCIIString() when both strings are equal and interned. Add also _PyId_builtins identifier for "builtins" common string.	2013-11-07 00:46:04 +01:00
Antoine Pitrou	9ed5f27266	Issue #18722: Remove uses of the "register" keyword in C code.	2013-08-13 20:18:52 +02:00
Victor Stinner	f476405503	fix typo in a comment	2013-04-18 23:21:19 +02:00
Victor Stinner	8f674ccd64	Close #17694: Add minimum length to _PyUnicodeWriter * Add also min_char attribute to _PyUnicodeWriter structure (currently unused) * _PyUnicodeWriter_Init() has no more argument (except the writer itself): min_length and overallocate must be set explicitly * In error handlers, only enable overallocation if the replacement string is longer than 1 character * CJK decoders don't use overallocation anymore * Set min_length, instead of preallocating memory using _PyUnicodeWriter_Prepare(), in many decoders * _PyUnicode_DecodeUnicodeInternal() checks for integer overflow	2013-04-17 23:02:17 +02:00
Victor Stinner	a0dd0213cc	Close #17693: Rewrite CJK decoders to use the _PyUnicodeWriter API instead of the legacy Py_UNICODE API. Add also a new _PyUnicodeWriter_WriteChar() function.	2013-04-11 22:09:04 +02:00
Victor Stinner	cfc4c13b04	Add _PyUnicodeWriter_WriteSubstring() function Write a function to enable more optimizations: * If the substring is the whole string and overallocation is disabled, just keep a reference to the string, don't copy characters * Avoid a call to the expensive _PyUnicode_FindMaxChar() function when possible	2013-04-03 01:48:39 +02:00
Victor Stinner	d45c7f8d74	Issue #16455: On FreeBSD and Solaris, if the locale is C, the ASCII/surrogateescape codec is now used, instead of the locale encoding, to decode the command line arguments. This change fixes inconsistencies with os.fsencode() and os.fsdecode() because these operating systems announces an ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.	2012-12-04 01:34:47 +01:00
Victor Stinner	76df43de30	Issue #16330: Use surrogate-related macros Patch written by Serhiy Storchaka.	2012-10-30 01:42:39 +01:00
Victor Stinner	e215d960be	Issue #16147: Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API * Simplify the code: replace 4 steps with one unique step using the _PyUnicodeWriter API. PyUnicode_Format() has the same design. It avoids to store intermediate results which require to allocate an array of pointers on the heap. * Use the _PyUnicodeWriter API for speed (and its convinient API): overallocate the buffer to reduce the number of "realloc()" * Implement "width" and "precision" in Python, don't rely on sprintf(). It avoids to need of a temporary buffer allocated on the heap: only use a small buffer allocated in the stack. * Add _PyUnicodeWriter_WriteCstr() function * Split PyUnicode_FromFormatV() into two functions: add unicode_fromformat_arg(). * Inline parse_format_flags(): the format of an argument is now only parsed once, it's no more needed to have a subfunction. * Optimize PyUnicode_FromFormatV() for characters between two "%" arguments: search the next "%" and copy the substring in one chunk, instead of copying character per character.	2012-10-06 23:03:36 +02:00
Ezio Melotti	080a2c087e	#16127: merge with 3.3.	2012-10-05 03:34:02 +03:00
Ezio Melotti	e7f90375b1	#16127: remove outdated references to narrow builds. Patch by Serhiy Storchaka.	2012-10-05 03:33:31 +03:00
Victor Stinner	90db9c47dc	Enable also ptr==ptr optimization in PyUnicode_Compare() It was already implemented in PyUnicode_RichCompare()	2012-10-04 21:53:50 +02:00
Antoine Pitrou	27f6a3b0bf	Issue #15026: utf-16 encoding is now significantly faster (up to 10x). Patch by Serhiy Storchaka.	2012-06-15 22:15:23 +02:00
Victor Stinner	d7b7c7472b	Issue #14993: Use standard "unsigned char" instead of a unsigned char bitfield	2012-06-04 22:52:12 +02:00
Victor Stinner	d3f0882dfb	Issue #14744: Use the new _PyUnicodeWriter internal API to speed up str%args and str.format(args) * Formatting string, int, float and complex use the _PyUnicodeWriter API. It avoids a temporary buffer in most cases. * Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just keep a reference to the string if the output is only composed of one string * Disable overallocation when formatting the last argument of str%args and str.format(args) * Overallocation allocates at least 100 characters: add min_length attribute to the _PyUnicodeWriter structure * Add new private functions: _PyUnicode_FastCopyCharacters(), _PyUnicode_FastFill() and _PyUnicode_FromASCII() The speed up is around 20% in average.	2012-05-29 12:57:52 +02:00
Victor Stinner	ece58deb9f	Close #14648: Compute correctly maxchar in str.format() for substrin	2012-04-23 23:36:38 +02:00
Victor Stinner	c9590ad745	Close #14085: remove assertions from PyUnicode_WRITE macro Add checks in PyUnicode_WriteChar() and convert PyUnicode_New() assertion to a test raising a Python exception.	2012-03-04 01:34:37 +01:00
Victor Stinner	41a863cb81	Issue #13706: Fix format(int, "n") for locale with non-ASCII thousands separator * Decode thousands separator and decimal point using PyUnicode_DecodeLocale() (from the locale encoding), instead of decoding them implicitly from latin1 * Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used * Change _PyUnicode_InsertThousandsGrouping() API to return the maximum character if unicode is NULL * Replace MIN/MAX macros by Py_MIN/Py_MAX * stringlib/undef.h undefines STRINGLIB_IS_UNICODE * stringlib/localeutil.h only supports Unicode	2012-02-24 00:37:51 +01:00
Victor Stinner	ed27785b32	Issue #13706: Add assertions to detect bugs earlier	2012-02-01 00:22:23 +01:00
Antoine Pitrou	7ab4af0427	Issue #13848: open() and the FileIO constructor now check for NUL characters in the file name. Patch by Hynek Schlawack.	2012-01-29 18:43:36 +01:00
Antoine Pitrou	1334884ff2	Issue #13848: open() and the FileIO constructor now check for NUL characters in the file name. Patch by Hynek Schlawack.	2012-01-29 18:36:34 +01:00
Benjamin Peterson	ce79852077	use the static identifier api for looking up special methods I had to move the static identifier code from unicodeobject.h to object.h in order for this to work.	2012-01-22 11:24:29 -05:00
Benjamin Peterson	d5890c8db5	add str.casefold() (closes #13752)	2012-01-14 13:23:30 -05:00
Amaury Forgeot d'Arc	77b1ecf0ad	Silence compilation warnings on Windows	2012-01-13 22:12:37 +01:00
Benjamin Peterson	b2bf01d824	use full unicode mappings for upper/lower/title case (#12736) Also broaden the category of characters that count as lowercase/uppercase.	2012-01-11 18:17:06 -05:00
Victor Stinner	3fe553160c	Add a new PyUnicode_Fill() function It is faster than the unicode_fill() function which was implemented in formatter_unicode.c.	2012-01-04 00:33:50 +01:00
Victor Stinner	80bc72d5a2	fix PyCompactUnicodeObject doc (test)	2011-12-22 03:23:10 +01:00
Victor Stinner	52e2cc8604	backout 7876cd49300d: Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum	2011-12-19 22:14:45 +01:00
Victor Stinner	0ba5af20c0	Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum	2011-12-17 22:18:27 +01:00
Victor Stinner	1b57967b96	Issue #13560: Locale codec functions use the classic "errors" parameter, instead of surrogateescape So it would be possible to support more error handlers later.	2011-12-17 05:47:23 +01:00
Victor Stinner	f2ea71fcc8	Issue #13560: Add PyUnicode_EncodeLocale() * Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not available * Document my last changes in Misc/NEWS	2011-12-17 04:13:41 +01:00
Victor Stinner	af02e1c85a	Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() * PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string from the current locale encoding * _Py_char2wchar() writes an "error code" in the size argument to indicate if the function failed because of memory allocation failure or because of a decoding error. The function doesn't write the error message directly to stderr. * Fix time.strftime() (if wcsftime() is missing): decode strftime() result from the current locale encoding, not from the filesystem encoding.	2011-12-16 23:56:01 +01:00
Victor Stinner	16e6a80923	PyUnicode_Resize(): warn about canonical representation Call also directly unicode_resize() in unicodeobject.c	2011-12-12 13:24:15 +01:00
Victor Stinner	b0a82a6a7f	Fix PyUnicode_Resize() for compact string: leave the string unchanged on error Fix also PyUnicode_Resize() doc	2011-12-12 13:08:33 +01:00
Victor Stinner	bf6e560d0c	Make PyUnicode_Copy() private => _PyUnicode_Copy() Undocument the function. Make also decode_utf8_errors() as private (static).	2011-12-12 01:53:47 +01:00
Victor Stinner	7a9105a380	resize_copy() now supports legacy ready strings	2011-12-12 00:13:42 +01:00
Victor Stinner	24c74be9a3	PyUnicode_IS_ASCII() macro ensures that the string is ready It has no sense to check if a not ready string is ASCII or not.	2011-12-12 01:24:20 +01:00
Victor Stinner	551ac95733	Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros And use surrogates macros everywhere in unicodeobject.c	2011-11-29 22:58:13 +01:00
Victor Stinner	f3ae6208c7	PyUnicode_GET_SIZE() checks that PyUnicode_AsUnicode() succeed using an assertion	2011-11-21 02:24:49 +01:00
Victor Stinner	77faf69ca1	_PyUnicode_CheckConsistency() also checks maxchar maximum value, not only its minimum value	2011-11-20 18:56:05 +01:00
Victor Stinner	9343999597	Fix PyUnicode_CopyCharacters() doc	2011-11-20 18:29:14 +01:00
Victor Stinner	7c8bbbbb0c	Ensure that Py_UCS4 is 32 bits and Py_UCS2 is 16 bits	2011-11-20 18:28:29 +01:00
Victor Stinner	6f9568bb1f	Fix misused of "PyUnicodeObject" structure name in unicodeobject.h	2011-11-17 00:12:44 +01:00
Martin v. Löwis	1db7c13be1	Port encoders from Py_UNICODE API to unicode object API.	2011-11-10 18:24:32 +01:00
Martin v. Löwis	d10759f6ed	Make _PyUnicode_FromId return borrowed references. http://mail.python.org/pipermail/python-dev/2011-November/114347.html	2011-11-07 13:00:05 +01:00
Victor Stinner	e30c0a1014	Fix gdb/libpython.py for not ready Unicode strings _PyUnicode_CheckConsistency() checks also hash and length value for not ready Unicode strings.	2011-11-04 20:54:05 +01:00
Victor Stinner	7931d9a951	Replace PyUnicodeObject type by PyObject * _PyUnicode_CheckConsistency() now takes a PyObject* instead of void* * Remove now useless casts to PyObject*	2011-11-04 00:22:48 +01:00
Martin v. Löwis	23e275b3ad	Port UCS1 and charmap codecs to new API.	2011-11-02 18:02:51 +01:00
Martin v. Löwis	0d3072e98d	Drop Py_UCS4_ functions. Closes #13246.	2011-10-31 08:40:56 +01:00
Victor Stinner	9db1a8b69f	Replace PyUnicodeObject* by PyObject* where it was irrevelant A Unicode string can now be a PyASCIIObject, PyCompactUnicodeObject or PyUnicodeObject. Aliasing a PyASCIIObject* or PyCompactUnicodeObject* to PyUnicodeObject* is wrong	2011-10-23 20:04:37 +02:00
Victor Stinner	55c7e00fc0	Simplify _PyUnicode_COMPACT_DATA() macro	2011-10-18 23:32:53 +02:00
Victor Stinner	3a50e7056e	Issue #12281: Rewrite the MBCS codec to handle correctly replace and ignore error handlers on all Windows versions. The MBCS codec is now supporting all error handlers, instead of only replace to encode and ignore to decode.	2011-10-18 21:21:00 +02:00
Martin v. Löwis	bd928fef42	Rename _Py_identifier to _Py_IDENTIFIER.	2011-10-14 10:20:37 +02:00
Victor Stinner	8813104e53	Simplify PyUnicode_MAX_CHAR_VALUE Use PyUnicode_IS_ASCII instead of PyUnicode_IS_COMPACT_ASCII, so the following test can be removed: PyUnicode_DATA(op) == (((PyCompactUnicodeObject *)(op))->utf8)	2011-10-13 01:12:01 +02:00
Martin v. Löwis	87da872c69	Drop extra semicolon.	2011-10-09 11:54:42 +02:00
Martin v. Löwis	afe55bba33	Add API for static strings, primarily good for identifiers. Thanks to Konrad Schöbel and Jasper Schulz for helping with the mass-editing.	2011-10-09 10:38:36 +02:00
Martin v. Löwis	c47adb04b3	Change PyUnicode_KIND to 1,2,4. Drop _KIND_SIZE and _CHARACTER_SIZE.	2011-10-07 20:55:35 +02:00
Georg Brandl	db6c7f5c33	Update C API docs for PEP 393.	2011-10-07 11:19:11 +02:00
Victor Stinner	b066cc6aba	Fix PyUnicode_CHARACTER_SIZE and PyUnicode_KIND_SIZE	2011-10-06 15:54:53 +02:00
Antoine Pitrou	dbf697ae5c	Fix compilation warnings under 64-bit Windows	2011-10-06 15:34:41 +02:00
Éric Araujo	0f4ee93b06	Branch merge	2011-10-06 13:22:21 +02:00
Victor Stinner	1d4b35f4e5	rephrase PyUnicode_1BYTE_KIND documentation	2011-10-06 01:51:19 +02:00
Victor Stinner	fb9ea8c57e	Don't check for the maximum character when copying from unicodeobject.c * Create copy_characters() function which doesn't check for the maximum character in release mode * _PyUnicode_CheckConsistency() is no more static to be able to use it in _PyUnicode_FormatAdvanced() (in formatter_unicode.c) * _PyUnicode_CheckConsistency() checks the string hash	2011-10-06 01:45:57 +02:00
Éric Araujo	80a348c0a0	Fix typo	2011-10-05 01:11:12 +02:00
Victor Stinner	30134f53fc	Complete documentation of compact ASCII strings	2011-10-04 01:32:45 +02:00
Victor Stinner	a41463c203	Document utf8_length and wstr_length states Ensure these states with assertions in _PyUnicode_CheckConsistency().	2011-10-04 01:05:08 +02:00
Victor Stinner	7f11ad4594	Unicode: document when the wstr pointer is shared with data Add also related assertions to _PyUnicode_CheckConsistency().	2011-10-04 00:00:20 +02:00
Victor Stinner	8cfcbed4e3	Improve string forms and PyUnicode_Resize() documentation Remove also the FIXME for resize_copy(): as discussed with Martin, copy the string on resize if the string is not resizable is just fine.	2011-10-03 23:19:21 +02:00
Victor Stinner	c3cec7868b	Add asciilib: similar to ucs1, ucs2 and ucs4 library, but specialized to ASCII ucs1, ucs2 and ucs4 libraries have to scan created substring to find the maximum character, whereas it is not need to ASCII strings. Because ASCII strings are common, it is useful to optimize ASCII.	2011-10-05 21:24:08 +02:00
Victor Stinner	4d0d54bcba	Document requierements of Unicode kinds	2011-10-05 01:31:05 +02:00
Georg Brandl	07de325672	More fixes.	2011-10-05 16:47:38 +02:00
Georg Brandl	c6bc4c6897	Fix a few typos in the unicode header.	2011-10-05 16:23:09 +02:00
Georg Brandl	4975a9b44d	Fix grammar.	2011-10-05 16:12:21 +02:00
Victor Stinner	b9275c104e	Speedup str[a:b] and PyUnicode_FromKindAndData * str[a:b] doesn't scan the string for the maximum character if the string is ascii only * PyUnicode_FromKindAndData() stops if we are sure that we cannot use a shorter character type. For example, _PyUnicode_FromUCS1() stops if we have at least one character in range U+0080-U+00FF	2011-10-05 14:01:42 +02:00
Victor Stinner	85041a54bd	_PyUnicode_CheckConsistency() checks utf8 field consistency	2011-10-03 14:42:39 +02:00
Victor Stinner	a3b334da6d	PyUnicode_Ready() now sets ascii=1 if maxchar < 128 ascii=1 is no more reserved to PyASCIIObject. Use PyUnicode_IS_COMPACT_ASCII(obj) to check if obj is a PyASCIIObject (as before).	2011-10-03 13:53:37 +02:00
Victor Stinner	910337b42e	Add _PyUnicode_CheckConsistency() macro to help debugging * Document Unicode string states * Use _PyUnicode_CheckConsistency() to ensure that objects are always consistent.	2011-10-03 03:20:16 +02:00
Victor Stinner	37943769ef	PyUnicode_READ_CHAR() ensures that the string is ready	2011-10-02 20:33:18 +02:00
Victor Stinner	7a48ff7e06	Use Py_UCS1 instead of unsigned char in unicodeobject.h	2011-10-02 00:55:25 +02:00
Victor Stinner	cd9950fd09	PyUnicode_WriteChar() raises IndexError on invalid index PyUnicode_WriteChar() raises also a ValueError if the string has more than 1 reference.	2011-10-02 00:34:53 +02:00
Victor Stinner	9f789e7f63	_PyUnicode_AsKind() is not part of the stable ABI	2011-10-01 03:57:28 +02:00
Victor Stinner	4584a5ba1a	PyUnicode_CHARACTER_SIZE(): add a reference to PyUnicode_KIND_SIZE()	2011-10-01 02:39:37 +02:00
Victor Stinner	034f6cf10c	Add PyUnicode_Copy() function, include it to the public API	2011-09-30 02:26:44 +02:00
Victor Stinner	d8f6510acc	_PyUnicode_Ready() cannot be used on ready strings anymore * Change its prototype: PyObject* instead of PyUnicodeoObject. Remove an old assertion, the result of PyUnicode_READY (_PyUnicode_Ready) must be checked instead	2011-09-29 19:43:17 +02:00
Victor Stinner	bc8b81bc4e	Move _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() outside unicodeobject.h Move these macros to unicodeobject.c	2011-09-29 19:31:34 +02:00
Victor Stinner	a0702ab1fe	Add a note in PyUnicode_CopyCharacters() doc: it doesn't write null character Cleanup also the code (avoid the goto).	2011-09-29 14:14:38 +02:00
Victor Stinner	f5ca1a21a5	PyUnicode_CopyCharacters() fails if 'to' has more than 1 reference	2011-09-28 23:54:59 +02:00

288 Commits