1224 Commits

Author SHA1 Message Date
Benjamin Peterson 736b8012b4 prevent overflow in unicode_repr (closes #22520) 2014-09-29 23:02:15 -04:00
Benjamin Peterson 10e4b2545e merge 3.4 (closes #22518) 2014-09-29 18:53:58 -04:00
Benjamin Peterson 2b76ce6d27 merge 3.3 (closes #22518) 2014-09-29 18:50:06 -04:00
Benjamin Peterson a1c1be4e03 cleanup overflowing handling in unicode_decode_call_errorhandler and unicode_encode_ucs1 (closes #22518) 2014-09-29 18:18:57 -04:00
Serhiy Storchaka 20b39b27d9 Removed redundant casts to `char *`.
Corresponding functions now accept `const char *` (issue #1772673).
2014-09-28 11:27:24 +03:00
Benjamin Peterson fa5021699a Merge 3.3 2014-10-15 23:58:32 -04:00
Antoine Pitrou b6dc9b7554 Fixed signed/unsigned comparison warning 2014-10-15 23:14:53 +02:00
Serhiy Storchaka d8a1447c99 Issue #22215: Now ValueError is raised instead of TypeError when str or bytes
argument contains not permitted null character or byte.
2014-09-06 20:07:17 +03:00
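A minimal sketch of how this change shows up from Python, assuming an interpreter recent enough to include it. The helper name `error_kind` and the sample path are made up for illustration, and `os.stat` is only one of the calls that take a path string.

```python
import os


def error_kind(path):
    """Report which exception class os.stat() raises for `path`."""
    try:
        os.stat(path)
    except ValueError:
        return "ValueError"
    except TypeError:
        return "TypeError"
    except OSError:
        return "OSError"
    return "no error"


# A path with an embedded null character is rejected before any system call.
kind = error_kind("spam\0eggs")
print(kind)
```

On older interpreters the same call is expected to report `TypeError` instead.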
Victor Stinner 12174a5dca Issue #22156: Fix "comparison between signed and unsigned integers" compiler
warnings in the Objects/ subdirectory.

PyType_FromSpecWithBases() and PyType_FromSpec() now reject explicitly negative
slot identifiers.
2014-08-15 23:17:38 +02:00
Victor Stinner f6a271ae98 Issue #18395: Rename ``_Py_char2wchar()`` to :c:func:`Py_DecodeLocale`, rename
``_Py_wchar2char()`` to :c:func:`Py_EncodeLocale`, and document these
functions.
2014-08-01 12:28:48 +02:00
Victor Stinner e1f17c6c0b unicodeobject.c: fix a compiler warning on Windows 64 bits 2014-07-25 14:03:03 +02:00
Victor Stinner c68b7fba86 (Merge 3.4) Issue #21892, #21893: Partial revert of changeset 4f55e802baf0,
PyErr_Format() uses "%zd" for Py_ssize_t, not PY_FORMAT_SIZE_T
2014-07-04 22:50:13 +02:00
Victor Stinner a33bce0945 Issue #21892, #21893: Partial revert of changeset 4f55e802baf0, PyErr_Format()
uses "%zd" for Py_ssize_t, not PY_FORMAT_SIZE_T
2014-07-04 22:47:46 +02:00
Victor Stinner 9f43505f3d (Merge 3.4) Closes #21892, #21893: Use PY_FORMAT_SIZE_T instead of %zi or %zu
to format C size_t, because %zi/%zu is not supported on all platforms.
2014-07-01 08:57:54 +02:00
Victor Stinner 293f3f526d Closes #21892, #21893: Use PY_FORMAT_SIZE_T instead of %zi or %zu to format C
size_t, because %zi/%zu is not supported on all platforms.
2014-07-01 08:57:10 +02:00
Serhiy Storchaka 48070c1248 Issue #23803: Fixed str.partition() and str.rpartition() when a separator
is wider than partitioned string.
2015-03-29 19:21:02 +03:00
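A small sketch of the case this fix covers, assuming "wider" refers to a separator whose characters need more bytes each than those of the string being split. The sample values are illustrative only.

```python
text = "ab"      # ASCII only, one byte per character internally
sep = "\u20ac"   # the euro sign needs a wider internal representation

# The separator cannot occur in the text, so both calls take the
# "separator not found" path.
head, found, tail = text.partition(sep)
rhead, rfound, rtail = text.rpartition(sep)

print((head, found, tail))     # ('ab', '', '')
print((rhead, rfound, rtail))  # ('', '', 'ab')
```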
Benjamin Peterson 92ce1b4392 merge 3.3 (#23362) 2015-03-02 13:23:41 -05:00
Benjamin Peterson e5a853c390 use PyMem_NEW to detect overflow (closes #23362) 2015-03-02 13:23:25 -05:00
Serhiy Storchaka 4dbc305002 Issue #23055: Fixed a buffer overflow in PyUnicode_FromFormatV. Analysis
and fix by Guido Vranken.
2015-01-27 22:18:46 +02:00
Victor Stinner 4dd25256e2 Issue #21118: PyLong_AS_LONG() result type is long
Even if PyLong_AS_LONG() cannot fail, I prefer to use the right type.
2014-04-08 09:14:21 +02:00
Benjamin Peterson 1365de764e fix reference leaks in the translate fast path (closes #21175)
Patch by Josh Rosenberg.
2014-04-07 20:15:41 -04:00
Victor Stinner 872b291b96 Issue #21118: Optimize also str.translate() for ASCII => ASCII deletion 2014-04-05 14:27:07 +02:00
Victor Stinner 4ff33af257 Issue #21118: Add unit test for invalid character replacement (code point higher than U+10ffff) 2014-04-05 11:56:37 +02:00
Victor Stinner 89a76abf20 Issue #21118: Optimize str.translate() for ASCII => ASCII translation 2014-04-05 11:44:04 +02:00
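For orientation, a sketch of the kind of call these str.translate() commits target: an ASCII string translated through a table whose keys and values are also ASCII. The speedup itself is not visible from Python, and the sample strings are illustrative.

```python
# ASCII => ASCII translation: every key and value is an ASCII character.
swap = str.maketrans("abc", "xyz")
translated = "aabbcc".translate(swap)

# ASCII => ASCII deletion: the third argument lists characters to drop.
drop_b = str.maketrans("", "", "b")
deleted = "aabbcc".translate(drop_b)

print(translated)  # xxyyzz
print(deleted)     # aacc
```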
Victor Stinner 8a4422e78d Issue #21118: Remove unused variable 2014-04-05 00:15:52 +02:00
Victor Stinner 1194ea020c Issue #21118: Use _PyUnicodeWriter API in str.translate() to simplify and
factorize the code
2014-04-04 19:37:40 +02:00
Ethan Furman 9ab748013b Issue19995: more informative error message; spelling corrections; use operator.mod instead of __mod__ 2014-03-21 06:38:46 -07:00
Ethan Furman 38d872ee5d Issue19995: passing a non-int to %o, %c, %x, or %X now raises an exception 2014-03-19 08:38:52 -07:00
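A hedged illustration of the behavior described here, assuming an interpreter where the deprecation period has ended and the exception is raised. The helper `try_format` is a made-up name for this sketch.

```python
def try_format(spec, value):
    """Return the formatted string, or the name of the exception raised."""
    try:
        return spec % value
    except TypeError:
        return "TypeError"


# Integers are still accepted by all four codes.
ok = [try_format("%o", 8), try_format("%x", 255),
      try_format("%X", 255), try_format("%c", 65)]

# A float is rejected by each of them.
rejected = [try_format(spec, 3.5) for spec in ("%o", "%x", "%X", "%c")]

print(ok)        # ['10', 'ff', 'FF', 'A']
print(rejected)  # ['TypeError', 'TypeError', 'TypeError', 'TypeError']
```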
Victor Stinner 7d00cc1a64 Issue #20574: Implement incremental decoder for cp65001 code
(Windows code page 65001, Microsoft UTF-8).
2014-03-17 23:08:06 +01:00
Kristján Valur Jónsson 25dded041f Make the various iterators' "setstate" silently and consistently clip the
index.  This avoids the possibility of setting an iterator to an invalid
state.
2014-03-05 13:47:57 +00:00
Kristján Valur Jónsson c5cc5011ac Make the various iterators' "setstate" silently and consistently clip the
index.  This avoids the possibility of setting an iterator to an invalid
state.
2014-03-05 15:23:07 +00:00
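A sketch of the clipping described above, assuming the str iterator is among the "various iterators" the message refers to. `__setstate__` is normally called by pickle; calling it directly here is only for demonstration.

```python
# An index past the end is clipped to the length: nothing is left to yield.
past_end = iter("abc")
past_end.__setstate__(100)
remaining_after_overshoot = list(past_end)

# A negative index is clipped to zero: iteration restarts from the beginning.
negative = iter("abc")
negative.__setstate__(-5)
remaining_after_negative = list(negative)

print(remaining_after_overshoot)  # []
print(remaining_after_negative)   # ['a', 'b', 'c']
```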
Serhiy Storchaka 94ee389308 Issue #19619: Blacklist non-text codecs in method API
str.encode, bytes.decode and bytearray.decode now use an
internal API to throw LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then throwing a TypeError for an unexpected output type.

The latter mechanism remains in place for third party non-text
encodings.

Backported changeset d68df99d7a57.
2014-02-24 14:43:03 +02:00
Benjamin Peterson 4267869ad8 merge 3.3 (#20507) 2014-02-15 13:03:20 -05:00
Benjamin Peterson 9743b2c2b5 give non-iterable TypeError a message (closes #20507) 2014-02-15 13:02:52 -05:00
Serhiy Storchaka dfe98a102e Issue #20437: Fixed 22 potential bugs when deleting objects references. 2014-02-09 13:46:20 +02:00
Serhiy Storchaka 505ff755d7 Issue #20437: Fixed 21 potential bugs when deleting objects references. 2014-02-09 13:33:53 +02:00
Larry Hastings 2623c8c23c Issue #20530: Argument Clinic's signature format has been revised again.
The new syntax is highly human readable while still preventing false
positives.  The syntax also extends Python syntax to denote "self" and
positional-only parameters, allowing inspect.Signature objects to be
totally accurate for all supported builtins in Python 3.4.
2014-02-08 22:15:29 -08:00
Serhiy Storchaka 6cbf151032 Issue #20538: UTF-7 incremental decoder produced inconsistent string when
input was truncated in BASE64 section.
2014-02-08 14:06:33 +02:00
Serhiy Storchaka 016a3f33a5 Issue #20538: UTF-7 incremental decoder produced inconsistent string when
input was truncated in BASE64 section.
2014-02-08 14:01:29 +02:00
Larry Hastings 581ee3618c Issue #20326: Argument Clinic now uses a simple, unique signature to
annotate text signatures in docstrings, resulting in fewer false
positives.  "self" parameters are also explicitly marked, allowing
inspect.Signature() to authoritatively detect (and skip) said parameters.

Issue #20326: Argument Clinic now generates separate checksums for the
input and output sections of the block, allowing external tools to verify
that the input has not changed (and thus the output is not out-of-date).
2014-01-28 05:00:08 -08:00
Larry Hastings c20472640c Issue #20390: Small fixes and improvements for Argument Clinic. 2014-01-25 20:43:29 -08:00
Larry Hastings 5c66189e88 Issue #20189: Four additional builtin types (PyTypeObject,
PyMethodDescr_Type, _PyMethodWrapper_Type, and PyWrapperDescr_Type)
have been modified to provide introspection information for builtins.
Also: many additional Lib, test suite, and Argument Clinic fixes.
2014-01-24 06:17:25 -08:00
Ethan Furman a70805e1fa Issue19995: fixed typo; switched from test.support.check_warnings to assertWarns 2014-01-12 08:42:35 -08:00
Ethan Furman f9bba9c67f Issue19995: issue deprecation warning for non-integer values to %c, %o, %x, %X 2014-01-11 23:20:58 -08:00
Larry Hastings 61272b77b0 Issue #19273: The marker comments Argument Clinic uses have been changed
to improve readability.
2014-01-07 12:41:53 -08:00
Ethan Furman df3ed242c0 Issue19995: %o, %x, %X now only accept ints 2014-01-05 06:50:30 -08:00
Serhiy Storchaka 3079328d29 Reverted changeset b72c5573c5e7 (issue #15027). 2014-01-04 22:44:01 +02:00
Serhiy Storchaka 583a93943c Issue #15027: Rewrite the UTF-32 encoder. It is now 1.6x to 3.5x faster. 2014-01-04 19:25:37 +02:00
Victor Stinner fa4e68d425 Remove deadcode (HASH macro is no more defined) 2014-01-03 17:42:18 +01:00
Victor Stinner 92a419eea4 Remove now unused variables 2014-01-03 17:39:40 +01:00
Victor Stinner f3b46b4a66 unicode_char() uses get_latin1_char() to get latin1 singleton characters 2014-01-03 13:16:00 +01:00
Victor Stinner 985a82a6d2 add unicode_char() in unicodeobject.c to factorize code 2014-01-03 12:53:47 +01:00
Larry Hastings 44e2eaab54 Issue #19674: inspect.signature() now produces a correct signature
for some builtins.
2013-11-23 15:37:55 -08:00
Larry Hastings ebdcb50b8a Issue #19730: Argument Clinic now supports all the existing PyArg
"format units" as legacy converters, as well as two new features:
"self converters" and the "version" directive.
2013-11-23 14:54:00 -08:00
Nick Coghlan c72e4e6dcc Issue #19619: Blacklist non-text codecs in method API
str.encode, bytes.decode and bytearray.decode now use an
internal API to throw LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then throwing a TypeError for an unexpected output type.

The latter mechanism remains in place for third party non-text
encodings.
2013-11-22 22:39:36 +10:00
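A minimal sketch of the behavior this message describes, using `rot13` as an example of a known non-text codec (str to str rather than str to bytes). The variable names are illustrative.

```python
import codecs

# The method API refuses a codec that is not a text encoding...
try:
    "abc".encode("rot13")
    method_result = "no error"
except LookupError:
    method_result = "LookupError"

# ...while the type-neutral function in the codecs module still applies it.
function_result = codecs.encode("abc", "rot13")

print(method_result)    # LookupError
print(function_result)  # nop
```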
Christian Heimes 985ecdcfc2 Issue #19183: Implement PEP 456 'secure and interchangeable hash algorithm'.
Python now uses SipHash24 on all major platforms.
2013-11-20 11:46:18 +01:00
Victor Stinner 4a58707a34 Add _PyUnicodeWriter_WriteASCIIString() function 2013-11-19 12:54:53 +01:00
Serhiy Storchaka 58cf607d13 Issue #12892: The utf-16* and utf-32* codecs now reject (lone) surrogates.
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.

Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
2013-11-19 11:32:41 +02:00
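A short sketch of the three behaviors listed above, shown with `utf-16-le` so the byte order is fixed and no BOM is emitted. The variable names are made up for this example.

```python
lone = "\ud800"  # a lone high surrogate code point

# Strict encoding now rejects the surrogate.
try:
    lone.encode("utf-16-le")
    strict_result = "encoded"
except UnicodeEncodeError:
    strict_result = "UnicodeEncodeError"

# The surrogatepass error handler lets it through in both directions.
raw = lone.encode("utf-16-le", "surrogatepass")
round_trip = raw.decode("utf-16-le", "surrogatepass")

print(strict_result)       # UnicodeEncodeError
print(raw)                 # b'\x00\xd8'
print(round_trip == lone)  # True
```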
Victor Stinner 6989ba0174 Issue #19581: Change the overallocation factor of _PyUnicodeWriter on Windows
On Windows, a factor of 50% gives best performances.
2013-11-18 21:08:39 +01:00
Larry Hastings ed4a1c5703 Argument Clinic: rename "self" to "module" for module-level functions. 2013-11-18 09:32:13 -08:00
Ezio Melotti 745d54d2fa #17806: Added keyword-argument support for "tabsize" to str/bytes.expandtabs(). 2013-11-16 19:10:57 +02:00
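For illustration, the keyword form this commit enables, assuming an interpreter that includes it. The sample strings are arbitrary.

```python
# tabsize can be passed by keyword for both str and bytes.
text_result = "a\tb".expandtabs(tabsize=4)
bytes_result = b"a\tb".expandtabs(tabsize=2)

print(repr(text_result))   # 'a   b'
print(repr(bytes_result))  # b'a b'
```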
Nick Coghlan 8b097b4ed7 Close #17828: better handling of codec errors
- output type errors now redirect users to the type-neutral
  convenience functions in the codecs module
- stateless errors that occur during encoding and decoding
  will now be automatically wrapped in exceptions that give
  the name of the codec involved
2013-11-13 23:49:21 +10:00
Victor Stinner 66b3270975 _Py_normalize_encoding(): explain how the value 6 was computed 2013-11-07 23:12:23 +01:00
Victor Stinner df23e30bea Fix _Py_normalize_encoding(): ensure that buffer is big enough to store "utf-8"
if the input string is NULL
2013-11-07 13:33:36 +01:00
Victor Stinner ad14ccd047 Issue #19512: add _PyUnicode_CompareWithId() function
_PyUnicode_CompareWithId() is faster than PyUnicode_CompareWithASCIIString()
when both strings are equal and interned.

Add also _PyId_builtins identifier for "builtins" common string.
2013-11-07 00:46:04 +01:00
Victor Stinner 21ea21ef6d Issue #19424: PyUnicode_CompareWithASCIIString() normalizes memcmp() result
to -1, 0, 1
2013-11-04 11:28:26 +01:00
Victor Stinner f0c7b2af05 Issue #16286: remove duplicated identity check from unicode_compare()
Move the test to PyUnicode_Compare()
2013-11-04 11:27:14 +01:00
Victor Stinner fd9e44db37 Issue #16286: optimize PyUnicode_RichCompare() for identical strings (same
pointer) for any operator, not only Py_EQ and Py_NE.

Code of bytes_richcompare() and PyUnicode_RichCompare() is now closer.
2013-11-04 11:23:05 +01:00
Victor Stinner c8bc5377ac Issue #16286: write a new subfunction bytes_compare_eq()
* cleanup bytes_richcompare()
* PyUnicode_RichCompare(): replace a test with a XOR
2013-11-04 11:08:10 +01:00
Victor Stinner e1b1592fd4 Issue #19424: Fix a compiler warning on comparing signed/unsigned size_t
Patch written by Zachary Ware.
2013-11-03 13:53:12 +01:00
Victor Stinner a6b9b071a3 Issue #19424: Fix a compiler warning
memcmp() just takes raw pointers
2013-10-30 18:27:13 +01:00
Victor Stinner 602f7cf0b9 Issue #19424: Optimize PyUnicode_CompareWithASCIIString()
Use fast memcmp() instead of a loop using the slow PyUnicode_READ() macro.
strlen() is still necessary to check Unicode string containing null bytes.
2013-10-29 23:31:50 +01:00
Victor Stinner 68b674c9d4 Issue #19437: Fix _PyUnicode_New() (constructor of legacy string), set all
attributes before checking for error. The destructor expects all attributes to
be set. It is now safe to call Py_DECREF(unicode) in the constructor.
2013-10-29 19:31:43 +01:00
Victor Stinner fa3ba4c3bc Issue #18609: Add a fast-path for "iso8859-1" encoding
On AIX, the locale encoding may be "iso8859-1", which was not a known syntax of
the legacy ISO 8859-1 encoding.

Using a C codec instead of a Python codec is faster but also avoids tricky
issues during Python startup or complex code.
2013-10-29 11:34:05 +01:00
Victor Stinner a5afb58986 Issue #18408: Fix PyUnicode_AsUTF8AndSize(), raise MemoryError exception on
memory allocation failure
2013-10-29 01:28:23 +01:00
Serhiy Storchaka c679227e31 Issue #1772673: The type of `char*` arguments now changed to `const char*`. 2013-10-19 21:03:34 +03:00
Serhiy Storchaka 55e092f545 Issue #19279: UTF-7 decoder no more produces illegal strings. 2013-10-19 20:39:28 +03:00
Serhiy Storchaka 35804e4c63 Issue #19279: UTF-7 decoder no more produces illegal strings. 2013-10-19 20:38:19 +03:00
Larry Hastings 3182680210 Issue #16612: Add "Argument Clinic", a compile-time preprocessor
for C files to generate argument parsing code.  (See PEP 436.)
2013-10-19 00:09:25 -07:00
Ethan Furman fb13721b1b Close #18780: %-formatting now prints value for int subclasses with %d, %i, and %u codes. 2013-08-31 10:18:55 -07:00
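A sketch of the behavior described here, using a hypothetical int subclass `Label` that overrides `__str__`. With this change the numeric codes format the integer value rather than the subclass's string form.

```python
class Label(int):
    """Hypothetical int subclass with a custom string form."""

    def __str__(self):
        return "label"

    __repr__ = __str__


value = Label(7)

numeric = ["%d" % value, "%i" % value, "%u" % value]
as_string = "%s" % value

print(numeric)    # ['7', '7', '7']
print(as_string)  # label
```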
Antoine Pitrou 9ed5f27266 Issue #18722: Remove uses of the "register" keyword in C code. 2013-08-13 20:18:52 +02:00
Raymond Hettinger e56666d17f Silence compiler warning about an uninitialized variable 2013-08-04 11:51:03 -07:00
Raymond Hettinger 5ed1b38a7d merge 2013-08-04 11:51:35 -07:00
Christian Heimes b578735dff Check return value of PyType_Ready(&EncodingMapType)
CID 486654
2013-07-20 14:57:28 +02:00
Christian Heimes 26532f7519 Check return value of PyType_Ready(&EncodingMapType)
CID 486654
2013-07-20 14:57:16 +02:00
Victor Stinner e699e5a218 Issue #18408: Don't check unicode consistency in _PyUnicode_HAS_UTF8_MEMORY()
and _PyUnicode_HAS_WSTR_MEMORY() macros

These macros are called in unicode_dealloc(), whereas the unicode object can be
"inconsistent" if the creation of the object failed.

For example, when unicode_subtype_new() fails on a memory allocation,
_PyUnicode_CheckConsistency() fails with an assertion error because data is
NULL.
2013-07-15 18:22:47 +02:00
Victor Stinner 9e6b4d715c Issue #18408: _PyUnicodeWriter_Finish() now clears its buffer attribute in all
cases, so _PyUnicodeWriter_Dealloc() can be called after finish.
2013-07-09 00:37:24 +02:00
Victor Stinner 15a0bd3965 Issue #18408: Fix _PyUnicodeWriter_Finish(): clear writer->buffer,
so _PyUnicodeWriter_Dealloc() can be called on the writer after finish.
2013-07-08 22:29:55 +02:00
Victor Stinner 6f8eeee7b9 Issue #18203: Fix _Py_DecodeUTF8_surrogateescape(), use PyMem_RawMalloc() as _Py_char2wchar() 2013-07-07 22:57:45 +02:00
Victor Stinner 1a7425f67a Issue #18203: Replace malloc() with PyMem_RawMalloc() at Python initialization
* Replace malloc() with PyMem_RawMalloc()
* Replace PyMem_Malloc() with PyMem_RawMalloc() where the GIL is not held.
* _Py_char2wchar() now returns a buffer allocated by PyMem_RawMalloc(), instead
  of PyMem_Malloc()
2013-07-07 16:25:15 +02:00
Christian Heimes d47802eef7 Fix ref leak in error case of unicode find, count, formatlong
CID 983315: Resource leak (RESOURCE_LEAK)
CID 983316: Resource leak (RESOURCE_LEAK)
CID 983317: Resource leak (RESOURCE_LEAK)
2013-06-29 21:33:36 +02:00
Christian Heimes d47a0456b1 Fix ref leak in error case of unicode index
CID 983319 (#1 of 2): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:21:37 +02:00
Christian Heimes ea71a525c3 Fix ref leak in error case of unicode rindex and rfind
CID 983320: Resource leak (RESOURCE_LEAK)
CID 983321: Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 21:17:34 +02:00
Christian Heimes 305e49e17e Fix memory leak in endswith
CID 1040368 (#1 of 1): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable substring going out of scope leaks the storage it points to.
2013-06-29 20:41:06 +02:00
Serhiy Storchaka c89533f72f Issue #18184: PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
OverflowError when an argument of %c format is out of range.
2013-06-23 20:21:16 +03:00
Serhiy Storchaka 8eeae2126c Issue #18184: PyUnicode_FromFormat() and PyUnicode_FromFormatV() now raise
OverflowError when an argument of %c format is out of range.
2013-06-23 20:12:14 +03:00
Benjamin Peterson 3164f5d565 merge 3.3 (#18183) 2013-06-10 09:24:01 -07:00
Benjamin Peterson 7e30373126 remove MAX_MAXCHAR because it's unsafe for computing maximum codepoint value (see #18183) 2013-06-10 09:19:46 -07:00
Victor Stinner 9f067f490f Issue #9566: Fix compiler warning on Windows 64-bit 2013-06-05 00:21:31 +02:00
Antoine Pitrou 7ce35a1816 Issue #17237: Fix crash in the ASCII decoder on m68k. 2013-05-11 15:59:37 +02:00