Issue #25401: Optimize bytes.fromhex() and bytearray.fromhex(): they are now
between 2x and 3.5x faster. Changes:
* Use a fast-path working on a char* string for ASCII string
* Use a slow-path for non-ASCII string
* Replace slow hex_digit_to_int() function with a O(1) lookup in
_PyLong_DigitValue precomputed table
* Use _PyBytesWriter API to handle the buffer
* Add unit tests to check the error position in error messages
Issue #25399: Don't create temporary bytes objects: modify _PyBytes_Format() to
create work directly on bytearray objects.
* Rename _PyBytes_Format() to _PyBytes_FormatEx() just in case if something
outside CPython uses it
* _PyBytes_FormatEx() now uses (char*, Py_ssize_t) for the input string, so
bytearray_format() doesn't need tot create a temporary input bytes object
* Add use_bytearray parameter to _PyBytes_FormatEx() which is passed to
_PyBytesWriter, to create a bytearray buffer instead of a bytes buffer
Most formatting operations are now between 2.5 and 5 times faster.
* Add much more unit tests on PyBytes_FromFormatV()
* Remove the first loop to compute the length of the output string
* Use _PyBytesWriter to handle the bytes buffer, use overallocation
* Cleanup the code to make simpler and easier to review
Don't require _PyBytesWriter pointer to be a "char *". Same change for
_PyBytesWriter_WriteBytes() parameter.
For example, binascii uses "unsigned char*".
Optimize bytes.__mod__(args) for integere formats: %d (%i, %u), %o, %x and %X.
_PyBytesWriter is now used to format directly the integer into the writer
buffer, instead of using a temporary bytes object.
Formatting is between 30% and 50% faster on a microbenchmark.
* Thanks to the _PyBytesWriter API, output smaller than 512 bytes are allocated
on the stack and so avoid calling _PyBytes_Resize(). Because of that, change
the default buffer size to fmtcnt instead of fmtcnt+100.
* Rely on _PyBytesWriter algorithm to overallocate the buffer instead of using
a custom code. For example, _PyBytesWriter uses a different overallocation
factor (25% or 50%) depending on the platform to get best performances.
* Disable overallocation for the last write.
* Replace C loops to fill characters with memset()
* Add also many comments to _PyBytes_Format()
* Remove unused FORMATBUFLEN constant
* Avoid the creation of a temporary bytes object when formatting a floating
point number (when no custom formatting option is used)
* Fix also reference leaks on error handling
* Use Py_MEMCPY() to copy bytes between two formatters (%)
Some time ago we changed the docs to consistently use the term 'bytes-like
object' in all the contexts where bytes, bytearray, memoryview, etc are used.
This patch (by Ezio Melotti) completes that work by changing the error
messages that previously reported that certain types did "not support the
buffer interface" to instead say that a bytes-like object is required. (The
glossary entry for bytes-like object references the discussion of the buffer
protocol in the docs.)
PyObject_Calloc(), _PyObject_GC_Calloc(). bytes(int) and bytearray(int) are now
using ``calloc()`` instead of ``malloc()`` for large objects which is faster
and use less memory (until the bytearray buffer is filled with data).
computation as the overflow behavior of signed integers is undefined.
In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.
Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).
Cleanup only - no functionality or hash values change.
The NULL encoding check in bytes_decode() was unnecessary because this case
is already taken care of by the call to _Py_normalize_encoding() inside
PyUnicode_Decode().
* Remove _PyBytes_FormatLong(): inline it into formatlong()
* the input type is always a long, so remove the code for bool
* don't duplicate the string if the length does not change
* Use PyUnicode_DATA() instead of _PyUnicode_AsString()
* In debug mode, fill the string data with invalid characters
* Simplify also reference counting in PyCodec_BackslashReplaceErrors()
and PyCodec_XMLCharRefReplaceError()
in order to make algorithmic complexity attacks on (e.g.) web apps much more complicated.
The environment variable PYTHONHASHSEED and the new command line flag -R control this
behavior.
in order to make algorithmic complexity attacks on (e.g.) web apps much more complicated.
The environment variable PYTHONHASHSEED and the new command line flag -R control this
behavior.
to a single #define instead of having several copies in several files.
This excludes the Modules/ tree (datetime and expat both have a copy
for their own purposes with no need for it to be the same).
* Specify the default encoding: write 'utf-8' instead of
sys.getdefaultencoding(), because the default encoding is now constant
* Specify the default errors value
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r84629 | senthil.kumaran | 2010-09-08 18:20:29 +0530 (Wed, 08 Sep 2010) | 3 lines
Revert the doc change done in r83880. str.replace with negative count value is not a feature.
........
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r84070 | antoine.pitrou | 2010-08-15 19:12:55 +0200 (dim., 15 août 2010) | 5 lines
Fix some compilation warnings under 64-bit Windows (issue #9566).
Some of these are genuine bugs with objects bigger than 2GB, but
my system doesn't allow me to write tests for it.
........
r84074 | antoine.pitrou | 2010-08-15 19:41:31 +0200 (dim., 15 août 2010) | 3 lines
Fix (harmless) warning with MSVC.
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r77461 | antoine.pitrou | 2010-01-13 08:55:48 +0100 (mer., 13 janv. 2010) | 5 lines
Issue #7622: Improve the split(), rsplit(), splitlines() and replace()
methods of bytes, bytearray and unicode objects by using a common
implementation based on stringlib's fast search. Patch by Florent Xicluna.
........
- Add special-cases for list and tuple objects.
- Use _PyObject_LengthHint() instead of an arbitrary value for the
size of the initial buffer of the returned object.
Addresses the float -> string conversion, using David Gay's code which
was added in Mark Dickinson's checkin r71663.
Also addresses these, which are intertwined with the short repr
changes:
- Issue #5772: format(1e100, '<') produces '1e+100', not '1.0e+100'
- Issue #5515: 'n' formatting with commas no longer works poorly
with leading zeros.
- PEP 378 Format Specifier for Thousands Separator: implemented
for floats.
This is incomplete, but I want to get some version into the next alpha. I am still working on:
Documentation.
More tests.
Implement for floats.
In addition, there's an existing bug with 'n' formatting that carries forward to thousands grouping (issue 5515).
svn+ssh://pythondev@svn.python.org/python/trunk
........
r66748 | christian.heimes | 2008-10-02 21:47:50 +0200 (Thu, 02 Oct 2008) | 1 line
Fixed a couple more C99 comments and one occurence of inline.
........
+ another // comment in bytesobject
svn+ssh://pythondev@svn.python.org/python/trunk
........
r65654 | martin.v.loewis | 2008-08-12 16:49:50 +0200 (Tue, 12 Aug 2008) | 6 lines
Issue #3139: Make buffer-interface thread-safe wrt. PyArg_ParseTuple,
by denying s# to parse objects that have a releasebuffer procedure,
and introducing s*.
More module might need to get converted to use s*.
........
PyUnicode_AsStringAndSize -> _PyUnicode_AsStringAndSize to mark
them for interpreter internal use only.
We'll have to rework these APIs or create new ones for the
purpose of accessing the UTF-8 representation of Unicode objects
for 3.1.
Use faster PyUnicode_FromEncodedObject() for bytes/bytearray.decode().
Add new PyCodec_KnownEncoding() API.
Add new PyUnicode_AsDecodedUnicode() and PyUnicode_AsEncodedUnicode() APIs.
Add missing PyUnicode_AsDecodedObject() to unicodeobject.h
Fix punicode codec to also work on memoryviews.
Before this change, the following example would output:
>>> b = bytearray(b"hello")
>>> b += "world"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to bytearray
svn+ssh://pythondev@svn.python.org/python/trunk
........
r62260 | gregory.p.smith | 2008-04-10 01:11:56 +0200 (Thu, 10 Apr 2008) | 2 lines
better diagnostics
........
r62261 | gregory.p.smith | 2008-04-10 01:16:37 +0200 (Thu, 10 Apr 2008) | 3 lines
Raise SystemError when size < 0 is passed into PyString_FromStringAndSize,
PyBytes_FromStringAndSize or PyUnicode_FromStringAndSize. [issue2587]
........
r62266 | neal.norwitz | 2008-04-10 07:46:39 +0200 (Thu, 10 Apr 2008) | 5 lines
Remove the test file before writing it in case there is no write permission.
This might help fix some of the failures on Windows box(es). It doesn't hurt
either way and ensure the tests are a little more self contained (ie have
less assumptions).
........
r62271 | gregory.p.smith | 2008-04-10 21:50:36 +0200 (Thu, 10 Apr 2008) | 2 lines
get rid of assert (size >= 0) now that an explicit if (size < 0) is in the code.
........
r62277 | andrew.kuchling | 2008-04-10 23:27:10 +0200 (Thu, 10 Apr 2008) | 1 line
Remove forward-looking statement
........
r62278 | andrew.kuchling | 2008-04-10 23:28:51 +0200 (Thu, 10 Apr 2008) | 1 line
Add punctuation
........
r62279 | andrew.kuchling | 2008-04-10 23:29:01 +0200 (Thu, 10 Apr 2008) | 1 line
Use issue directive
........
r62289 | thomas.heller | 2008-04-11 15:05:38 +0200 (Fri, 11 Apr 2008) | 3 lines
Move backwards compatibility macro to the correct place;
PyIndex_Check() was introduced in Python 2.5.
........
r62290 | thomas.heller | 2008-04-11 16:20:26 +0200 (Fri, 11 Apr 2008) | 2 lines
Performance improvements.
........
r62293 | christian.heimes | 2008-04-12 15:03:03 +0200 (Sat, 12 Apr 2008) | 2 lines
Applied patch #2617 from Frank Wierzbicki wit some extras from me
-J and -X are now reserved for Jython and non-standard arguments (e.g. IronPython). I've added some extra comments to make sure the reservation don't get missed in the future.
........
r62294 | georg.brandl | 2008-04-12 20:11:18 +0200 (Sat, 12 Apr 2008) | 2 lines
Use absolute path in sys.path.
........
r62295 | georg.brandl | 2008-04-12 20:36:09 +0200 (Sat, 12 Apr 2008) | 2 lines
#2615: small consistency update by Jeroen Ruigrok van der Werven.
........
r62296 | georg.brandl | 2008-04-12 21:00:20 +0200 (Sat, 12 Apr 2008) | 2 lines
Add Jeroen.
........
r62297 | georg.brandl | 2008-04-12 21:05:37 +0200 (Sat, 12 Apr 2008) | 2 lines
Don't offend snake lovers.
........
r62298 | gregory.p.smith | 2008-04-12 22:37:48 +0200 (Sat, 12 Apr 2008) | 2 lines
fix compiler warnings
........
r62302 | gregory.p.smith | 2008-04-13 00:24:04 +0200 (Sun, 13 Apr 2008) | 3 lines
socket.error inherits from IOError, it no longer needs listing in
the all_errors tuple.
........
r62303 | brett.cannon | 2008-04-13 01:44:07 +0200 (Sun, 13 Apr 2008) | 8 lines
Re-implement the 'warnings' module in C. This allows for usage of the
'warnings' code in places where it was previously not possible (e.g., the
parser). It could also potentially lead to a speed-up in interpreter start-up
if the C version of the code (_warnings) is imported over the use of the
Python version in key places.
Closes issue #1631171.
........
r62304 | gregory.p.smith | 2008-04-13 02:03:25 +0200 (Sun, 13 Apr 2008) | 3 lines
Adds a profile-opt target for easy compilation of a python binary using
gcc's profile guided optimization.
........
r62305 | brett.cannon | 2008-04-13 02:18:44 +0200 (Sun, 13 Apr 2008) | 3 lines
Fix a bug in PySys_HasWarnOption() where it was not properly checking the
length of the list storing the warning options.
........
r62306 | brett.cannon | 2008-04-13 02:25:15 +0200 (Sun, 13 Apr 2008) | 2 lines
Fix an accidental bug of an non-existent init function.
........
r62308 | andrew.kuchling | 2008-04-13 03:05:59 +0200 (Sun, 13 Apr 2008) | 1 line
Mention -J, -X
........
r62311 | benjamin.peterson | 2008-04-13 04:20:05 +0200 (Sun, 13 Apr 2008) | 2 lines
Give the "Interactive Interpreter Changes" section in 2.6 whatsnew a unique link name
........
r62313 | brett.cannon | 2008-04-13 04:42:36 +0200 (Sun, 13 Apr 2008) | 3 lines
Fix test_warnings by making the state of things more consistent for each test
when it is run.
........
r62314 | skip.montanaro | 2008-04-13 05:17:30 +0200 (Sun, 13 Apr 2008) | 2 lines
spelling
........
r62315 | georg.brandl | 2008-04-13 09:07:44 +0200 (Sun, 13 Apr 2008) | 2 lines
Fix markup.
........
r62319 | christian.heimes | 2008-04-13 11:30:17 +0200 (Sun, 13 Apr 2008) | 1 line
Fix compiler warning Include/warnings.h:19:28: warning: no newline at end of file
........
r62320 | christian.heimes | 2008-04-13 11:33:24 +0200 (Sun, 13 Apr 2008) | 1 line
Use PyString_InternFromString instead of PyString_FromString for static vars
........
r62321 | christian.heimes | 2008-04-13 11:37:05 +0200 (Sun, 13 Apr 2008) | 1 line
Added new files to the pcbuild files
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r61964 | benjamin.peterson | 2008-03-27 01:25:33 +0100 (Thu, 27 Mar 2008) | 2 lines
add commas for introductory clauses
........
r61965 | christian.heimes | 2008-03-27 02:36:21 +0100 (Thu, 27 Mar 2008) | 1 line
Hopefully added _fileio module to the Windows build system
........
r61966 | christian.heimes | 2008-03-27 02:38:47 +0100 (Thu, 27 Mar 2008) | 1 line
Revert commit accident
........
r61967 | neal.norwitz | 2008-03-27 04:49:54 +0100 (Thu, 27 Mar 2008) | 3 lines
Fix bytes so it works on 64-bit platforms.
(Also remove some #if 0 code that is already handled in _getbytevalue.)
........
r61968 | neal.norwitz | 2008-03-27 05:40:07 +0100 (Thu, 27 Mar 2008) | 1 line
Fix memory leaks
........
r61969 | neal.norwitz | 2008-03-27 05:40:50 +0100 (Thu, 27 Mar 2008) | 3 lines
Fix warnings about using char as an array subscript. This is not portable
since char is signed on some platforms and unsigned on others.
........
r61970 | neal.norwitz | 2008-03-27 06:02:57 +0100 (Thu, 27 Mar 2008) | 1 line
Fix test_compiler after adding unicode_literals
........
r61971 | neal.norwitz | 2008-03-27 06:03:11 +0100 (Thu, 27 Mar 2008) | 1 line
Fix compiler warnings
........
r61972 | neal.norwitz | 2008-03-27 07:52:01 +0100 (Thu, 27 Mar 2008) | 1 line
Pluralss only need one s, not 2 (intss -> ints)
........
r61973 | christian.heimes | 2008-03-27 10:02:33 +0100 (Thu, 27 Mar 2008) | 1 line
Quick 'n dirty hack: Increase the magic by 2 to force a rebuild of pyc/pyo files on the build bots
........
r61974 | eric.smith | 2008-03-27 10:42:35 +0100 (Thu, 27 Mar 2008) | 3 lines
Added test cases for single quoted strings, both forms of triple quotes,
and some string concatenations.
Removed unneeded __future__ print_function import.
........
r61975 | christian.heimes | 2008-03-27 11:35:52 +0100 (Thu, 27 Mar 2008) | 1 line
Build bots are working again - removing the hack
........
r61976 | christian.heimes | 2008-03-27 12:46:37 +0100 (Thu, 27 Mar 2008) | 2 lines
Fixed tokenize tests
The tokenize module doesn't understand __future__.unicode_literals yet
........
r61977 | georg.brandl | 2008-03-27 14:27:31 +0100 (Thu, 27 Mar 2008) | 2 lines
#2248: return result of QUIT from quit().
........
r61978 | georg.brandl | 2008-03-27 14:34:59 +0100 (Thu, 27 Mar 2008) | 2 lines
The bug for which there was a test in outstanding_bugs.py was agreed not to be a bug.
........
r61979 | amaury.forgeotdarc | 2008-03-28 00:23:54 +0100 (Fri, 28 Mar 2008) | 5 lines
Issue2495: tokenize.untokenize did not insert space between two consecutive string literals:
"" "" => """", which is invalid code.
Will backport
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r59377 | georg.brandl | 2007-12-06 01:24:23 +0100 (Thu, 06 Dec 2007) | 2 lines
Add another GHOP student to ACKS.
........
r59378 | raymond.hettinger | 2007-12-06 01:56:53 +0100 (Thu, 06 Dec 2007) | 5 lines
Fix Issue 1045.
Factor-out common calling code by simplifying the length_hint API.
Speed-up the function by caching the PyObject_String for the attribute lookup.
........
r59380 | georg.brandl | 2007-12-06 02:52:24 +0100 (Thu, 06 Dec 2007) | 2 lines
Diverse markup fixes.
........
r59383 | georg.brandl | 2007-12-06 10:45:39 +0100 (Thu, 06 Dec 2007) | 2 lines
Better re.split examples.
........
r59386 | christian.heimes | 2007-12-06 14:15:13 +0100 (Thu, 06 Dec 2007) | 2 lines
Fixed get_config_h_filename for Windows. Without the patch it can't find the pyconfig.h file inside a build tree.
Added several small unit tests for sysconfig.
........
r59387 | christian.heimes | 2007-12-06 14:30:11 +0100 (Thu, 06 Dec 2007) | 1 line
Silence more warnings, _CRT_NONSTDC_NO_DEPRECATE is already defined in pyconfig.h but several projects don't include it.
........
r59389 | christian.heimes | 2007-12-06 14:55:01 +0100 (Thu, 06 Dec 2007) | 1 line
Disabled one test that is failing on Unix
........
r59399 | christian.heimes | 2007-12-06 22:13:06 +0100 (Thu, 06 Dec 2007) | 8 lines
Several Windows related cleanups:
* Removed a #define from pyconfig.h. The macro was already defined a few lines higher.
* Fixed path to tix in the build_tkinter.py script
* Changed make_buildinfo.c to use versions of unlink and strcat which are considered safe by Windows (as suggested by MvL).
* Removed two defines from pyproject.vsprops that are no longer required. Both are defined in pyconfig.h and make_buildinfo.c doesn't use the unsafe versions any more (as suggested by MvL).
* Added some more information about PGO and the property files to PCbuild9/readme.txt.
Are you fine with the changes, Martin?
........
r59400 | raymond.hettinger | 2007-12-07 02:53:01 +0100 (Fri, 07 Dec 2007) | 4 lines
Don't have the docs berate themselves. Keep a professional tone.
If a todo is needed, put it in the tracker.
........
r59402 | georg.brandl | 2007-12-07 10:07:10 +0100 (Fri, 07 Dec 2007) | 3 lines
Increase unit test coverage of SimpleXMLRPCServer.
Written for GHOP by Turkay Eren.
........
r59406 | georg.brandl | 2007-12-07 16:16:57 +0100 (Fri, 07 Dec 2007) | 2 lines
Update to windows doc from Robert.
........
No detailed change log; just check out the change log for the py3k-pep3137
branch. The most obvious changes:
- str8 renamed to bytes (PyString at the C level);
- bytes renamed to buffer (PyBytes at the C level);
- PyString and PyUnicode are no longer compatible.
I.e. we now have an immutable bytes type and a mutable bytes type.
The behavior of PyString was modified quite a bit, to make it more
bytes-like. Some changes are still on the to-do list.
Add a bytes iterator (copied from stringobject.c and reindented :-).
I (Guido) added a small change to _abcoll.py to remove the registration
of bytes as a virtual subtype of Iterator -- the presence of __iter__
will handle that now.