renamed Include/bytesobject.h to Include/bytearrayobject.h
renamed Include/stringobject.h to Include/bytesobject.h
added Include/stringobject.h with aliases
Rather than sprinkle casts throughout the code, change Py_CHARMASK to
always cast it's result to an unsigned char. This should ensure we
do the right thing when accessing an array with the result.
svn+ssh://pythondev@svn.python.org/python/branches/trunk-bytearray
........
r61750 | christian.heimes | 2008-03-22 20:47:44 +0100 (Sat, 22 Mar 2008) | 1 line
Copied files from py3k w/o modifications
........
r61752 | christian.heimes | 2008-03-22 20:53:20 +0100 (Sat, 22 Mar 2008) | 7 lines
Take One
* Added initialization code, warnings, flags etc. to the appropriate places
* Added new buffer interface to string type
* Modified tests
* Modified Makefile.pre.in to compile the new files
* Added bytesobject.c to Python.h
........
r61754 | christian.heimes | 2008-03-22 21:22:19 +0100 (Sat, 22 Mar 2008) | 2 lines
Disabled bytearray.extend for now since it causes an infinite recursion
Fixed serveral unit tests
........
r61756 | christian.heimes | 2008-03-22 21:43:38 +0100 (Sat, 22 Mar 2008) | 5 lines
Added PyBytes support to several places:
str + bytearray
ord(bytearray)
bytearray(str, encoding)
........
r61760 | christian.heimes | 2008-03-22 21:56:32 +0100 (Sat, 22 Mar 2008) | 1 line
Fixed more unit tests related to type('') is not unicode
........
r61763 | christian.heimes | 2008-03-22 22:20:28 +0100 (Sat, 22 Mar 2008) | 2 lines
Fixed more unit tests
Fixed bytearray.extend
........
r61768 | christian.heimes | 2008-03-22 22:40:50 +0100 (Sat, 22 Mar 2008) | 1 line
Implemented old buffer interface for bytearray
........
r61772 | christian.heimes | 2008-03-22 23:24:52 +0100 (Sat, 22 Mar 2008) | 1 line
Added backport of the io module
........
r61775 | christian.heimes | 2008-03-23 03:50:49 +0100 (Sun, 23 Mar 2008) | 1 line
Fix str assignement to bytearray. Assignment of a str of size 1 is interpreted as a single byte
........
r61805 | christian.heimes | 2008-03-23 19:33:48 +0100 (Sun, 23 Mar 2008) | 3 lines
Fixed more tests
Fixed bytearray() comparsion with unicode()
Fixed iterator assignment of bytearray
........
r61809 | christian.heimes | 2008-03-23 21:02:21 +0100 (Sun, 23 Mar 2008) | 2 lines
str(bytesarray()) now returns the bytes and not the representation of the bytearray object
Enabled and fixed more unit tests
........
r61812 | christian.heimes | 2008-03-23 21:53:08 +0100 (Sun, 23 Mar 2008) | 3 lines
Clear error PyNumber_AsSsize_t() fails
Use CHARMASK for ob_svall access
disabled a test with memoryview again
........
r61819 | christian.heimes | 2008-03-23 23:05:57 +0100 (Sun, 23 Mar 2008) | 1 line
Untested updates to the PCBuild directory
........
r61917 | christian.heimes | 2008-03-26 00:57:06 +0100 (Wed, 26 Mar 2008) | 1 line
The type system of Python 2.6 has subtle differences to 3.0's. I've removed the Py_TPFLAGS_BASETYPE flags from bytearray for now. bytearray can't be subclasses until the issues with bytearray subclasses are fixed.
........
r61920 | christian.heimes | 2008-03-26 01:44:08 +0100 (Wed, 26 Mar 2008) | 2 lines
Disabled last failing test
I don't understand what the test is testing and how it suppose to work. Ka-Ping, please check it out.
........
r61930 | christian.heimes | 2008-03-26 12:46:18 +0100 (Wed, 26 Mar 2008) | 1 line
Re-enabled bytes warning code
........
r61933 | christian.heimes | 2008-03-26 13:20:46 +0100 (Wed, 26 Mar 2008) | 1 line
Fixed a bug in the new buffer protocol. The buffer slots weren't copied into a subclass.
........
r61934 | christian.heimes | 2008-03-26 13:25:09 +0100 (Wed, 26 Mar 2008) | 1 line
Re-enabled bytearray subclassing - all tests are passing.
........
Highlights:
- Adding PyObject_Format.
- Adding string.Format class.
- Adding __format__ for str, unicode, int, long, float, datetime.
- Adding builtin format.
- Adding ''.format and u''.format.
- str/unicode fixups for formatters.
The files in Objects/stringlib that implement PEP 3101 (stringdefs.h,
unicodedefs.h, formatter.h, string_format.h) are identical in trunk
and py3k. Any changes from here on should be made to trunk, and
changes will propogate to py3k).
in Object/ are named ``free_list``, the counter ``numfree`` and the upper
limit is a macro ``PyName_MAXFREELIST`` inside an #ifndef block.
The chances should make it easier to adjust Python for platforms with
less memory, e.g. mobile phones.
First chapter of the Python 3.0 io framework back port: _fileio
The next step depends on a working bytearray type which itself depends on a backport of the nwe buffer API.
as usual with slicing (both with str and unicode strings). This
fixes issue 1259.
For str only the stringobject.c file was modified. But for unicode,
I needed to repeat in the four functions a lot of code, so created
a new function that does part of the job for them (and placed it in
find.h, following a suggestion of Barry).
Also added tests for this behaviour.
also hex escapes) -- this was reaching beyond the end of the input string
buffer, even though it is not supposed to be \0-terminated.
This has no visible effect but is clearly the correct thing to do.
(In 3.0 it had a visible effect after removing ob_sstate from PyString.)
- Specialcase extended slices that amount to a shallow copy the same way as
is done for simple slices, in the tuple, string and unicode case.
- Specialcase step-1 extended slices to optimize the common case for all
involved types.
- For lists, allow extended slice assignment of differing lengths as long
as the step is 1. (Previously, 'l[:2:1] = []' failed even though
'l[:2] = []' and 'l[:2:None] = []' do not.)
- Implement extended slicing for buffer, array, structseq, mmap and
UserString.UserString.
- Implement slice-object support (but not non-step-1 slice assignment) for
UserString.MutableString.
- Add tests for all new functionality.
a large width is passed on 32-bit platforms. Found by Google.
It would be good for people to review this especially carefully and verify
I don't have an off by one error and there is no other way to cause overflow.
of some of the common builtin types.
Use a bit in tp_flags for each common builtin type. Check the bit
to determine if any instance is a subclass of these common types.
The check avoids a function call and O(n) search of the base classes.
The check is done in the various Py*_Check macros rather than calling
PyType_IsSubtype().
All the bits are set in tp_flags when the type is declared
in the Objects/*object.c files because PyType_Ready() is not called
for all the types. Should PyType_Ready() be called for all types?
If so and the change is made, the changes to the Objects/*object.c files
can be reverted (remove setting the tp_flags). Objects/typeobject.c
would also have to be modified to add conditions
for Py*_CheckExact() in addition to each the PyType_IsSubtype check.
* unified the way intobject, longobject and mystrtoul handle
values around -sys.maxint-1.
* in general, trying to entierely avoid overflows in any computation
involving signed ints or longs is extremely involved. Fixed a few
simple cases where a compiler might be too clever (but that's all
guesswork).
* more overflow checks against bad data in marshal.c.
* 2.5 specific: fixed a number of places that were still confusing int
and Py_ssize_t. Some of them could potentially have caused
"real-world" breakage.
* list.pop(x): fixing overflow issues on x was messy. I just reverted
to PyArg_ParseTuple("n"), which does the right thing. (An obscure
test was trying to give a Decimal to list.pop()... doesn't make
sense any more IMHO)
* trying to write a few tests...
Replace UnicodeDecodeErrors raised during == and !=
compares of Unicode and other objects with a new
UnicodeWarning.
All other comparisons continue to raise exceptions.
Exceptions other than UnicodeDecodeErrors are also left
untouched.
I modified this patch some by fixing style, some error checking, and adding
XXX comments. This patch requires review and some changes are to be expected.
I'm checking in now to get the greatest possible review and establish a
baseline for moving forward. I don't want this to hold up release if possible.
results in a 2.5x speedup on the stringbench count tests, and a 20x (!)
speedup on the stringbench search/find/contains test, compared to 2.5a2.
for more on the algorithm, see:
http://effbot.org/zone/stringlib.htm
if you get weird results, you can disable the new algoritm by undefining
USE_FAST in Objects/unicodeobject.c.
enjoy /F
speed up splitlines and strip with charsets; etc. rsplit is now as
fast as split in all our tests (reverse takes no time at all), and
splitlines() is nearly as fast as a plain split("\n") in our tests.
and we're not done yet... ;-)
compiler warnings on Windows (signed vs unsigned mismatch
in comparisons). Cleaned that up by switching more locals
to Py_ssize_t. Simplified overflow checking (it can _be_
simpler because while these things are declared as
Py_ssize_t, then should in fact never be negative).
there)
- Add missing DECREFs of inner-scope 'temp' variable
- Add various missing DECREFs by changing 'return NULL' into 'goto onError'
- Avoid double DECREF when last _PyUnicode_Resize() fails
Coverity found one of the missing DECREFs, but oddly enough not the others.
Py_SAFE_DOWNCAST can evaluate its first argument multiple
times in a debug build. This caused two distinct assert-
failures in test_unicode run under a debug build. Rewrote
the code in trivial ways so that multiple evaluation of the
first argument doesn't hurt.
This is how string objects work. u'%f' could use , instead of .
for the decimal point. Now both strings and unicode always use periods.
This is the code that would break:
import locale
locale.setlocale(locale.LC_NUMERIC, 'de_DE')
u'%.1f' % 1.0
assert '1.0' == u'%.1f' % 1.0
I couldn't create a test case which fails, but this fixes the problem.
Will backport.
In C++, it's an error to pass a string literal to a char* function
without a const_cast(). Rather than require every C++ extension
module to put a cast around string literals, fix the API to state the
const-ness.
I focused on parts of the API where people usually pass literals:
PyArg_ParseTuple() and friends, Py_BuildValue(), PyMethodDef, the type
slots, etc. Predictably, there were a large set of functions that
needed to be fixed as a result of these changes. The most pervasive
change was to make the keyword args list passed to
PyArg_ParseTupleAndKewords() to be a const char *kwlist[].
One cast was required as a result of the changes: A type object
mallocs the memory for its tp_doc slot and later frees it.
PyTypeObject says that tp_doc is const char *; but if the type was
created by type_new(), we know it is safe to cast to char *.
[ 1327110 ] wrong TypeError traceback in generator expressions
by removing the code that can stomp on the users' TypeError raised by the
iterable argument to ''.join() -- PySequence_Fast (now?) gives a perfectly
reasonable message itself. Also, a couple of tests.
PyUnicode_DecodeCharmap() the accept a unicode string as the mapping
argument which is used as a mapping table.
This code isn't used by any of the codecs yet.
about illegal code points. The codec now supports PEP 293 style error handlers.
(This is a variant of the Nik Haldimann's patch that detects truncated data)
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
need to convert str objects from the iterable to unicode. So, if
someone set the system default encoding to something nasty enough,
the conversion process could mutate the input iterable as a side
effect, and PySequence_Fast doesn't hide that from us if the input was
a list. IOW, can't assume the size of PySequence_Fast's result is
invariant across PyUnicode_FromObject() calls.
much to reduce the size of the code, but greatly improves its clarity.
It's also quicker in what's probably the most common case (the argument
iterable is a list). Against it, if the iterable isn't a list or a tuple,
a temp tuple is materialized containing the entire input sequence, and
that's a bigger temp memory burden. Yawn.
1. u1.join([u2]) is u2
2. Be more careful about C-level int overflow.
Since PySequence_Fast() isn't needed to achieve #1, it's not used -- but
the code could sure be simpler if it were.
unicodedata.east_asian_width(). You can still implement your own
simple width() function using it like this:
def width(u):
w = 0
for c in unicodedata.normalize('NFC', u):
cwidth = unicodedata.east_asian_width(c)
if cwidth in ('W', 'F'): w += 2
else: w += 1
return w
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)