Commit Graph

10 Commits

Author SHA1 Message Date
Victor Stinner 73363e817a Issue #6213: Implement getstate() and setstate() methods of utf-8-sig and
utf-16 incremental encoders.
2010-07-28 01:39:45 +00:00
Victor Stinner 54b40ee929 Fix my last commit (r81471) about codecs
Rememder: don't touch the code just before a commit
2010-05-22 13:44:25 +00:00
Victor Stinner 7df55dad3b Issue #6268: More bugfixes about BOM, UTF-16 and UTF-32
* Fix seek() method of codecs.open(), don't write the BOM twice after seek(0)
 * Fix reset() method of codecs, UTF-16, UTF-32 and StreamWriter classes
 * test_codecs: use "w+" mode instead of "wt+". "t" mode is not supported by
   Solaris or Windows, but does it really exist? I found it the in the issue.
2010-05-22 13:37:56 +00:00
Walter Dörwald abb02e5994 Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass
of tuple) that provides incremental decoders and encoders (a way to use
stateful codecs without the stream API). Functions
codecs.getincrementaldecoder() and codecs.getincrementalencoder() have
been added.
2006-03-15 11:35:15 +00:00
Walter Dörwald 729c31f5c3 Reset internal buffers when seek() is called. This fixes SF bug #1156259. 2005-03-14 19:06:30 +00:00
Walter Dörwald 69652035bc SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now support
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
2004-09-07 20:24:22 +00:00
Tim Peters 469cdad822 Whitespace normalization. 2002-08-08 20:19:19 +00:00
Marc-André Lemburg 3ccb09cba3 Fix for bug #222395: UTF-16 et al. don't handle .readline().
They now raise an NotImplementedError to hint to the truth ;-)
2002-04-05 12:12:00 +00:00
Marc-André Lemburg 92b550cdd8 This patch by Martin v. Loewis changes the UTF-16 codec to only
write a BOM at the start of the stream and also to only read it as
BOM at the start of a stream.

Subsequent reading/writing of BOMs will read/write the BOM as ZWNBSP
character. This is in sync with the Unicode specifications.

Note that UTF-16 files will now *have* to start with a BOM mark
in order to be readable by the codec.
2001-06-19 20:07:51 +00:00
Guido van Rossum 0229bf6001 Marc-Andre Lemburg: Unicode encodings. 2000-03-10 23:17:24 +00:00