Commit Graph

99 Commits

Author SHA1 Message Date
Martin v. Löwis 4ed673877d Patch #1268314: Cache lines in StreamReader.readlines for performance.
Will backport to Python 2.4.
2005-09-18 08:34:39 +00:00
Walter Dörwald c5238b8288 SF bug #1235646: codecs.StreamRecoder.next() now reencodes the data it reads
from the input stream, so that the output is a byte string in the correct
encoding instead of a unicode string.
2005-09-01 11:56:53 +00:00
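The effect of this fix can be observed on a modern Python 3 interpreter, where "byte string" and "unicode string" became bytes and str. The sketch below assumes Python 3 and uses codecs.EncodedFile, which wraps a stream in a StreamRecoder. Iterating the recoder yields lines that are decoded with the file encoding and re-encoded with the data encoding, so they come back as bytes. The sample text is arbitrary.

```python
import codecs
import io

# Backend: a byte stream holding Latin-1 encoded text (sample data only).
raw = io.BytesIO("gr\u00fc\u00dfe\nzwei\n".encode("latin-1"))

# EncodedFile(file, data_encoding, file_encoding) builds a StreamRecoder:
# lines are decoded from Latin-1 (file side) and re-encoded as UTF-8
# (data side) before being returned.
recoder = codecs.EncodedFile(raw, "utf-8", "latin-1")

first = next(recoder)  # goes through StreamRecoder.__next__
print(type(first), first)
```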
Martin v. Löwis 56066d2e55 Return complete lines from codec stream readers
even if there is an exception in later lines, resulting in
correct line numbers for decoding errors in source code. Fixes #1178484.
Will backport to 2.4.
2005-08-24 07:38:12 +00:00
Walter Dörwald c9878e1b22 Make attributes and local variables in the StreamReader str objects instead
of unicode objects, so that codecs that do a str->str decoding won't promote
the result to unicode. This fixes SF bug #1241507.
2005-07-20 22:15:39 +00:00
Walter Dörwald a4eb2d56a4 Fix comment. 2005-04-21 21:42:35 +00:00
Walter Dörwald bc8e642c1b If the data read from the bytestream in readline() ends in a '\r' read one more
byte, even if the user has passed a size parameter. This extra byte shouldn't
cause a buffer overflow in the tokenizer. The original plan was to return a line
ending in '\r', which might be recognizable as a complete line and skip any '\n'
that was read afterwards. Unfortunately this didn't work, as the tokenizer only
recognizes '\n' as line ends, which in turn led to joined lines and
SyntaxErrors, so this special treatment of a split '\r\n' has been dropped. (It
can only happen with a temporarily exhausted bytestream now anyway.)
Fixes parts of SF bugs #1163244 and #1175396.
2005-04-21 21:32:03 +00:00
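The "read one more character after a trailing '\r'" rule can still be seen in Python 3's codecs.StreamReader.readline(), as far as we know. In this sketch, size=2 forces the first chunk to end exactly on the '\r'; readline() then reads one extra character, finds the '\n', and returns the whole "\r\n"-terminated line, even though a size was passed.

```python
import codecs
import io

# "a\r\n" followed by "b\n", UTF-8 encoded.
reader = codecs.getreader("utf-8")(io.BytesIO(b"a\r\nb\n"))

# The first read() returns exactly "a\r"; because that chunk ends in "\r",
# readline() reads one more character instead of splitting the "\r\n".
first = reader.readline(size=2)
second = reader.readline()
print(repr(first), repr(second))
```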
Walter Dörwald 714f87821f Fix typos. 2005-04-04 21:42:22 +00:00
Walter Dörwald 7a6dc139de Fix for SF bug #1175396: readline() will now read one more character, if
the last character read is "\r" (and size is None, i.e. we're allowed to
call read() multiple times), so that we can return the correct line ending
(this additional character might be a "\n").

If the stream is temporarily exhausted, we might return the wrong line ending
(if the last character read is "\r" and the next one, available only after the
byte stream provides more data, is "\n"), but at least the atcr member ensures
that we get the correct number of lines (i.e. this "\n" will not be treated as
another line ending).
2005-04-04 21:38:47 +00:00
Skip Montanaro 9f5f9d943d typo 2005-03-16 03:51:56 +00:00
Walter Dörwald 71fd90da87 Add default value for "whence" argument. 2005-03-14 19:25:41 +00:00
Walter Dörwald 729c31f5c3 Reset internal buffers when seek() is called. This fixes SF bug #1156259. 2005-03-14 19:06:30 +00:00
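A small Python 3 sketch of why seek() has to reset the buffers: the first readline() decodes the whole chunk and keeps "two\n" buffered. After seek(0), which also relies on the default whence value, the next readline() starts again from the beginning instead of returning the stale buffered line.

```python
import codecs
import io

reader = codecs.getreader("utf-8")(io.BytesIO(b"one\ntwo\n"))

first = reader.readline()   # decodes the whole chunk; "two\n" stays buffered
reader.seek(0)              # rewinds the stream and resets the codec buffers
again = reader.readline()   # starts over rather than returning "two\n"
print(repr(first), repr(again))
```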
Martin v. Löwis e2713becd8 Build with --disable-unicode again. Fixes #1158607.
Will backport to 2.4.
2005-03-08 15:03:08 +00:00
Walter Dörwald 9fa0946771 Fix and test for SF bug #1098990: codec readline() splits lines apart. 2005-01-10 12:01:39 +00:00
Walter Dörwald e57d7b179a The changes to the stateful codecs in 2.4 resulted in StreamReader.readline()
trying to return a complete line even if a size parameter was given (see
http://www.python.org/sf/1076985). This leads to buffer overflows with long
source lines under Windows if e.g. cp1252 is used as the source encoding.
This patch reverts the behaviour of readline() to something that behaves more
like Python 2.3: If a size parameter is given, read() is called only once.

As a side effect of this, readline() now supports all types of linebreaks
supported by unicode.splitlines().

Note that the tokenizer is still broken and it's possible to provoke segfaults
(see http://www.python.org/sf/1089395).
2004-12-21 22:24:00 +00:00
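Both behaviours described above can be checked on Python 3. This is a minimal sketch that assumes the line-boundary set of str.splitlines() is still what readline() uses: U+2028 LINE SEPARATOR ends a line even though it is not "\n", and with a size argument readline() makes a single read() call and may return a partial line.

```python
import codecs
import io

# U+2028 LINE SEPARATOR is a line boundary for str.splitlines().
data = "x\u2028y\n".encode("utf-8")
reader = codecs.getreader("utf-8")(io.BytesIO(data))
lines = [reader.readline(), reader.readline()]
print(lines)

# With size=3, read() is called only once, so only "abc" comes back.
reader2 = codecs.getreader("utf-8")(io.BytesIO(b"abcdef\n"))
partial = reader2.readline(size=3)
print(repr(partial))
```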
Hye-Shik Chang af5c7cff56 SF #1048865: Fix a trivial typo that breaks StreamReader.readlines() 2004-10-17 23:51:21 +00:00
Walter Dörwald 69652035bc SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now support
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
2004-09-07 20:24:22 +00:00
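A Python 3 sketch of the pieces this change introduced. The stateful decoder is reached here through codecs.utf_16_le_decode, whose third argument is the final flag; with final=False an incomplete trailing code unit is left unconsumed. The reader part then shows readline(keepends=False) and read(chars=...). The sample data is arbitrary.

```python
import codecs
import io

# Incomplete input: b"b" is half of a UTF-16 code unit, so with final=False
# only the first two bytes are consumed.
text, consumed = codecs.utf_16_le_decode(b"a\x00b", "strict", False)
print(repr(text), consumed)

data = "ab\ncd\n".encode("utf-16-le")
reader = codecs.getreader("utf-16-le")(io.BytesIO(data))
line = reader.readline(keepends=False)  # line ending stripped
chunk = reader.read(chars=2)            # exactly two characters
print(repr(line), repr(chunk))
```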
Marc-André Lemburg d594849c42 Ignore sizehint argument. Fixes SF #844561. 2004-02-26 15:22:17 +00:00
Walter Dörwald 7f3ed74643 Fix typos. 2003-02-02 23:08:27 +00:00
Neal Norwitz 6ec0a8ab93 sys was already imported, remove second import 2002-12-30 23:36:02 +00:00
Marc-André Lemburg b28de0d79f Patch to make _codecs a builtin module. This is necessary since
Python 2.3 will support source code encodings which rely on the
builtin codecs being available to the parser.

Remove struct dependency from codecs.py
2002-12-12 17:37:50 +00:00
Walter Dörwald 7f82f7955e Add missing documentation for the PEP 293 functionality to
the codecs docstrings.
2002-11-19 21:42:53 +00:00
Walter Dörwald 4dbf192f2b Add next() and __iter__() methods to StreamReader, StreamReaderWriter
and StreamRecoder.

This closes SF bug #634246.
2002-11-06 16:53:44 +00:00
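With the iterator protocol in place, a StreamReader can be used directly in a for loop. In Python 3 the method is spelled __next__ rather than next(); it calls readline() until that returns an empty string. A minimal sketch:

```python
import codecs
import io

raw = io.BytesIO("first\nsecond\nthird\n".encode("utf-8"))
reader = codecs.getreader("utf-8")(raw)

# __iter__() returns the reader itself; each step calls readline().
collected = [line for line in reader]
print(collected)
```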
Walter Dörwald 3aeb632c31 PEP 293 implementation (from SF patch http://www.python.org/sf/432401) 2002-09-02 13:14:32 +00:00
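PEP 293 lets callers register named error handlers through codecs.register_error(). A handler receives the UnicodeEncodeError (or decode error) and returns a (replacement, resume position) tuple. The handler name "demo-qmark" below is hypothetical and chosen only for this Python 3 sketch.

```python
import codecs

def question_marks(exc):
    # Replace each unencodable character with "?" and resume after it.
    if isinstance(exc, UnicodeEncodeError):
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc

# "demo-qmark" is a hypothetical handler name for this example.
codecs.register_error("demo-qmark", question_marks)

encoded = "na\u00efve caf\u00e9".encode("ascii", "demo-qmark")
print(encoded)
```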
Walter Dörwald 474458da48 Add constants BOM_UTF8, BOM_UTF16, BOM_UTF16_LE, BOM_UTF16_BE,
BOM_UTF32, BOM_UTF32_LE and BOM_UTF32_BE that represent the Byte
Order Mark in UTF-8, UTF-16 and UTF-32 encodings for little and
big endian systems.

The old names BOM32_* and BOM64_* were off by a factor of 2.

This closes SF bug http://www.python.org/sf/555360
2002-06-04 15:16:29 +00:00
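A typical use of these constants is sniffing the encoding of a byte string from its leading BOM. The helper sniff_bom below is hypothetical and only a sketch. Note that the UTF-32 BOMs must be tested before the UTF-16 ones, because BOM_UTF32_LE begins with the same two bytes as BOM_UTF16_LE.

```python
import codecs

def sniff_bom(data):
    """Hypothetical helper: return (codec name, BOM length) or None."""
    # UTF-32 first: BOM_UTF32_LE starts with the bytes of BOM_UTF16_LE.
    candidates = [
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in candidates:
        if data.startswith(bom):
            # The caller should skip len(bom) bytes before decoding.
            return name, len(bom)
    return None

print(sniff_bom(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))
```

A UTF-16-LE payload whose first character is U+0000 is indistinguishable from a UTF-32-LE BOM by this check; that ambiguity is inherent to BOM sniffing.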
Raymond Hettinger 54f0222547 SF 563203. Replaced 'has_key()' with 'in'. 2002-06-01 14:18:47 +00:00
Martin v. Löwis b786e61b1c Set default value for readlines.sizehint to None. Change needed for 2.2.1
as well.
2002-03-05 15:46:38 +00:00
Marc-André Lemburg aa32c5aa7c Added new helpers for easy access to codecs. Docs will follow. 2001-09-19 11:24:48 +00:00
Andrew M. Kuchling 97c56357b1 Fix typo in comment 2001-09-18 20:29:48 +00:00
Martin v. Löwis 02d893cfae Patch #444359: Remove unused imports. 2001-08-02 07:15:29 +00:00
Martin v. Löwis 6cd441d129 Add dead imports of modules that are "magically" imported. 2001-07-31 08:54:55 +00:00
Tim Peters 3a2ab1ab69 Whitespace normalization. 2001-05-29 06:06:54 +00:00
Marc-André Lemburg 716cf91839 Moved the encoding map building logic from the individual mapping
codec files to codecs.py and added logic so that multi mappings
in the decoding maps now result in mappings to None (undefined mapping)
in the encoding maps.
2001-05-16 09:41:45 +00:00
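The helper that came out of this change is still available as codecs.make_encoding_map() in Python 3, as far as we know. When two bytes decode to the same code point, the reverse (encoding) entry for that code point becomes None, meaning "undefined". The toy decoding map here is made up for illustration.

```python
import codecs

# Toy decoding map (byte value -> code point): 0x41 and 0xC1 both
# decode to U+0041, so encoding U+0041 is ambiguous.
decoding_map = {0x41: 0x0041, 0xC1: 0x0041, 0x42: 0x0042}

encoding_map = codecs.make_encoding_map(decoding_map)
print(encoding_map)
```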
Tim Peters 30324a7363 Just changed "x,y" to "x, y" everywhere (i.e., inserted horizontal space
after commas that didn't have any).
2001-05-15 17:19:16 +00:00
Skip Montanaro e99d5ea25b added __all__ lists to a number of Python modules
added test script and expected output file as well
this closes patch 103297.
__all__ attributes will be added to other modules without first submitting
a patch, just adding the necessary line to the test script to verify
more-or-less correct implementation.
2001-01-20 19:54:20 +00:00
Tim Peters 88869f9787 Whitespace normalization. 2001-01-14 23:36:06 +00:00
Marc-André Lemburg a866df806d This patch changes the default behaviour of the builtin charmap
codec to not apply Latin-1 mappings for keys which are not found
in the mapping dictionaries, but instead treat them as undefined
mappings.

The patch was originally written by Martin v. Loewis with some
additional (cosmetic) changes and an updated test script
by Marc-Andre Lemburg.

The standard codecs were recreated from the most current files
available at the Unicode.org site using the Tools/scripts/gencodec.py
tool.

This patch closes the bugs #116285 and #119960.
2001-01-03 21:29:14 +00:00
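The "missing key means undefined mapping" rule can be seen with codecs.charmap_decode() on Python 3. In this sketch the mapping is a made-up dict that only covers byte 0x41, so byte 0x42 is handed to the error handler instead of falling back to Latin-1.

```python
import codecs

# Only byte 0x41 has an entry; 0x42 is missing from the mapping.
mapping = {0x41: "a"}

text, consumed = codecs.charmap_decode(b"AB", "replace", mapping)
print(repr(text), consumed)

strict_error_start = None
try:
    codecs.charmap_decode(b"AB", "strict", mapping)
except UnicodeDecodeError as exc:
    strict_error_start = exc.start  # position of the undefined byte
print(strict_error_start)
```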
Andrew M. Kuchling c6c2838403 (Patch #102698) Fix for a bug reported by Wade Leftwich:
StreamReader ignores the 'errors' parameter passed to its constructor
2000-12-10 15:12:14 +00:00
Fred Drake d254c0095c Remove redundant information from a docstring. 2000-10-02 22:11:47 +00:00
Thomas Wouters 7e47402264 Spelling fixes supplied by Rob W. W. Hooft. All these are fixes in either
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").

There is a minor style issue involved: Guido seems to have preferred English
grammar (behaviour, honour) in a couple places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
2000-07-16 12:04:32 +00:00
Marc-André Lemburg 349a3d3a9a Marc-Andre Lemburg <mal@lemburg.com>:
Made codecs.open() default to 'rb' as file mode.
2000-06-21 21:21:04 +00:00
Guido van Rossum d58c26fec6 Marc-Andre Lemburg:
The two methods .readline() and .readlines() in StreamReaderWriter
didn't define the self argument. Found by Tom Emerson.
2000-05-01 16:17:32 +00:00
Fred Drake 49fd1077bc M.-A. Lemburg <mal@lemburg.com>:
Added more documentation. Clarified some existing comments.
2000-04-13 14:11:21 +00:00
Guido van Rossum 1c89b0eeef Deleted trailing whitespace. This is really a way to be able to add
a missing part of the previous checkin message:

Marc-Andre Lemburg:

Added encoding name attributes to wrapper classes which
allow applications to check the used encoding names.
2000-04-11 15:41:38 +00:00
Guido van Rossum a3277139f1 Marc-Andre Lemburg:
Added .writelines(), .readlines() and .readline() to all
codec classes.
2000-04-11 15:37:43 +00:00
Guido van Rossum b95de4f847 Marc-Andre Lemburg: Error reporting in the codec registry and lookup
mechanism is enhanced to be more informative.
2000-03-31 17:25:23 +00:00
Guido van Rossum d8855fde88 Marc-Andre Lemburg:
Attached you will find the latest update of the Unicode implementation.
The patch is against the current CVS version.

It includes the fix I posted yesterday for the core dump problem
in codecs.c (was introduced by my previous patch set -- sorry),
adds more tests for the codecs and two new parser markers
"es" and "es#".
2000-03-24 22:14:19 +00:00
Fred Drake 908670cdaa Oops, another in the same file; I should read the mail fully before
checking in; sorry!

"the the" --> "the" (in docstring); noted by Detlef Lannert
<lannert@lannert.rz.uni-duesseldorf.de>.
2000-03-17 15:42:11 +00:00
Fred Drake 3e74c0d021 "intput" --> "input" (in docstring); noted by Detlef Lannert
<lannert@lannert.rz.uni-duesseldorf.de>.
2000-03-17 15:40:35 +00:00
Guido van Rossum 0612d84155 Module codecs -- Python Codec Registry, API and helpers. Written by
Marc-Andre Lemburg.
2000-03-10 23:20:43 +00:00