69 Commits

Author SHA1 Message Date
Miss Islington (bot) fc73c54dae bpo-32110: codecs.StreamReader.read(n) now returns not more than n characters/bytes for non-negative n. (GH-4499) (#4623)
This makes it compatible with read() methods of other file-like objects.
(cherry picked from commit 219c2de5ad)
2017-11-29 02:15:43 +02:00
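The behaviour described in this entry can be sketched as follows. This is an illustration only, written for Python 3 (an assumption, since the log itself tracks a 2.x branch); the sample text and variable names are made up for the example.

```python
import codecs
import io

text = "héllo wörld"
raw = io.BytesIO(text.encode("utf-8"))

# Wrap the byte stream in a UTF-8 StreamReader.
reader = codecs.getreader("utf-8")(raw)

# With the fix, a single non-negative argument caps the number of
# characters returned, in the same way as read(n) on other file objects.
chunk = reader.read(3)

# A call without arguments returns whatever is left.
rest = reader.read()

print(repr(chunk), repr(rest))
```

On interpreters that predate the fix, `read(3)` may return more than three characters, because the argument was treated only as an approximate byte count.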
Martin Panter b362f75f6e Issue #25523: Correct "a" article to "an" article
This changes the main documentation, doc strings, source code comments, and a
couple error messages in the test suite. In some cases the word was removed
to fix the grammar.
2015-11-02 03:37:02 +00:00
Berker Peksag ffc7e8eebe Issue #12160: Fix incorrect StreamCodec references in Codec.encode() and Codec.decode() docs.
It should be StreamWriter for Codec.encode() and StreamReader for Codec.decode().

Patch by Nick Weinhold.
2015-07-30 23:27:13 +03:00
Serhiy Storchaka c7797dc748 Issue #19543: Emit deprecation warning for known non-text encodings.
Backported issue #19619: encode() and decode() methods and constructors
of str, unicode and bytearray classes now emit a deprecation warning for known
non-text encodings when Python is run with the -3 option.

Backported issue #20404: io.TextIOWrapper (and hence io.open()) now uses the
internal codec marking system added to emit a deprecation warning for known non-text
encodings at stream construction time when Python is run with the -3 option.
2015-05-31 20:21:00 +03:00
Serhiy Storchaka c811328e44 Escaped backslashes in docstrings. 2015-04-03 18:12:32 +03:00
Serhiy Storchaka 74a651b4e6 Issue #23071: Added missing names to codecs.__all__. Patch by Martin Panter. 2014-12-20 17:42:24 +02:00
Serhiy Storchaka 4a44f8791c Fixed a bug in previous changeset: StreamReader returned '' instead of u''. 2014-01-26 21:19:59 +02:00
Serhiy Storchaka 2403a787b9 Issue #8260: The read(), readline() and readlines() methods of
codecs.StreamReader returned incomplete data when called after
readline() or read(size).  Based on patch by Amaury Forgeot d'Arc.
2014-01-26 19:20:24 +02:00
Victor Stinner 7df55dad3b Issue #6268: More bugfixes about BOM, UTF-16 and UTF-32
* Fix seek() method of codecs.open(), don't write the BOM twice after seek(0)
* Fix reset() method of codecs, UTF-16, UTF-32 and StreamWriter classes
* test_codecs: use "w+" mode instead of "wt+". "t" mode is not supported by
  Solaris or Windows, but does it really exist? I found it in the issue.
2010-05-22 13:37:56 +00:00
Victor Stinner 262be5e70b Issue #6268: Fix seek() method of codecs.open(), don't read the BOM twice
after seek(0)
2010-05-22 02:11:07 +00:00
Florent Xicluna f4b6186d9c #691291: codecs.open() should not convert end of lines on reading and writing. 2010-02-26 10:40:58 +00:00
Christian Heimes 1a6387e683 Merged revisions 61750,61752,61754,61756,61760,61763,61768,61772,61775,61805,61809,61812,61819,61917,61920,61930,61933-61934 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/trunk-bytearray

........
  r61750 | christian.heimes | 2008-03-22 20:47:44 +0100 (Sat, 22 Mar 2008) | 1 line

  Copied files from py3k w/o modifications
........
  r61752 | christian.heimes | 2008-03-22 20:53:20 +0100 (Sat, 22 Mar 2008) | 7 lines

  Take One
  * Added initialization code, warnings, flags etc. to the appropriate places
  * Added new buffer interface to string type
  * Modified tests
  * Modified Makefile.pre.in to compile the new files
  * Added bytesobject.c to Python.h
........
  r61754 | christian.heimes | 2008-03-22 21:22:19 +0100 (Sat, 22 Mar 2008) | 2 lines

  Disabled bytearray.extend for now since it causes an infinite recursion
  Fixed several unit tests
........
  r61756 | christian.heimes | 2008-03-22 21:43:38 +0100 (Sat, 22 Mar 2008) | 5 lines

  Added PyBytes support to several places:
  str + bytearray
  ord(bytearray)
  bytearray(str, encoding)
........
  r61760 | christian.heimes | 2008-03-22 21:56:32 +0100 (Sat, 22 Mar 2008) | 1 line

  Fixed more unit tests related to type('') is not unicode
........
  r61763 | christian.heimes | 2008-03-22 22:20:28 +0100 (Sat, 22 Mar 2008) | 2 lines

  Fixed more unit tests
  Fixed bytearray.extend
........
  r61768 | christian.heimes | 2008-03-22 22:40:50 +0100 (Sat, 22 Mar 2008) | 1 line

  Implemented old buffer interface for bytearray
........
  r61772 | christian.heimes | 2008-03-22 23:24:52 +0100 (Sat, 22 Mar 2008) | 1 line

  Added backport of the io module
........
  r61775 | christian.heimes | 2008-03-23 03:50:49 +0100 (Sun, 23 Mar 2008) | 1 line

  Fix str assignment to bytearray. Assignment of a str of size 1 is interpreted as a single byte
........
  r61805 | christian.heimes | 2008-03-23 19:33:48 +0100 (Sun, 23 Mar 2008) | 3 lines

  Fixed more tests
  Fixed bytearray() comparison with unicode()
  Fixed iterator assignment of bytearray
........
  r61809 | christian.heimes | 2008-03-23 21:02:21 +0100 (Sun, 23 Mar 2008) | 2 lines

  str(bytesarray()) now returns the bytes and not the representation of the bytearray object
  Enabled and fixed more unit tests
........
  r61812 | christian.heimes | 2008-03-23 21:53:08 +0100 (Sun, 23 Mar 2008) | 3 lines

  Clear error when PyNumber_AsSsize_t() fails
  Use CHARMASK for ob_svall access
  disabled a test with memoryview again
........
  r61819 | christian.heimes | 2008-03-23 23:05:57 +0100 (Sun, 23 Mar 2008) | 1 line

  Untested updates to the PCBuild directory
........
  r61917 | christian.heimes | 2008-03-26 00:57:06 +0100 (Wed, 26 Mar 2008) | 1 line

  The type system of Python 2.6 has subtle differences to 3.0's. I've removed the Py_TPFLAGS_BASETYPE flags from bytearray for now. bytearray can't be subclassed until the issues with bytearray subclasses are fixed.
........
  r61920 | christian.heimes | 2008-03-26 01:44:08 +0100 (Wed, 26 Mar 2008) | 2 lines

  Disabled last failing test
  I don't understand what the test is testing and how it is supposed to work. Ka-Ping, please check it out.
........
  r61930 | christian.heimes | 2008-03-26 12:46:18 +0100 (Wed, 26 Mar 2008) | 1 line

  Re-enabled bytes warning code
........
  r61933 | christian.heimes | 2008-03-26 13:20:46 +0100 (Wed, 26 Mar 2008) | 1 line

  Fixed a bug in the new buffer protocol. The buffer slots weren't copied into a subclass.
........
  r61934 | christian.heimes | 2008-03-26 13:25:09 +0100 (Wed, 26 Mar 2008) | 1 line

  Re-enabled bytearray subclassing - all tests are passing.
........
2008-03-26 12:49:49 +00:00
Georg Brandl 8f99f81dfc Fix codecs.EncodedFile which did not use file_encoding in 2.5.0, and
fix all codecs file wrappers to work correctly with the "with"
statement (bug #1586513).
2006-10-29 08:39:22 +00:00
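A small sketch of the "with" statement support mentioned in this entry, assuming Python 3. To stay self-contained it wraps an in-memory `io.BytesIO` with a stream writer instead of opening a real file; that substitution is a choice made for this example and is not taken from the commit.

```python
import codecs
import io

buf = io.BytesIO()

# The codec stream wrapper acts as a context manager; leaving the
# block closes the underlying stream.
with codecs.getwriter("utf-8")(buf) as writer:
    writer.write("naïve")
    payload = buf.getvalue()  # capture the bytes before the stream is closed

print(payload, buf.closed)
```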
Walter Dörwald 78a0be6ab3 Add a BufferedIncrementalEncoder class that can be used for implementing
an incremental encoder that must retain part of the data between calls
to the encode() method.

Fix the incremental encoder and decoder for the IDNA encoding.

This closes SF patch #1453235.
2006-04-14 18:25:39 +00:00
Walter Dörwald b17f12bbc6 Fix wrong attribute name. 2006-04-14 15:40:54 +00:00
Walter Dörwald 6a7ec7c3e2 Change raise statement to PEP 8 style. 2006-03-18 16:35:17 +00:00
Neal Norwitz 6bed1c1fab Add some versionadded info to new incremental codec docs and fix doco nits. 2006-03-16 07:49:19 +00:00
Walter Dörwald abb02e5994 Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass
of tuple) that provides incremental decoders and encoders (a way to use
stateful codecs without the stream API). Functions
codecs.getincrementaldecoder() and codecs.getincrementalencoder() have
been added.
2006-03-15 11:35:15 +00:00
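The incremental API introduced by this entry can be illustrated with a short sketch, assuming Python 3. The choice of the euro sign and the split point are arbitrary; they just force a multi-byte character to straddle two chunks.

```python
import codecs

# lookup() returns a CodecInfo object, which is also a tuple.
info = codecs.lookup("utf-8")

encoded = "€".encode("utf-8")  # three bytes: e2 82 ac

# An incremental decoder keeps undecoded bytes between calls, so the
# input can arrive in arbitrary pieces without using the stream API.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [
    decoder.decode(encoded[:1]),       # incomplete sequence: nothing yet
    decoder.decode(encoded[1:]),       # sequence completed
    decoder.decode(b"", final=True),   # flush; fails if bytes were left over
]
decoded = "".join(pieces)

print(info.name, pieces, decoded)
```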
Walter Dörwald ca199432c2 If size is specified, try to read at least size characters.
This is an alternative version of patch #1379332.
2006-03-06 22:39:12 +00:00
Tim Peters 536cf99536 Whitespace normalization. 2005-12-25 23:18:31 +00:00
Martin v. Löwis 4ed673877d Patch #1268314: Cache lines in StreamReader.readlines for performance.
Will backport to Python 2.4.
2005-09-18 08:34:39 +00:00
Walter Dörwald c5238b8288 SF bug #1235646: codecs.StreamRecoder.next() now reencodes the data it reads
from the input stream, so that the output is a byte string in the correct
encoding instead of a unicode string.
2005-09-01 11:56:53 +00:00
Martin v. Löwis 56066d2e55 Return complete lines from codec stream readers
even if there is an exception in later lines, resulting in
correct line numbers for decoding errors in source code. Fixes #1178484.
Will backport to 2.4.
2005-08-24 07:38:12 +00:00
Walter Dörwald c9878e1b22 Make attributes and local variables in the StreamReader str objects instead
of unicode objects, so that codecs that do a str->str decoding won't promote
the result to unicode. This fixes SF bug #1241507.
2005-07-20 22:15:39 +00:00
Walter Dörwald a4eb2d56a4 Fix comment. 2005-04-21 21:42:35 +00:00
Walter Dörwald bc8e642c1b If the data read from the bytestream in readline() ends in a '\r' read one more
byte, even if the user has passed a size parameter. This extra byte shouldn't
cause a buffer overflow in the tokenizer. The original plan was to return a line
ending in '\r', which might be recognizable as a complete line and skip any '\n'
that was read afterwards. Unfortunately this didn't work, as the tokenizer only
recognizes '\n' as line ends, which in turn led to joined lines and
SyntaxErrors, so this special treatment of a split '\r\n' has been dropped. (It
can only happen with a temporarily exhausted bytestream now anyway.)
Fixes parts of SF bugs #1163244 and #1175396.
2005-04-21 21:32:03 +00:00
Walter Dörwald 714f87821f Fix typos. 2005-04-04 21:42:22 +00:00
Walter Dörwald 7a6dc139de Fix for SF bug #1175396: readline() will now read one more character, if
the last character read is "\r" (and size is None, i.e. we're allowed to
call read() multiple times), so that we can return the correct line ending
(this additional character might be a "\n").

If the stream is temporarily exhausted, we might return the wrong line ending
(if the last character read is "\r" and the next one (after the byte stream
provides more data) is "\n"), but at least the atcr member ensures that we
get the correct number of lines (i.e. this "\n" will not be treated as
another line ending).
2005-04-04 21:38:47 +00:00
Skip Montanaro 9f5f9d943d typo 2005-03-16 03:51:56 +00:00
Walter Dörwald 71fd90da87 Add default value for "whence" argument. 2005-03-14 19:25:41 +00:00
Walter Dörwald 729c31f5c3 Reset internal buffers when seek() is called. This fixes SF bug #1156259. 2005-03-14 19:06:30 +00:00
Martin v. Löwis e2713becd8 Build with --disable-unicode again. Fixes #1158607.
Will backport to 2.4.
2005-03-08 15:03:08 +00:00
Walter Dörwald 9fa0946771 Fix and test for SF bug #1098990: codec readline() splits lines apart. 2005-01-10 12:01:39 +00:00
Walter Dörwald e57d7b179a The changes to the stateful codecs in 2.4 resulted in StreamReader.readline()
trying to return a complete line even if a size parameter was given (see
http://www.python.org/sf/1076985). This leads to buffer overflows with long
source lines under Windows if e.g. cp1252 is used as the source encoding.
This patch reverts the behaviour of readline() to something that behaves more
like Python 2.3: If a size parameter is given, read() is called only once.

As a side effect of this, readline() now supports all types of linebreaks
supported by unicode.splitlines().

Note that the tokenizer is still broken and it's possible to provoke segfaults
(see http://www.python.org/sf/1089395).
2004-12-21 22:24:00 +00:00
Hye-Shik Chang af5c7cff56 SF #1048865: Fix a trivial typo that breaks StreamReader.readlines() 2004-10-17 23:51:21 +00:00
Walter Dörwald 69652035bc SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now support
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
2004-09-07 20:24:22 +00:00
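As a rough illustration of the buffered readline support and the keepends argument described in this entry, here is a sketch assuming Python 3 and an in-memory UTF-16 stream. The sample text is invented for the example.

```python
import codecs
import io

# UTF-16 data with a BOM; every character occupies two bytes, so
# line splitting has to happen on decoded text, not on raw bytes.
raw = io.BytesIO("first\nsecond\n".encode("utf-16"))
reader = codecs.getreader("utf-16")(raw)

# keepends=False strips the line ending from the returned line.
line_without_end = reader.readline(keepends=False)

# The default keeps the line ending.
line_with_end = reader.readline()

print(repr(line_without_end), repr(line_with_end))
```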
Marc-André Lemburg d594849c42 Ignore sizehint argument. Fixes SF #844561. 2004-02-26 15:22:17 +00:00
Walter Dörwald 7f3ed74643 Fix typos. 2003-02-02 23:08:27 +00:00
Neal Norwitz 6ec0a8ab93 sys was already imported, remove second import 2002-12-30 23:36:02 +00:00
Marc-André Lemburg b28de0d79f Patch to make _codecs a builtin module. This is necessary since
Python 2.3 will support source code encodings which rely on the
builtin codecs being available to the parser.

Remove struct dependency from codecs.py
2002-12-12 17:37:50 +00:00
Walter Dörwald 7f82f7955e Add missing documentation for the PEP 293 functionality to
the codecs docstrings.
2002-11-19 21:42:53 +00:00
Walter Dörwald 4dbf192f2b Add next() and __iter__() methods to StreamReader, StreamReaderWriter
and StreamRecoder.

This closes SF bug #634246.
2002-11-06 16:53:44 +00:00
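The iterator support added here means a stream reader can be used directly in a loop. A minimal sketch, assuming Python 3 (where the method is spelled `__next__` instead of `next`):

```python
import codecs
import io

raw = io.BytesIO("α\nβ\n".encode("utf-8"))
reader = codecs.getreader("utf-8")(raw)

# Iterating over the reader yields decoded lines, line endings included,
# until the underlying stream is exhausted.
lines = [line for line in reader]

print(lines)
```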
Walter Dörwald 3aeb632c31 PEP 293 implementation (from SF patch http://www.python.org/sf/432401) 2002-09-02 13:14:32 +00:00
Walter Dörwald 474458da48 Add constants BOM_UTF8, BOM_UTF16, BOM_UTF16_LE, BOM_UTF16_BE,
BOM_UTF32, BOM_UTF32_LE and BOM_UTF32_BE that represent the Byte
Order Mark in UTF-8, UTF-16 and UTF-32 encodings for little and
big endian systems.

The old names BOM32_* and BOM64_* were off by a factor of 2.

This closes SF bug http://www.python.org/sf/555360
2002-06-04 15:16:29 +00:00
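One common use of these constants is guessing an encoding from the first bytes of a file. The helper below is a hypothetical sketch, assuming Python 3; the name `sniff_bom` and the returned codec names are choices made for this example and are not part of the codecs module.

```python
import codecs

# Longer marks come first: BOM_UTF32_LE begins with the same two bytes
# as BOM_UTF16_LE, so checking UTF-16 first would misreport UTF-32 data.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (codec_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
name, skip = sniff_bom(sample)
print(name, sample[skip:].decode(name))
```

The result is only a guess: UTF-16-LE text that begins with a NUL character after its BOM is byte-for-byte identical to a UTF-32-LE BOM.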
Raymond Hettinger 54f0222547 SF 563203. Replaced 'has_key()' with 'in'. 2002-06-01 14:18:47 +00:00
Martin v. Löwis b786e61b1c Set default value for readlines.sizehint to None. Change needed for 2.2.1
as well.
2002-03-05 15:46:38 +00:00
Marc-André Lemburg aa32c5aa7c Added new helpers for easy access to codecs. Docs will follow. 2001-09-19 11:24:48 +00:00
Andrew M. Kuchling 97c56357b1 Fix typo in comment 2001-09-18 20:29:48 +00:00
Martin v. Löwis 02d893cfae Patch #444359: Remove unused imports. 2001-08-02 07:15:29 +00:00
Martin v. Löwis 6cd441d129 Add dead imports of modules that are "magically" imported. 2001-07-31 08:54:55 +00:00