34 Commits

Author SHA1 Message Date
Hye-Shik Chang 50d1f7935d #1276: Add temporary encoding aliases for non-supported Mac CJK
encodings that are detected as system defaults in MacOS with CJK
locales.  Will be replaced by properly-implemented codecs in 3.1.
2008-08-23 08:03:03 +00:00
Christian Heimes b9819954aa The bz2 codec isn't supported any more. I've also commented out several codecs which were removed in the past. 2007-12-02 15:27:38 +00:00
Walter Dörwald 41980caf64 Apply SF patch #1775604: This adds three new codecs (utf-32, utf-32-le and
utf-32-be). On narrow builds the codecs combine surrogate pairs in the unicode
object into one codepoint on encoding and create surrogate pairs for
codepoints outside the BMP on decoding. Lone surrogates are passed through
unchanged in all cases.

Backport to the trunk will follow.
2007-08-16 21:55:45 +00:00
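As a rough illustration of the three codec names this commit introduces, the sketch below encodes one code point outside the BMP with each of them. It assumes a current Python 3 interpreter, where narrow builds no longer exist, so the surrogate-pair handling described in the message is not exercised here.

```python
# Sketch: the three UTF-32 codec names from the commit, applied to a single
# code point outside the BMP (U+1F600). Assumes Python 3.5+ for bytes.hex().
ch = "\U0001F600"

le = ch.encode("utf-32-le")   # little-endian, no byte order mark
be = ch.encode("utf-32-be")   # big-endian, no byte order mark
bom = ch.encode("utf-32")     # native byte order, preceded by a 4-byte BOM

print(le.hex())   # 00f60100
print(be.hex())   # 0001f600
print(len(bom))   # 8

# Round trip: every variant decodes back to the same single code point.
assert le.decode("utf-32-le") == ch
assert be.decode("utf-32-be") == ch
assert bom.decode("utf-32") == ch
```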
Thomas Wouters 9fe394c1be Merged revisions 53538-53622 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r53545 | andrew.kuchling | 2007-01-24 21:06:41 +0100 (Wed, 24 Jan 2007) | 1 line

  Strengthen warning about using lock()
........
  r53556 | thomas.heller | 2007-01-25 19:34:14 +0100 (Thu, 25 Jan 2007) | 3 lines

  Fix for #1643874: When calling SysAllocString, create a PyCObject
  which will eventually call SysFreeString to free the BSTR resource.
........
  r53563 | andrew.kuchling | 2007-01-25 21:02:13 +0100 (Thu, 25 Jan 2007) | 1 line

  Add item
........
  r53564 | brett.cannon | 2007-01-25 21:22:02 +0100 (Thu, 25 Jan 2007) | 8 lines

  Fix time.strptime's %U support.  Basically rewrote the algorithm to be more
  generic so that one only has to shift certain values based on whether the week
  was specified to start on Monday or Sunday.  Cut out a lot of edge case code
  compared to the previous version.  Also broke algorithm out into its own
  function (that is private to the module).

  Fixes bug #1643943 (thanks Biran Nahas for the report).
........
  r53570 | brett.cannon | 2007-01-26 00:30:39 +0100 (Fri, 26 Jan 2007) | 4 lines

  Remove specific mention of my name and email address from modules.  Not really
  needed and all bug reports should go to the bug tracker, not directly to me.
  Plus I am not the only person to have edited these files at this point.
........
  r53573 | fred.drake | 2007-01-26 17:28:44 +0100 (Fri, 26 Jan 2007) | 1 line

  fix typo (extraneous ")")
........
  r53575 | georg.brandl | 2007-01-27 18:43:02 +0100 (Sat, 27 Jan 2007) | 4 lines

  Patch #1638243: the compiler package is now able to correctly compile
  a with statement; previously, executing code containing a with statement
  compiled by the compiler package crashed the interpreter.
........
  r53578 | georg.brandl | 2007-01-27 18:59:42 +0100 (Sat, 27 Jan 2007) | 3 lines

  Patch #1634778: add missing encoding aliases for iso8859_15 and
  iso8859_16.
........
  r53579 | georg.brandl | 2007-01-27 20:38:50 +0100 (Sat, 27 Jan 2007) | 2 lines

  Bug #1645944: os.access now returns bool but docstring is not updated
........
  r53590 | brett.cannon | 2007-01-28 21:58:00 +0100 (Sun, 28 Jan 2007) | 2 lines

  Use the thread lock's context manager instead of a try/finally statement.
........
  r53591 | brett.cannon | 2007-01-29 05:41:44 +0100 (Mon, 29 Jan 2007) | 2 lines

  Add a test for slicing an exception.
........
  r53594 | andrew.kuchling | 2007-01-29 21:21:43 +0100 (Mon, 29 Jan 2007) | 1 line

  Minor edits to the curses HOWTO
........
  r53596 | andrew.kuchling | 2007-01-29 21:55:40 +0100 (Mon, 29 Jan 2007) | 1 line

  Various minor edits
........
  r53597 | andrew.kuchling | 2007-01-29 22:28:48 +0100 (Mon, 29 Jan 2007) | 1 line

  More edits
........
  r53601 | tim.peters | 2007-01-30 04:03:46 +0100 (Tue, 30 Jan 2007) | 2 lines

  Whitespace normalization.
........
  r53603 | georg.brandl | 2007-01-30 21:21:30 +0100 (Tue, 30 Jan 2007) | 2 lines

  Bug #1648191: typo in docs.
........
  r53605 | brett.cannon | 2007-01-30 22:34:36 +0100 (Tue, 30 Jan 2007) | 8 lines

  No more raising of string exceptions!

  The next step of PEP 352 (for 2.6) causes raising a string exception to trigger
  a TypeError.  Trying to catch a string exception raises a DeprecationWarning.
  References to string exceptions have been removed from the docs since they are
  now just an error.
........
  r53618 | raymond.hettinger | 2007-02-01 22:02:59 +0100 (Thu, 01 Feb 2007) | 1 line

  Bug #1648179:  set.update() not recognizing __iter__ overrides in dict subclasses.
........
2007-02-05 01:24:16 +00:00
Marc-André Lemburg 7797be7b3b Alias iso8859_1 to latin_1 which is the same encoding, but has
a much faster codec implementation.
2005-10-21 14:02:28 +00:00
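One way to observe the effect of this alias, sketched under the assumption that a current Python 3 standard library still carries it: both spellings should resolve to the same codec and yield identical bytes.

```python
import codecs

text = "caf\u00e9"

# Both names are expected to resolve to one codec implementation.
same_codec = codecs.lookup("iso8859_1").name == codecs.lookup("latin_1").name

# The encoded bytes should be identical whichever name is used.
same_bytes = text.encode("iso8859_1") == text.encode("latin_1")

print(same_codec, same_bytes)
```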
Walter Dörwald 007f8dfde2 Bug #1245379: Add "unicode-1-1-utf-7" as an alias for "utf-7" as specified
by RFC 1642.
2005-10-09 19:42:27 +00:00
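A small check of the alias added here, assuming the running Python 3 still ships it: looking up the RFC 1642 name should land on the same codec as "utf-7", and text encoded under one name should decode under the other.

```python
import codecs

alias = codecs.lookup("unicode-1-1-utf-7")
canonical = codecs.lookup("utf-7")

# The alias and the canonical name should point at the same codec.
resolves_to_utf7 = alias.name == canonical.name

# Encode under the alias, decode under the canonical name.
sample = "A\u2262\u0391."
round_trip = sample.encode("unicode-1-1-utf-7").decode("utf-7") == sample

print(resolves_to_utf7, round_trip)
```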
Marc-André Lemburg 9ab8818c87 Rearranged mappings to value sorting order. 2004-12-10 21:54:35 +00:00
Marc-André Lemburg c759f070ef Added new codecs and aliases for ISO_8859-11, ISO_8859-16 and
TIS-620.

Closes SF bug #1001895: Adding missing ISO 8859 codecs, especially Thai.
2004-08-05 12:43:30 +00:00
Marc-André Lemburg cd8a4cb3d3 Added new codec hp-roman8 submitted as patch [ 996067 ] hp-roman8 codec. 2004-07-28 15:35:29 +00:00
Hye-Shik Chang 2bb146f2f4 Bring CJKCodecs 1.1 into trunk. This completely reorganizes source
and installed layouts to make maintenance simple and easy.  And it
also adds four new codecs; big5hkscs, euc-jis-2004, shift-jis-2004
and iso2022-jp-2004.
2004-07-18 03:06:29 +00:00
Hye-Shik Chang 5c5316f111 Add a new unicode codec: ptcp154 (Kazakh) 2004-03-19 08:06:07 +00:00
Marc-André Lemburg 5c94d33077 Add some more code page aliases needed for completeness. 2004-01-20 09:38:52 +00:00
Hye-Shik Chang b619e4b36c Fix a typo: s/iso_3022/iso2022/ 2004-01-20 09:33:30 +00:00
Hye-Shik Chang 3e2a306920 Add CJK codecs support as discussed on python-dev. (SF #873597)
Several style fixes are suggested by Martin v. Loewis and
Marc-Andre Lemburg. Thanks!
2004-01-17 14:29:29 +00:00
Raymond Hettinger 9a80c5dbc4 Added codec for bz2 compression. 2003-09-23 20:21:01 +00:00
Marc-André Lemburg 8dc5ff2e5a Undo the removal. Guido mentioned that the encoding name is in active
use by some email headers.
2002-10-04 16:30:42 +00:00
Marc-André Lemburg 68fc27385d Remove unneeded alias. 2002-10-04 15:57:03 +00:00
Marc-André Lemburg a40ea75625 Fix doc-string. 2002-10-04 11:58:24 +00:00
Marc-André Lemburg 9d158bb66f Adapt lookup names to new more general encoding name normalization
scheme.
2002-10-04 11:51:39 +00:00
Guido van Rossum 479f3d3d2a Oops, must convert hyphens to underscores in keys of aliases dict. 2002-09-26 20:08:23 +00:00
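The point of this fix can be sketched as follows. `normalize_key` is a hypothetical helper written for illustration only; the standard library's own normalization (`encodings.normalize_encoding`) is more general. The sketch assumes the `us_ascii` alias is present in `encodings.aliases.aliases`, as it is in current Python 3 releases.

```python
from encodings.aliases import aliases


def normalize_key(name):
    """Hypothetical helper: lowercase the name and turn hyphens into underscores."""
    return name.lower().replace("-", "_")


# Keys in the aliases dict use underscores, so a hyphenated lookup misses.
print(aliases.get("us-ascii"))                  # None

# After normalization the same name is found.
print(aliases.get(normalize_key("US-ASCII")))   # ascii
```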
Guido van Rossum b7a88e533d Add yet another alias for ASCII found in the field. Will backport to
2.2.2.
2002-09-25 16:44:34 +00:00
Marc-André Lemburg a0af63b242 Corrected import behaviour for codecs which live outside the encodings
package.
2002-02-11 17:43:46 +00:00
Marc-André Lemburg 462004e90a Add IANA character set aliases to the encodings alias dictionary
and make alias lookup lazy.

Note that only those IANA character set aliases were added for which
we actually have codecs in the encodings package.
2002-02-10 21:36:20 +00:00
Martin v. Löwis 79d802d58c Patch #487275: Add windows-1251 charset alias. 2001-12-02 12:24:19 +00:00
Marc-André Lemburg c60e6f7771 Patch #435971: UTF-7 codec by Brian Quinlan. 2001-09-20 10:35:46 +00:00
Martin v. Löwis 9b75dca192 Expose nl_langinfo through locale where available. 2001-08-10 13:58:50 +00:00
Martin v. Löwis 13b8bc5478 Patch #429957: Add support for cp1140, which is identical to cp037,
with the addition of the euro character.
Also added a few EBCDIC aliases.
2001-06-07 19:39:25 +00:00
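The claim that cp1140 is cp037 plus the euro character can be checked with a short comparison of the two decoding tables. This is a sketch against the codecs as shipped in a current Python 3, not against the 2001 versions.

```python
# Collect every byte value that the two codecs decode differently.
diffs = [
    b for b in range(256)
    if bytes([b]).decode("cp037") != bytes([b]).decode("cp1140")
]

# Show each differing position with the character from either codec.
for b in diffs:
    old = bytes([b]).decode("cp037")
    new = bytes([b]).decode("cp1140")
    print(hex(b), repr(old), repr(new))
```

If the description in the commit message holds, the loop prints only the position where cp1140 carries the euro sign.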
Mark Hammond 194bfb2805 Add some useful Windows encodings - patch #423221. 2001-06-04 02:31:23 +00:00
Guido van Rossum acfdf156aa Add quoted-printable codec 2001-05-15 15:34:07 +00:00
Marc-André Lemburg 2d9204199f This patch changes the way the string .encode() method works slightly
and introduces a new method .decode().

The major change is that strg.encode() will no longer try to convert
Unicode returns from the codec into a string, but instead pass along
the Unicode object as-is. The same is now true for all other codec
return types. The underlying C APIs were changed accordingly.

Note that even though this does have the potential of breaking
existing code, the chances are low since conversion from Unicode
previously took place using the default encoding which is normally
set to ASCII rendering this auto-conversion mechanism useless for
most Unicode encodings.

The good news is that you can now use .encode() and .decode() with
much greater ease and that the door was opened for better accessibility
of the builtin codecs.

As demonstration of the new feature, the patch includes a few new
codecs which allow string to string encoding and decoding (rot13,
hex, zip, uu, base64).

Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
2001-05-15 12:00:02 +00:00
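For readers on Python 3, the string-to-string codecs mentioned above can still be tried, with one difference: `str.encode()` and `bytes.decode()` are restricted to text encodings there, so the sketch below goes through `codecs.encode()` and `codecs.decode()` instead. It assumes the `rot13`, `hex`, `base64` and `zip` aliases are available, as they are in current releases.

```python
import codecs

# rot13 maps text to text.
print(codecs.encode("hello", "rot13"))     # uryyb

# hex and base64 map bytes to bytes.
print(codecs.encode(b"hi", "hex"))         # b'6869'
print(codecs.encode(b"hi", "base64"))      # b'aGk=\n'

# zip compresses; decoding with the same codec restores the input.
payload = b"spam" * 100
packed = codecs.encode(payload, "zip")
assert codecs.decode(packed, "zip") == payload
assert len(packed) < len(payload)
```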
Marc-André Lemburg 4fd73f0465 Marc-Andre Lemburg <mal@lemburg.com>:
Added some more codec aliases. Some of them are needed by the
new locale.py encoding support.
2000-06-07 09:12:30 +00:00
Guido van Rossum 9e896b37c7 Marc-Andre's third try at this bulk patch seems to work (except that
his copy of test_contains.py seems to be broken -- the lines he
deleted were already absent).  Checkin messages:


New Unicode support for int(), float(), complex() and long().

- new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
- added support for Unicode to PyFloat_FromString()
- new encoding API PyUnicode_EncodeDecimal() which converts
  Unicode to a decimal char* string (used in the above new
  APIs)
- shortcuts for calls like int(<int object>) and float(<float obj>)
- tests for all of the above

Unicode compares and contains checks:
- comparing Unicode and non-string types now works; TypeErrors
  are masked, all other errors such as ValueError during
  Unicode coercion are passed through (note that PyUnicode_Compare
  does not implement the masking -- PyObject_Compare does this)
- contains now works for non-string types too; TypeErrors are
  masked and 0 returned; all other errors are passed through

Better testing support for the standard codecs.

Misc minor enhancements, such as an alias dbcs for the mbcs codec.

Changes:
- PyLong_FromString() now applies the same error checks as
  does PyInt_FromString(): trailing garbage is reported
  as an error and is no longer silently ignored. The only characters
  which may be trailing the digits are 'L' and 'l' -- these
  are still silently ignored.
- string.ato?() now directly interface to int(), long() and
  float(). The error strings are now a little different, but
  the type still remains the same. These functions are now
  ready to get declared obsolete ;-)
- PyNumber_Int() now also does a check for embedded NULL chars
  in the input string; PyNumber_Long() already did this (and
  still does)

Followed by:

Looks like I've gone a step too far there... (and test_contains.py
seems to have a bug too).

I've changed back to reporting all errors in PyUnicode_Contains()
and added a few more test cases to test_contains.py (plus corrected
the join() NameError).
2000-04-05 20:11:21 +00:00
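The Unicode support for the numeric constructors described in this commit survives in Python 3, where every string is Unicode. The sketch below assumes a current interpreter and shows only the user-visible behaviour, not the C APIs named in the message.

```python
# ARABIC-INDIC DIGITS ONE, TWO, THREE: Unicode decimal digits are accepted.
arabic_indic = "\u0661\u0662\u0663"
print(int(arabic_indic))           # 123
print(float("\u0663.\u0665"))      # 3.5

# Trailing garbage is reported as an error rather than silently ignored.
try:
    int("123abc")
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
else:
    rejected = False
```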
Guido van Rossum 68895ed70c Marc-Andre Lemburg: use all lowercase names. 2000-03-31 17:23:18 +00:00
Guido van Rossum 0229bf6001 Marc-Andre Lemburg: Unicode encodings. 2000-03-10 23:17:24 +00:00