28 Commits

Author SHA1 Message Date
Marc-André Lemburg 9ab8818c87 Rearranged mappings to value sorting order. 2004-12-10 21:54:35 +00:00
Marc-André Lemburg c759f070ef Added new codecs and aliases for ISO_8859-11, ISO_8859-16 and
TIS-620.

Closes SF bug #1001895: Adding missing ISO 8859 codecs, especially Thai.
2004-08-05 12:43:30 +00:00
Marc-André Lemburg cd8a4cb3d3 Added new codec hp-roman8 submitted as patch [ 996067 ] hp-roman8 codec. 2004-07-28 15:35:29 +00:00
Hye-Shik Chang 2bb146f2f4 Bring CJKCodecs 1.1 into trunk. This completely reorganizes source
and installed layouts to make maintenance simple and easy.  And it
also adds four new codecs; big5hkscs, euc-jis-2004, shift-jis-2004
and iso2022-jp-2004.
2004-07-18 03:06:29 +00:00
Hye-Shik Chang 5c5316f111 Add a new unicode codec: ptcp154 (Kazakh) 2004-03-19 08:06:07 +00:00
Marc-André Lemburg 5c94d33077 Add some more code page aliases needed for completeness. 2004-01-20 09:38:52 +00:00
Hye-Shik Chang b619e4b36c Fix a typo: s/iso_3022/iso2022/ 2004-01-20 09:33:30 +00:00
Hye-Shik Chang 3e2a306920 Add CJK codecs support as discussed on python-dev. (SF #873597)
Several style fixes are suggested by Martin v. Loewis and
Marc-Andre Lemburg. Thanks!
2004-01-17 14:29:29 +00:00
Raymond Hettinger 9a80c5dbc4 Added codec for bz2 compression. 2003-09-23 20:21:01 +00:00
Marc-André Lemburg 8dc5ff2e5a Undo the removal. Guido mentioned that the encoding name is in active
use by some email headers.
2002-10-04 16:30:42 +00:00
Marc-André Lemburg 68fc27385d Remove unneeded alias. 2002-10-04 15:57:03 +00:00
Marc-André Lemburg a40ea75625 Fix doc-string. 2002-10-04 11:58:24 +00:00
Marc-André Lemburg 9d158bb66f Adapt lookup names to new more general encoding name normalization
scheme.
2002-10-04 11:51:39 +00:00
Guido van Rossum 479f3d3d2a Oops, must convert hyphens to underscores in keys of aliases dict. 2002-09-26 20:08:23 +00:00
Guido van Rossum b7a88e533d Add yet another alias for ASCII found in the field. Will backport to
2.2.2.
2002-09-25 16:44:34 +00:00
Marc-André Lemburg a0af63b242 Corrected import behaviour for codecs which live outside the encodings
package.
2002-02-11 17:43:46 +00:00
Marc-André Lemburg 462004e90a Add IANA character set aliases to the encodings alias dictionary
and make alias lookup lazy.

Note that only those IANA character set aliases were added for which
we actually have codecs in the encodings package.
2002-02-10 21:36:20 +00:00
Martin v. Löwis 79d802d58c Patch #487275: Add windows-1251 charset alias. 2001-12-02 12:24:19 +00:00
Marc-André Lemburg c60e6f7771 Patch #435971: UTF-7 codec by Brian Quinlan. 2001-09-20 10:35:46 +00:00
Martin v. Löwis 9b75dca192 Expose nl_langinfo through locale where available. 2001-08-10 13:58:50 +00:00
Martin v. Löwis 13b8bc5478 Patch #429957: Add support for cp1140, which is identical to cp037,
with the addition of the euro character.
Also added a few EBCDIC aliases.
2001-06-07 19:39:25 +00:00
Mark Hammond 194bfb2805 Add some useful Windows encodings - patch #423221. 2001-06-04 02:31:23 +00:00
Guido van Rossum acfdf156aa Add quoted-printable codec 2001-05-15 15:34:07 +00:00
Marc-André Lemburg 2d9204199f This patch changes the way the string .encode() method works slightly
and introduces a new method .decode().

The major change is that strg.encode() will no longer try to convert
Unicode returns from the codec into a string, but instead pass along
the Unicode object as-is. The same is now true for all other codec
return types. The underlying C APIs were changed accordingly.

Note that even though this does have the potential of breaking
existing code, the chances are low since conversion from Unicode
previously took place using the default encoding which is normally
set to ASCII rendering this auto-conversion mechanism useless for
most Unicode encodings.

The good news is that you can now use .encode() and .decode() with
much greater ease and that the door was opened for better accessibility
of the builtin codecs.

As demonstration of the new feature, the patch includes a few new
codecs which allow string to string encoding and decoding (rot13,
hex, zip, uu, base64).

Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
2001-05-15 12:00:02 +00:00
Marc-André Lemburg 4fd73f0465 Marc-Andre Lemburg <mal@lemburg.com>:
Added some more codec aliases. Some of them are needed by the
new locale.py encoding support.
2000-06-07 09:12:30 +00:00
Guido van Rossum 9e896b37c7 Marc-Andre's third try at this bulk patch seems to work (except that
his copy of test_contains.py seems to be broken -- the lines he
deleted were already absent).  Checkin messages:


New Unicode support for int(), float(), complex() and long().

- new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
- added support for Unicode to PyFloat_FromString()
- new encoding API PyUnicode_EncodeDecimal() which converts
  Unicode to a decimal char* string (used in the above new
  APIs)
- shortcuts for calls like int(<int object>) and float(<float obj>)
- tests for all of the above

Unicode compares and contains checks:
- comparing Unicode and non-string types now works; TypeErrors
  are masked, all other errors such as ValueError during
  Unicode coercion are passed through (note that PyUnicode_Compare
  does not implement the masking -- PyObject_Compare does this)
- contains now works for non-string types too; TypeErrors are
  masked and 0 returned; all other errors are passed through

Better testing support for the standard codecs.

Misc minor enhancements, such as an alias dbcs for the mbcs codec.

Changes:
- PyLong_FromString() now applies the same error checks as
  does PyInt_FromString(): trailing garbage is reported
  as error and no longer silently ignored. The only characters
  which may be trailing the digits are 'L' and 'l' -- these
  are still silently ignored.
- string.ato?() now directly interface to int(), long() and
  float(). The error strings are now a little different, but
  the type still remains the same. These functions are now
  ready to get declared obsolete ;-)
- PyNumber_Int() now also does a check for embedded NULL chars
  in the input string; PyNumber_Long() already did this (and
  still does)

Followed by:

Looks like I've gone a step too far there... (and test_contains.py
seems to have a bug too).

I've changed back to reporting all errors in PyUnicode_Contains()
and added a few more test cases to test_contains.py (plus corrected
the join() NameError).
2000-04-05 20:11:21 +00:00
Guido van Rossum 68895ed70c Marc-Andre Lemburg: use all lowercase names. 2000-03-31 17:23:18 +00:00
Guido van Rossum 0229bf6001 Marc-Andre Lemburg: Unicode encodings. 2000-03-10 23:17:24 +00:00
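
Several of the messages above describe how alias lookup works: encoding names are normalized, hyphens become underscores in the keys of the aliases dict, and the alias then maps to a codec in the encodings package. The sketch below illustrates that behaviour using the Python 3 standard library as it exists today, not the code from these commits. The helper name resolve() is hypothetical and exists only for this illustration.

```python
import codecs
import encodings
from encodings.aliases import aliases


def resolve(name):
    """Return (normalized alias key, alias target or None, canonical codec name).

    Hypothetical helper for illustration only; it is not part of the
    encodings package.
    """
    # normalize_encoding() collapses runs of non-alphanumeric characters
    # (other than ".") into a single underscore, so "windows-1251" becomes
    # "windows_1251". Lowercasing is done separately here.
    key = encodings.normalize_encoding(name).lower()
    # Keys of the aliases dict use underscores; the value is the name of
    # the codec module inside the encodings package, if an alias exists.
    target = aliases.get(key)
    # codecs.lookup() runs the full search and returns a CodecInfo whose
    # .name attribute is the codec's canonical name.
    canonical = codecs.lookup(name).name
    return key, target, canonical


if __name__ == "__main__":
    for spelling in ("windows-1251", "Windows_1251", "TIS-620"):
        print(spelling, resolve(spelling))
```

A name that is itself a codec module name may have no entry in the aliases dict, in which case the second element is None while the lookup still succeeds.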