cpython/Lib/encodings/aliases.py

""" Encoding Aliases Support

    This module is used by the encodings package search function to
    map encoding names to module names.

    Note that the search function converts the encoding names to lower
    case and replaces hyphens with underscores *before* performing the
    lookup.

"""
aliases = {

    # Latin-1
    'latin': 'latin_1',
    'latin1': 'latin_1',
    
    # UTF-7
    'utf7': 'utf_7',
    'u7': 'utf_7',
    
    # UTF-8
    'utf': 'utf_8',
    'utf8': 'utf_8',
    'u8': 'utf_8',
    'utf8@ucs2': 'utf_8',
    'utf8@ucs4': 'utf_8',
    
    # UTF-16
    'utf16': 'utf_16',
    'u16': 'utf_16',
    'utf_16be': 'utf_16_be',
    'utf_16le': 'utf_16_le',
    'unicodebigunmarked': 'utf_16_be',
    'unicodelittleunmarked': 'utf_16_le',

    # ASCII
    'us_ascii': 'ascii',
    'ansi_x3.4_1968': 'ascii', # used on Linux
    '646': 'ascii',            # used on Solaris

    # EBCDIC
    'ebcdic_cp_us': 'cp037',
    'ibm039': 'cp037',
    'ibm1140': 'cp1140',
    
    # ISO
    '8859': 'latin_1',
    'iso8859': 'latin_1',
    'iso8859_1': 'latin_1',
    'iso_8859_1': 'latin_1',
    'iso_8859_10': 'iso8859_10',
    'iso_8859_13': 'iso8859_13',
    'iso_8859_14': 'iso8859_14',
    'iso_8859_15': 'iso8859_15',
    'iso_8859_2': 'iso8859_2',
    'iso_8859_3': 'iso8859_3',
    'iso_8859_4': 'iso8859_4',
    'iso_8859_5': 'iso8859_5',
    'iso_8859_6': 'iso8859_6',
    'iso_8859_7': 'iso8859_7',
    'iso_8859_8': 'iso8859_8',
    'iso_8859_9': 'iso8859_9',

    # Mac
    'maclatin2': 'mac_latin2',
    'maccentraleurope': 'mac_latin2',
    'maccyrillic': 'mac_cyrillic',
    'macgreek': 'mac_greek',
    'maciceland': 'mac_iceland',
    'macroman': 'mac_roman',
    'macturkish': 'mac_turkish',

    # Windows
    'windows_1251': 'cp1251',
    'windows_1252': 'cp1252',
    'windows_1254': 'cp1254',
    'windows_1255': 'cp1255',

    # MBCS
    'dbcs': 'mbcs',

    # Code pages
    '437': 'cp437',

    # CJK
    #
    # The codecs for these encodings are not distributed with the
    # Python core, but are included here for reference, since the
    # locale module relies on having these aliases available.
    #
    'jis_7': 'jis_7',
    'iso_2022_jp': 'jis_7',
    'ujis': 'euc_jp',
    'ajec': 'euc_jp',
    'eucjp': 'euc_jp',
    'tis260': 'tactis',
    'sjis': 'shift_jis',

    # Content transfer/compression encodings
    'rot13': 'rot_13',
    'base64': 'base64_codec',
    'base_64': 'base64_codec',
    'zlib': 'zlib_codec',
    'zip': 'zlib_codec',
    'hex': 'hex_codec',
    'uu': 'uu_codec',
    'quopri': 'quopri_codec',
    'quotedprintable': 'quopri_codec',
    'quoted_printable': 'quopri_codec',

}
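
The docstring above says that the search function lower-cases the encoding name and replaces hyphens with underscores before consulting the table. The following is a minimal sketch of that lookup, not the actual search function from the encodings package. `ALIASES_EXCERPT`, `normalize_name`, and `resolve_module_name` are hypothetical names introduced for illustration, and the fallback to the normalized name when no alias is listed is an assumption.

```python
# Small excerpt of the alias table above, copied so the sketch is self-contained.
ALIASES_EXCERPT = {
    'latin1': 'latin_1',
    'utf8': 'utf_8',
    'iso_8859_1': 'latin_1',
    'windows_1252': 'cp1252',
    'quoted_printable': 'quopri_codec',
}


def normalize_name(encoding):
    """Lower-case the name and replace hyphens with underscores."""
    return encoding.lower().replace('-', '_')


def resolve_module_name(encoding, table=ALIASES_EXCERPT):
    """Hypothetical helper: map an encoding name to a codec module name.

    Assumption: when the normalized name has no alias entry, the
    normalized name itself is used as the module name.
    """
    name = normalize_name(encoding)
    return table.get(name, name)


print(resolve_module_name('ISO-8859-1'))    # latin_1
print(resolve_module_name('Windows-1252'))  # cp1252
print(resolve_module_name('UTF-8'))         # utf_8 (no alias entry; normalized name used)
```

Because normalization happens before the lookup, the table only needs lower-case keys written with underscores, which is why none of the keys above contain hyphens or capital letters.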

Change history (from the blame view, oldest first):

2000-03-10 19:17:24 -04:00  Marc-Andre Lemburg: Unicode encodings.
    Module docstring, the opening and closing of the aliases dict, the Latin-1, UTF-8 ('utf', 'utf8', 'u8'), UTF-16 ('utf16' through 'utf_16le'), ASCII ('us_ascii'), and ISO ('iso8859_1' through 'iso_8859_9') entries, and the Mac section header.
2000-03-31 13:23:18 -04:00  Marc-Andre Lemburg: use all lowercase names.
    'unicodebigunmarked', 'unicodelittleunmarked', and 'maccentraleurope' through 'macturkish'.
2000-04-05 17:11:21 -03:00  Marc-Andre's bulk patch (new Unicode support for int(), float(), complex() and long(); Unicode compares and contains checks; better testing support for the standard codecs; "Misc minor enhancements, such as an alias dbcs for the mbcs codec").
    MBCS section ('dbcs').
2000-06-07 06:12:30 -03:00  Marc-Andre Lemburg <mal@lemburg.com>: Added some more codec aliases. Some of them are needed by the new locale.py encoding support.
    'utf8@ucs2', 'utf8@ucs4', '8859', 'iso8859', 'maclatin2', and the Code pages and CJK sections.
2001-05-15 09:00:02 -03:00  Patch that changes how the string .encode() method works and introduces a new method .decode(); it includes a few new codecs which allow string to string encoding and decoding (rot13, hex, zip, uu, base64). Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
    Content transfer/compression section, 'rot13' through 'uu'.
2001-05-15 12:34:07 -03:00  Add quoted-printable codec.
    'quopri', 'quotedprintable', 'quoted_printable'.
2001-06-03 23:31:23 -03:00  Add some useful Windows encodings - patch #423221.
    Windows section header, 'windows_1252', 'windows_1254', 'windows_1255'.
2001-06-07 16:39:25 -03:00  Patch #429957: Add support for cp1140, which is identical to cp037, with the addition of the euro character. Also added a few EBCDIC aliases.
    EBCDIC section ('ebcdic_cp_us', 'ibm039', 'ibm1140').
2001-08-10 10:58:50 -03:00  Expose nl_langinfo through locale where available.
    'ansi_x3.4_1968' (used on Linux) and '646' (used on Solaris).
2001-09-20 07:35:46 -03:00  Patch #435971: UTF-7 codec by Brian Quinlan.
    UTF-7 section ('utf7', 'u7').
2001-12-02 08:24:19 -04:00  Patch #487275: Add windows-1251 charset alias.
    'windows_1251'.