cpython/Modules/cjkcodecs
Benjamin Peterson be2c0a9fe3 Merged revisions 66766-66767,66771-66772,66774,66776,66783-66787,66790,66793,66797 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

................
  r66766 | benjamin.peterson | 2008-10-03 06:52:06 -0500 (Fri, 03 Oct 2008) | 1 line

  update the mac installer script
................
  r66767 | andrew.kuchling | 2008-10-03 07:26:42 -0500 (Fri, 03 Oct 2008) | 1 line

  Docstring typo.
................
  r66771 | hirokazu.yamamoto | 2008-10-03 11:18:42 -0500 (Fri, 03 Oct 2008) | 2 lines

  Fixed following error when DocXMLRPCServer failed.
    UnboundLocalError: local variable 'serv' referenced before assignment
................
  r66772 | andrew.kuchling | 2008-10-03 11:29:19 -0500 (Fri, 03 Oct 2008) | 1 line

  Mention exception in docstring
................
  r66774 | andrew.kuchling | 2008-10-03 11:42:52 -0500 (Fri, 03 Oct 2008) | 1 line

  Typo fix
................
  r66776 | hirokazu.yamamoto | 2008-10-03 12:34:49 -0500 (Fri, 03 Oct 2008) | 2 lines

  Issue #1706863: Fixed "'NoneType' object has no attribute 'rfind'" error when sqlite libfile not found.
................
  r66783 | andrew.kuchling | 2008-10-03 20:02:29 -0500 (Fri, 03 Oct 2008) | 1 line

  Use correct capitalization of NaN
................
  r66784 | andrew.kuchling | 2008-10-03 20:03:42 -0500 (Fri, 03 Oct 2008) | 1 line

  Docstring change: Specify exception raised
................
  r66785 | andrew.kuchling | 2008-10-03 20:04:24 -0500 (Fri, 03 Oct 2008) | 1 line

  Docstring changes: Specify exceptions raised
................
  r66786 | andrew.kuchling | 2008-10-03 20:05:56 -0500 (Fri, 03 Oct 2008) | 3 lines

  Docstring change for *partition: use same tense as other docstrings.
  Hyphenate left- and right-justified.
  Fix 'registerd' typo
................
  r66787 | andrew.kuchling | 2008-10-03 22:08:56 -0500 (Fri, 03 Oct 2008) | 1 line

  two corrections
................
  r66790 | andrew.kuchling | 2008-10-04 11:52:01 -0500 (Sat, 04 Oct 2008) | 1 line

  Set svn:keywords
................
  r66793 | georg.brandl | 2008-10-04 13:26:01 -0500 (Sat, 04 Oct 2008) | 2 lines

  #4041: don't refer to removed and outdated modules.
................
  r66797 | benjamin.peterson | 2008-10-04 15:55:50 -0500 (Sat, 04 Oct 2008) | 19 lines

  Merged revisions 66707,66775,66782 via svnmerge from
  svn+ssh://pythondev@svn.python.org/sandbox/trunk/2to3/lib2to3

  ........
    r66707 | benjamin.peterson | 2008-09-30 18:27:10 -0500 (Tue, 30 Sep 2008) | 1 line

    fix #4001: fix_imports didn't check for __init__.py before converting to relative imports
  ........
    r66775 | collin.winter | 2008-10-03 12:08:26 -0500 (Fri, 03 Oct 2008) | 4 lines

    Add an alternative iterative pattern matching system that, while slower, correctly parses files that cause the faster recursive pattern matcher to fail with a recursion error. lib2to3 falls back to the iterative matcher if the recursive one fails.

    Fixes http://bugs.python.org/issue2532. Thanks to Nick Edds.
  ........
    r66782 | benjamin.peterson | 2008-10-03 17:51:36 -0500 (Fri, 03 Oct 2008) | 1 line

    add Victor Stinner's fixer for os.getcwdu -> os.getcwd #4023
  ........
................
2008-10-04 21:33:08 +00:00
..
README
_codecs_cn.c Fix gb18030 codec's bug that doesn't map two-byte characters on 2007-08-04 04:10:18 +00:00
_codecs_hk.c Update big5hkscs codec to conform to the HKSCS:2004 revision. 2008-02-08 17:10:20 +00:00
_codecs_iso2022.c Fixed a warning in _codecs_iso2022.c and some non C89 conform // comments. 2007-12-14 03:02:34 +00:00
_codecs_jp.c
_codecs_kr.c Add cheot-ga-keut composed make-up sequence support in EUC-KR codec. 2007-08-20 06:49:18 +00:00
_codecs_tw.c
alg_jisx0201.h
cjkcodecs.h This reverts r63675 based on the discussion in this thread: 2008-06-09 04:58:54 +00:00
emu_jisx0213_2000.h
mappings_cn.h
mappings_hk.h Update big5hkscs codec to conform to the HKSCS:2004 revision. 2008-02-08 17:10:20 +00:00
mappings_jisx0213_pair.h
mappings_jp.h
mappings_kr.h
mappings_tw.h
multibytecodec.c Merged revisions 66766-66767,66771-66772,66774,66776,66783-66787,66790,66793,66797 via svnmerge from 2008-10-04 21:33:08 +00:00
multibytecodec.h

README

To generate or modify mapping headers
-------------------------------------
Mapping headers are imported from CJKCodecs as pre-generated form.
If you need to tweak or add something on it, please look at tools/
subdirectory of CJKCodecs' distribution.



Notes on implmentation characteristics of each codecs
-----------------------------------------------------

1) Big5 codec

  The big5 codec maps the following characters as cp950 does rather
  than conforming Unicode.org's that maps to 0xFFFD.

    BIG5        Unicode     Description

    0xA15A      0x2574      SPACING UNDERSCORE
    0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC      0x5341      HANGZHOU NUMERAL TEN
    0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY

  Because unicode 0x5341, 0x5345, 0xFF0F, 0xFF3C is mapped to another
  big5 codes already, a roundtrip compatibility is not guaranteed for
  them.


2) cp932 codec

  To conform to Windows's real mapping, cp932 codec maps the following
  codepoints in addition of the official cp932 mapping.

    CP932     Unicode     Description

    0x80      0x80        UNDEFINED
    0xA0      0xF8F0      UNDEFINED
    0xFD      0xF8F1      UNDEFINED
    0xFE      0xF8F2      UNDEFINED
    0xFF      0xF8F3      UNDEFINED


3) euc-jisx0213 codec

  The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 into
  unicode U+FF3C instead of U+005C as on unicode.org's mapping.
  Because euc-jisx0213 has REVERSE SOLIDUS on 0x5c already and A140
  is shown as a full width character, mapping to U+FF3C can make
  more sense.

  The euc-jisx0213 codec is enabled to decode JIS X 0212 codes on
  codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 don't have
  overlapped by each other, it doesn't bother standard conformations
  (and JIS X 0213 Plane 2 is intended to use so.) On encoding
  sessions, the codec will try to encode kanji characters in this
  order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212


4) euc-jp codec

  The euc-jp codec is a compatibility instance on these points:
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0 (vice versa)
   - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c. (one way)
   - U+203E OVERLINE is mapped to EUC-JP 0x7e. (one way)


5) shift-jis codec

  The shift-jis codec is mapping 0x20-0x7e area to U+20-U+7E directly
  instead of using JIS X 0201 for compatibility. The differences are:
   - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
   - U+007E TILDE is mapped to SHIFT-JIS 0x7e.
   - U+FF3C FULL-WIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f.