cpython/Modules/cjkcodecs
Guido van Rossum 806c2469cb Merged revisions 56753-56781 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/p3yk

................
  r56760 | neal.norwitz | 2007-08-05 18:55:39 -0700 (Sun, 05 Aug 2007) | 178 lines

  Merged revisions 56477-56759 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r56485 | facundo.batista | 2007-07-21 17:13:00 -0700 (Sat, 21 Jul 2007) | 5 lines


    Selectively enable tests for asyncore.readwrite based on the presence
    of poll support in the select module (since this is the only case in
    which readwrite can be called). [GSoC - Alan McIntyre]
  ........
    r56488 | nick.coghlan | 2007-07-22 03:18:07 -0700 (Sun, 22 Jul 2007) | 1 line

    Add explicit relative import tests for runpy.run_module
  ........
    r56509 | nick.coghlan | 2007-07-23 06:41:45 -0700 (Mon, 23 Jul 2007) | 5 lines

    Correctly cleanup sys.modules after executing runpy relative import
    tests
    Restore Python 2.4 ImportError when attempting to execute a package
    (as imports cannot be guaranteed to work properly if you try it)
  ........
    r56519 | nick.coghlan | 2007-07-24 06:07:38 -0700 (Tue, 24 Jul 2007) | 1 line

    Tweak runpy test to do a better job of confirming that sys has been manipulated correctly
  ........
    r56520 | nick.coghlan | 2007-07-24 06:58:28 -0700 (Tue, 24 Jul 2007) | 1 line

    Fix an incompatibility between the -i and -m command line switches as reported on python-dev by PJE - runpy.run_module now leaves any changes it makes to the sys module intact after the function terminates
  ........
    r56523 | nick.coghlan | 2007-07-24 07:39:23 -0700 (Tue, 24 Jul 2007) | 1 line

    Try to get rid of spurious failure in test_resource on the Debian buildbots by changing the file size limit before attempting to close the file
  ........
    r56533 | facundo.batista | 2007-07-24 14:20:42 -0700 (Tue, 24 Jul 2007) | 7 lines


    New tests for basic behavior of smtplib.SMTP and
    smtpd.DebuggingServer. Change to use global host & port number
    variables. Modified the 'server' to take a string to send back in
    order to vary test server responses. Added a test for the reaction of
    smtplib.SMTP to a non-200 HELO response. [GSoC - Alan McIntyre]
  ........
    r56538 | nick.coghlan | 2007-07-25 05:57:48 -0700 (Wed, 25 Jul 2007) | 1 line

    More buildbot cleanup - let the OS assign the port for test_urllib2_localnet
  ........
    r56539 | nick.coghlan | 2007-07-25 06:18:58 -0700 (Wed, 25 Jul 2007) | 1 line

    Add a temporary diagnostic message before a strange failure on the alpha Debian buildbot
  ........
    r56543 | martin.v.loewis | 2007-07-25 09:24:23 -0700 (Wed, 25 Jul 2007) | 2 lines

    Change location of the package index to pypi.python.org/pypi
  ........
    r56551 | georg.brandl | 2007-07-26 02:36:25 -0700 (Thu, 26 Jul 2007) | 2 lines

    tabs, newlines and crs are valid XML characters.
  ........
    r56553 | nick.coghlan | 2007-07-26 07:03:00 -0700 (Thu, 26 Jul 2007) | 1 line

    Add explicit test for a misbehaving math.floor
  ........
    r56561 | mark.hammond | 2007-07-26 21:52:32 -0700 (Thu, 26 Jul 2007) | 3 lines

    In consultation with Kristjan Jonsson, only define WINVER and _WINNT_WIN32
    if (a) we are building Python itself and (b) no one previously defined them
  ........
    r56562 | mark.hammond | 2007-07-26 22:08:54 -0700 (Thu, 26 Jul 2007) | 2 lines

    Correctly detect AMD64 architecture on VC2003
  ........
    r56566 | nick.coghlan | 2007-07-27 03:36:30 -0700 (Fri, 27 Jul 2007) | 1 line

    Make test_math error messages more meaningful for small discrepancies in results
  ........
    r56588 | martin.v.loewis | 2007-07-27 11:28:22 -0700 (Fri, 27 Jul 2007) | 2 lines

    Bug #978833: Close https sockets by releasing the _ssl object.
  ........
    r56601 | martin.v.loewis | 2007-07-28 00:03:05 -0700 (Sat, 28 Jul 2007) | 3 lines

    Bug #1704793: Return UTF-16 pair if unicodedata.lookup cannot
    represent the result in a single character.
  ........
    r56604 | facundo.batista | 2007-07-28 07:21:22 -0700 (Sat, 28 Jul 2007) | 9 lines


    Moved all of the capture_server socket setup code into the try block
    so that the event gets set if a failure occurs during server setup
    (otherwise the test will block forever).  Changed to let the OS assign
    the server port number, and client side of test waits for port number
    assignment before proceeding. The test data in DispatcherWithSendTests
    is also sent in multiple send() calls instead of one to make sure this
    works properly. [GSoC - Alan McIntyre]
  ........
    r56611 | georg.brandl | 2007-07-29 01:26:10 -0700 (Sun, 29 Jul 2007) | 2 lines

    Clarify PEP 343 description.
  ........
    r56614 | georg.brandl | 2007-07-29 02:11:15 -0700 (Sun, 29 Jul 2007) | 2 lines

    try-except-finally is new in 2.5.
  ........
    r56617 | facundo.batista | 2007-07-29 07:23:08 -0700 (Sun, 29 Jul 2007) | 9 lines


    Added tests for asynchat classes simple_producer & fifo, and the
    find_prefix_at_end function. Check behavior of a string given as a
    producer.  Added tests for behavior of asynchat.async_chat when given
    int, long, and None terminator arguments. Added usepoll attribute to
    TestAsynchat to allow running the asynchat tests with poll support
    chosen whether it's available or not (improves coverage of asyncore
    code). [GSoC - Alan McIntyre]
  ........
    r56620 | georg.brandl | 2007-07-29 10:38:35 -0700 (Sun, 29 Jul 2007) | 2 lines

    Bug #1763149: use proper slice syntax in docstring.
     (backport)
  ........
    r56624 | mark.hammond | 2007-07-29 17:45:29 -0700 (Sun, 29 Jul 2007) | 4 lines

    Correct use of Py_BUILD_CORE - now make sure it is defined before it is
    referenced, and also fix definition of _WIN32_WINNT.
    Resolves patch 1761803.
  ........
    r56632 | facundo.batista | 2007-07-30 20:03:34 -0700 (Mon, 30 Jul 2007) | 8 lines


    When running asynchat tests on OS X (darwin), the test client now
    overrides asyncore.dispatcher.handle_expt to do nothing, since
    select.poll gives a POLLHUP error at the completion of these tests.
    Added timeout & count arguments to several asyncore.loop calls to
    avoid the possibility of a test hanging up a build. [GSoC - Alan
    McIntyre]
  ........
    r56633 | nick.coghlan | 2007-07-31 06:38:01 -0700 (Tue, 31 Jul 2007) | 1 line

    Eliminate RLock race condition reported in SF bug #1764059
  ........
    r56636 | martin.v.loewis | 2007-07-31 12:57:56 -0700 (Tue, 31 Jul 2007) | 2 lines

    Define _BSD_SOURCE, to get access to POSIX extensions on OpenBSD 4.1+.
  ........
    r56653 | facundo.batista | 2007-08-01 16:18:36 -0700 (Wed, 01 Aug 2007) | 9 lines


    Allow the OS to select a free port for each test server. For
    DebuggingServerTests, construct SMTP objects with a localhost argument
    to avoid abysmally long FQDN lookups (not relevant to items under
    test) on some machines that would cause the test to fail. Moved server
    setup code in the server function inside the try block to avoid the
    possibility of setup failure hanging the test.  Minor edits to conform
    to PEP 8. [GSoC - Alan McIntyre]
  ........
    r56681 | matthias.klose | 2007-08-02 14:33:13 -0700 (Thu, 02 Aug 2007) | 2 lines

    - Allow Emacs 22 for building the documentation in info format.
  ........
    r56689 | neal.norwitz | 2007-08-02 23:46:29 -0700 (Thu, 02 Aug 2007) | 1 line

    Py_ssize_t is defined regardless of HAVE_LONG_LONG.  Will backport
  ........
    r56727 | hyeshik.chang | 2007-08-03 21:10:18 -0700 (Fri, 03 Aug 2007) | 3 lines

    Fix gb18030 codec's bug that doesn't map two-byte characters on
    GB18030 extension in encoding. (bug reported by Bjorn Stabell)
  ........
    r56751 | neal.norwitz | 2007-08-04 20:23:31 -0700 (Sat, 04 Aug 2007) | 7 lines

    Handle errors when generating a warning.
    The value is always written to the returned pointer if getting it was
    successful, even if a warning causes an error. (This probably doesn't matter
    as the caller will probably discard the value.)

    Will backport.
  ........
................
2007-08-06 23:33:07 +00:00
..
README - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
_codecs_cn.c Merged revisions 56753-56781 via svnmerge from 2007-08-06 23:33:07 +00:00
_codecs_hk.c - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
_codecs_iso2022.c Four months of trunk changes (including a few releases...) 2006-12-13 04:49:30 +00:00
_codecs_jp.c Merge the rest of the trunk. 2006-06-08 15:35:45 +00:00
_codecs_kr.c - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
_codecs_tw.c - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
alg_jisx0201.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
cjkcodecs.h Allow encoding names to be unicode strings. 2007-05-17 18:56:39 +00:00
emu_jisx0213_2000.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
mappings_cn.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
mappings_hk.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
mappings_jisx0213_pair.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
mappings_jp.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
mappings_kr.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
mappings_tw.h - Modernize code to use Py_ssize_t more intensively. 2006-03-04 16:08:19 +00:00
multibytecodec.c Merged revisions 56467-56482 via svnmerge from 2007-07-21 17:22:18 +00:00
multibytecodec.h Merge p3yk branch with the trunk up to revision 45595. This breaks a fair 2006-04-21 10:40:58 +00:00

README

To generate or modify mapping headers
-------------------------------------
Mapping headers are imported from CJKCodecs as pre-generated form.
If you need to tweak or add something on it, please look at tools/
subdirectory of CJKCodecs' distribution.



Notes on implmentation characteristics of each codecs
-----------------------------------------------------

1) Big5 codec

  The big5 codec maps the following characters as cp950 does rather
  than conforming Unicode.org's that maps to 0xFFFD.

    BIG5        Unicode     Description

    0xA15A      0x2574      SPACING UNDERSCORE
    0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC      0x5341      HANGZHOU NUMERAL TEN
    0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY

  Because unicode 0x5341, 0x5345, 0xFF0F, 0xFF3C is mapped to another
  big5 codes already, a roundtrip compatibility is not guaranteed for
  them.


2) cp932 codec

  To conform to Windows's real mapping, cp932 codec maps the following
  codepoints in addition of the official cp932 mapping.

    CP932     Unicode     Description

    0x80      0x80        UNDEFINED
    0xA0      0xF8F0      UNDEFINED
    0xFD      0xF8F1      UNDEFINED
    0xFE      0xF8F2      UNDEFINED
    0xFF      0xF8F3      UNDEFINED


3) euc-jisx0213 codec

  The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 into
  unicode U+FF3C instead of U+005C as on unicode.org's mapping.
  Because euc-jisx0213 has REVERSE SOLIDUS on 0x5c already and A140
  is shown as a full width character, mapping to U+FF3C can make
  more sense.

  The euc-jisx0213 codec is enabled to decode JIS X 0212 codes on
  codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 don't have
  overlapped by each other, it doesn't bother standard conformations
  (and JIS X 0213 Plane 2 is intended to use so.) On encoding
  sessions, the codec will try to encode kanji characters in this
  order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212


4) euc-jp codec

  The euc-jp codec is a compatibility instance on these points:
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0 (vice versa)
   - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c. (one way)
   - U+203E OVERLINE is mapped to EUC-JP 0x7e. (one way)


5) shift-jis codec

  The shift-jis codec is mapping 0x20-0x7e area to U+20-U+7E directly
  instead of using JIS X 0201 for compatibility. The differences are:
   - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
   - U+007E TILDE is mapped to SHIFT-JIS 0x7e.
   - U+FF3C FULL-WIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f.