131 Commits

Author SHA1 Message Date
Martin v. Löwis 43179c8e6f Add changelog entry. 2006-03-11 12:43:44 +00:00
Tim Peters 88ca467ca4 Whitespace normalization. 2006-03-10 23:39:56 +00:00
Martin v. Löwis 480f1bb67b Update Unicode database to Unicode 4.1. 2006-03-09 23:38:20 +00:00
Tim Peters 536cf99536 Whitespace normalization. 2005-12-25 23:18:31 +00:00
Marc-André Lemburg 68b49ef8a1 Add Makefile which allows easily rebuilding the charmap codecs. 2005-10-25 11:55:01 +00:00
Marc-André Lemburg 89bbfd4a36 Add custom mapping files used for generating some of the charmap
codecs.
2005-10-25 11:54:04 +00:00
Marc-André Lemburg bd20ea55bc Apply some cosmetic fixes to the output of the script.
Only include the decoding map if no table can be generated.
2005-10-25 11:53:33 +00:00
Marc-André Lemburg 92b201debc Add two new tools to compare codecs and show differences and to
list all installed codecs.
2005-10-21 13:47:03 +00:00
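The log does not include the tools themselves. The sketch below shows the idea: decode every byte value with two single-byte codecs and report where they disagree. It also lists codec names known to the standard alias table.

`compare_codecs` and `list_codecs` are hypothetical names. The sketch assumes charmap-style (single-byte) codecs. It can only see codecs named in `encodings.aliases`, not ones registered through other search functions.

```python
from encodings.aliases import aliases


def compare_codecs(enc1: str, enc2: str) -> list[tuple[int, str, str]]:
    """Return (byte, char1, char2) for each byte value that decodes differently.

    Assumes both codecs are single-byte; undefined bytes decode to U+FFFD
    because errors="replace" is used.
    """
    diffs = []
    for b in range(256):
        raw = bytes([b])
        c1 = raw.decode(enc1, errors="replace")
        c2 = raw.decode(enc2, errors="replace")
        if c1 != c2:
            diffs.append((b, c1, c2))
    return diffs


def list_codecs() -> list[str]:
    # aliases maps alias -> codec module name for the standard codecs only.
    return sorted(set(aliases.values()))
```

For example, `compare_codecs("cp1252", "latin-1")` reports only bytes 0x80 to 0x9F. That is the range where cp1252 places printable characters such as the euro sign, and where Latin-1 has C1 control codes.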
Marc-André Lemburg c5694c8bf4 Moved gencodec.py to the Tools/unicode/ directory.
Added new support for decoding tables.

Cleaned up the implementation a bit.
2005-10-21 13:45:17 +00:00
Hye-Shik Chang e9ddfbb412 SF #989185: Drop unicode.iswide() and unicode.width() and add
unicodedata.east_asian_width().  You can still implement your own
simple width() function using it like this:
    def width(u):
        w = 0
        for c in unicodedata.normalize('NFC', u):
            cwidth = unicodedata.east_asian_width(c)
            if cwidth in ('W', 'F'): w += 2
            else: w += 1
        return w
2004-08-04 07:38:35 +00:00
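Here is the width() recipe from the message above as a self-contained Python 3 function. The only change of substance is the missing `import unicodedata`.

As in the original, every character that survives NFC counts as one column unless it is Wide (W) or Fullwidth (F). This includes ambiguous (A) characters and uncombined combining marks, so the result is an approximation.

```python
import unicodedata


def width(u: str) -> int:
    """Approximate display width of u in terminal columns."""
    w = 0
    # NFC first, so "e" + COMBINING ACUTE ACCENT becomes a single "é".
    for c in unicodedata.normalize("NFC", u):
        # Wide and Fullwidth characters take two columns; everything else one.
        if unicodedata.east_asian_width(c) in ("W", "F"):
            w += 2
        else:
            w += 1
    return w
```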
Tim Peters 182b5aca27 Whitespace normalization, via reindent.py. 2004-07-18 06:16:08 +00:00
Hye-Shik Chang 974ed7cfa5 - SF #962502: Add two more methods for unicode type; width() and
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
2004-06-02 16:49:17 +00:00
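The padding argument can be checked with `ctypes`, which lays out structures with the platform's native C alignment. The field list below is an assumption chosen for illustration and is not copied from CPython: three 32-bit code-point fields (as in a UCS-4 build), two one-byte fields, and a 16-bit `flags`.

With `flags` first, two bytes of padding follow it so that the 32-bit field after it is aligned. With `flags` last, it fills the gap after the two one-byte fields.

```python
import ctypes


class FlagsFirst(ctypes.Structure):
    # Hypothetical layout: 16-bit flags before the 32-bit fields.
    _fields_ = [
        ("flags", ctypes.c_uint16),
        ("upper", ctypes.c_uint32),
        ("lower", ctypes.c_uint32),
        ("title", ctypes.c_uint32),
        ("decimal", ctypes.c_uint8),
        ("digit", ctypes.c_uint8),
    ]


class FlagsLast(ctypes.Structure):
    # Same fields, with flags moved to the end of the struct.
    _fields_ = [
        ("upper", ctypes.c_uint32),
        ("lower", ctypes.c_uint32),
        ("title", ctypes.c_uint32),
        ("decimal", ctypes.c_uint8),
        ("digit", ctypes.c_uint8),
        ("flags", ctypes.c_uint16),
    ]


print(ctypes.sizeof(FlagsFirst), ctypes.sizeof(FlagsLast))
```

On common platforms with natural alignment this prints `20 16`. That saves four bytes per record, repeated across every entry in the type-record table.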
Armin Rigo ba91b9fdda Applying SF patch #949329 on behalf of Raymond Hettinger. 2004-05-19 19:10:18 +00:00
Martin v. Löwis 2548c730c1 Implement IDNA (Internationalized Domain Names in Applications). 2003-04-18 10:39:54 +00:00
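For reference, Python exposes IDNA (RFC 3490) through the built-in `idna` codec. The codec converts each label of a host name to its ASCII-compatible `xn--` form and back.

```python
host = "münchen.de"

ascii_host = host.encode("idna")      # ToASCII, applied label by label
print(ascii_host)                     # b'xn--mnchen-3ya.de'
print(ascii_host.decode("idna"))      # münchen.de
```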
Martin v. Löwis b5c980b802 Add unidata_version. Bump generator version number. 2002-11-25 09:13:37 +00:00
Martin v. Löwis 97225da29a Sort names independent of the Python version. Fix hex constant warning.
Include all First/Last blocks.
2002-11-24 23:05:09 +00:00
Martin v. Löwis 677bde2dd1 Patch #626485: Support Unicode normalization. 2002-11-23 22:08:15 +00:00
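Normalization is available today as `unicodedata.normalize()`. The canonical forms (NFC/NFD) compose or decompose equivalent sequences. The compatibility forms (NFKC/NFKD) also fold characters such as ligatures.

```python
import unicodedata

composed = "\u00e9"        # é as one code point
decomposed = "e\u0301"     # e + COMBINING ACUTE ACCENT

assert composed != decomposed
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed
# NFKC folds the LATIN SMALL LIGATURE FI into two plain letters.
assert unicodedata.normalize("NFKC", "\ufb01") == "fi"
```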
Martin v. Löwis 99ac3283e7 Verify that lower-higher case delta are 16-bit. 2002-10-18 17:34:18 +00:00
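The type records store case mappings as deltas (mapped code point minus original code point) rather than as absolute code points. A delta that does not fit the field would be silently truncated.

The sketch below shows such a check and how a 16-bit field can be packed and sign-extended. It assumes the field is a signed 16-bit value stored as unsigned. `pack_delta` and `unpack_delta` are hypothetical helpers, not code from the generator.

```python
def pack_delta(char: int, mapped: int) -> int:
    delta = mapped - char
    # Assumed check: the delta must fit in a signed 16-bit field.
    if not -0x8000 <= delta <= 0x7FFF:
        raise ValueError(f"case delta {delta} for U+{char:04X} does not fit in 16 bits")
    return delta & 0xFFFF  # store as an unsigned 16-bit value


def unpack_delta(stored: int) -> int:
    # Sign-extend the 16-bit value back to a Python int.
    return stored - 0x10000 if stored & 0x8000 else stored
```

For example, `pack_delta(ord("a"), ord("A"))` stores `0xFFE0`, which unpacks to `-32`.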
Martin v. Löwis 9def6a3a77 Update to Unicode 3.2 database. 2002-10-18 16:11:54 +00:00
Walter Dörwald aaab30e00c Apply diff2.txt from SF patch http://www.python.org/sf/572113
(with one small bugfix in bgen/bgen/scantools.py)

This replaces string module functions with string methods
for the stuff in the Tools directory. Several uses of
string.letters etc. are still remaining.
2002-09-11 20:36:02 +00:00
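For readers who only know Python 3, the change in this patch looks roughly like the snippet below. The old `string`-module calls appear only in comments, because those functions no longer exist. The word list is illustrative.

```python
import string

words = ["Tools", "unicode", "gencodec"]

# Old style (Python 1.x/2.x string module), no longer available:
#   string.join(words, "/"), string.upper(s), string.split(s, ",")
path = "/".join(words)
assert path == "Tools/unicode/gencodec"
assert "abc".upper() == "ABC"
assert "a,b".split(",") == ["a", "b"]

# string.letters was locale-dependent; Python 3 keeps only ascii_letters.
assert string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase
```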
Fredrik Lundh b2dfd73bdc Unicode nits: Don't include unicodedatabase.h no more. And make sure
to build *all* tables in makeunicodedata.py.
2001-01-21 23:31:52 +00:00
Fredrik Lundh 7b7dd107b3 compress unicode decomposition tables (this saves another 55k) 2001-01-21 22:41:08 +00:00
Fredrik Lundh 9e9bcda547 forgot to check in the new makeunicodedata.py script 2001-01-21 17:01:31 +00:00
Fredrik Lundh fad27aee11 Added 38,642 missing characters to the Unicode database (first-last
ranges) -- but thanks to the 2.0 compression scheme, this doesn't add
a single byte to the resulting binaries (!)

Closes bug #117524
2000-11-03 20:24:15 +00:00
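UnicodeData.txt does not list large blocks such as the CJK ideographs one character at a time. Each block is a pair of lines whose names end in `, First>` and `, Last>`. A generator that reads only individual lines misses every code point between the pair.

The sketch below expands such pairs. `expand_ranges` is a hypothetical helper and keeps only the general category. The sample lines follow the UnicodeData.txt field layout, using the Unicode 3.0 CJK range.

```python
def expand_ranges(lines):
    """Map code point -> general category, expanding First/Last pairs."""
    table = {}
    first = None
    for line in lines:
        fields = line.split(";")
        code = int(fields[0], 16)
        name, category = fields[1], fields[2]
        if name.endswith(", First>"):
            first = code                      # start of a range; wait for Last
        elif name.endswith(", Last>"):
            for cp in range(first, code + 1):  # fill the whole range
                table[cp] = category
            first = None
        else:
            table[code] = category
    return table


sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
    "9FA5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
]
table = expand_ranges(sample)
print(len(table))  # 1 + 20902 code points
```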
Fred Drake 9c6850510c Remove bogus stdout redirection and use of sys.__stdout__; use
augmented print statement instead.
2000-10-26 03:56:46 +00:00
Fredrik Lundh 375732cd41 - don't set the titlecase flag for uppercase letters (sorry, tim) 2000-09-25 23:03:34 +00:00
Fredrik Lundh 0f8fad4969 unicode database compression, step 3:
- added decimal digit and digit properties to the unidb tables
2000-09-25 21:01:56 +00:00
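The decimal and digit properties are distinct. A decimal digit can be used in positional notation. A digit-only character, such as a superscript, has a digit value but no decimal value. The `unicodedata` module exposes both, plus the more general numeric value.

```python
import unicodedata

assert unicodedata.decimal("7") == 7
assert unicodedata.decimal("\u0663") == 3          # ARABIC-INDIC DIGIT THREE
assert unicodedata.digit("\u00b2") == 2            # SUPERSCRIPT TWO has a digit value...
assert unicodedata.decimal("\u00b2", None) is None  # ...but no decimal value
assert unicodedata.numeric("\u00bd") == 0.5        # VULGAR FRACTION ONE HALF
```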
Fredrik Lundh e9133f7e2e unicode database compression, step 3:
- use unidb compression for the unicodectype module.  smaller,
  faster, and slightly more portable...

- also mention the unicode directory in Tools/README
2000-09-25 17:59:57 +00:00
Fredrik Lundh cfcea49218 unicode database compression, step 2:
- fixed attributions
- moved decomposition data to a separate table, in preparation
  for step 3 (which won't happen before 2.0 final, promise!)
- use relative paths in the generator script

I have a lot more stuff in the works for 2.1, but let's leave
that for another day...
2000-09-25 08:07:06 +00:00
Tim Peters 2101348830 Fiddled w/ /F's cool new splitbins function: documented it, generalized it
a bit, sped it a lot primarily by removing the unused assumption that None was
a legit bin entry (the function doesn't really need to assume that there's
anything special about 0), added an optional "trace" argument, and in __debug__
mode added exhaustive verification that the decomposition is both correct and
doesn't overstep any array bounds (which wasn't obvious to me from staring at the
generated C code -- now I feel safe!).  Did not commit a new unicodedata_db.h, as
the one produced by this version is identical to the one already checked in.
2000-09-25 07:13:41 +00:00
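As I understand it, splitbins turns one long table `t` into two shorter ones, so that `t[i] == t2[(t1[i >> shift] << shift) + (i & mask)]` holds for every `i`. Here `mask` is `2**shift - 1`.

The method: cut `t` into blocks of `2**shift` entries, store each distinct block once in `t2`, and record in `t1` which block each chunk uses. The sketch below is a simplified reimplementation, not the code in makeunicodedata.py. It measures cost as the number of entries rather than the C byte size, and it tries every shift up to the table length.

```python
def splitbins(t, trace=False):
    """Return (t1, t2, shift) such that, for every i in range(len(t)),
    t[i] == t2[(t1[i >> shift] << shift) + (i & mask)], mask = 2**shift - 1.
    """
    best = None
    for shift in range(max(len(t), 1).bit_length() + 1):
        size = 1 << shift
        t1, t2, seen = [], [], {}
        for start in range(0, len(t), size):
            block = tuple(t[start:start + size])
            index = seen.get(block)
            if index is None:
                # New block: its number is its position in t2 divided by size.
                # Only the final block can be short, so len(t2) is a multiple
                # of size whenever a block is appended here.
                index = len(t2) >> shift
                seen[block] = index
                t2.extend(block)
            t1.append(index)
        cost = len(t1) + len(t2)  # simplification: entries, not C bytes
        if trace:
            print(f"shift={shift}: {len(t1)} + {len(t2)} = {cost}")
        if best is None or cost < best[0]:
            best = (cost, t1, t2, shift)
    _, t1, t2, shift = best
    mask = (1 << shift) - 1
    if __debug__:
        # Exhaustive check that every lookup reproduces the original table.
        for i, value in enumerate(t):
            assert value == t2[(t1[i >> shift] << shift) + (i & mask)]
    return t1, t2, shift
```

Mostly repetitive data compresses well. A table of 2,003 entries that is all zeros except for three values splits into two tables with fewer than 200 entries in total.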
Fredrik Lundh f367cacb98 unicode database compression, step 1:
- use unidb compression for the unicodedata module.  on Windows,
  the new unidatabase module is 120k, down from nearly 600k.
2000-09-24 23:18:31 +00:00