Commit Graph

48 Commits

Author SHA1 Message Date
Florent Xicluna 2e0a53fdf6 Issue #8024: Update the Unicode database to 5.2 2010-03-18 21:50:06 +00:00
Florent Xicluna dc36472472 Remove py3k deprecation warnings from these Unicode tools. 2010-03-15 14:00:58 +00:00
Benjamin Peterson f4803aa623 set svn:eol-style on various files 2010-03-08 22:15:11 +00:00
Amaury Forgeot d'Arc 5c92d4301d #7112: Fix compilation warning in unicodetype_db.h
makeunicodedata now generates double literals
2009-10-13 21:29:34 +00:00
Amaury Forgeot d'Arc d0052d17b1 #1571184: makeunicodedata.py now generates the functions _PyUnicode_ToNumeric,
_PyUnicode_IsLinebreak and _PyUnicode_IsWhitespace.

It now also parses the Unihan.txt for numeric values.
2009-10-06 19:56:32 +00:00
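At the Python level, these generated lookups are reached through `unicodedata.numeric()`, `str.isnumeric()`, `str.isspace()` and `str.splitlines()`, as far as I know. The check below assumes a current Python 3. In UnicodeData.txt, CJK ideographs appear only as a First/Last range with no numeric field, so a value for U+4E09 (CJK "three") has to come from the Unihan data:

```python
import unicodedata

# U+00BD's value comes from UnicodeData.txt; U+4E09 needs the Unihan data.
print(unicodedata.numeric("\u00bd"))   # 0.5
print(unicodedata.numeric("\u4e09"))   # 3.0
print("\u4e09".isnumeric())            # True

# Whitespace and line breaks outside ASCII.
print("\u2003".isspace())              # True (EM SPACE)
print("a\u2028b\u2029c".splitlines())  # ['a', 'b', 'c']
```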
Amaury Forgeot d'Arc 70dda76cde #1616979: Add the cp720 (Arabic DOS) encoding.
Since there is no official mapping file from unicode.org,
the codec file is generated on Windows with the new genwincodec.py script.
2009-07-13 20:01:11 +00:00
Antoine Pitrou e988e286b2 Issue #1734234: Massively speedup `unicodedata.normalize()` when the
string is already in normalized form, by performing a quick check beforehand.
Original patch by Rauli Ruohonen.
2009-04-27 21:53:26 +00:00
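Python 3.8 later exposed the quick check directly as `unicodedata.is_normalized()`. The sketch below assumes Python 3.8+, and `normalize_fast` is a hypothetical name. Because `normalize()` already performs this check internally after this change, the wrapper does nothing new. It only makes the two steps visible:

```python
import unicodedata

def normalize_fast(form, s):
    # Hypothetical helper: return the input unchanged when it is already
    # in the requested form, otherwise do the full normalization pass.
    if unicodedata.is_normalized(form, s):
        return s
    return unicodedata.normalize(form, s)

already_nfc = "caf\u00e9"            # precomposed e-acute
decomposed = "cafe\u0301"            # e + COMBINING ACUTE ACCENT
print(normalize_fast("NFC", already_nfc) is already_nfc)  # True
print(normalize_fast("NFC", decomposed) == already_nfc)   # True
```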
Walter Dörwald 5d98ec76bb Issue #5828 (Invalid behavior of unicode.lower): Fixed bogus logic in
makeunicodedata.py and regenerated the Unicode database (This fixes
u'\u1d79'.lower() == '\x00').
2009-04-25 14:03:16 +00:00
Martin v. Löwis 24329ba176 Issue #3811: The Unicode database was updated to 5.1.
Reviewed by Fredrik Lundh and Marc-Andre Lemburg.
2008-09-10 13:38:12 +00:00
Martin v. Löwis 111c180674 Make more symbols static. 2008-06-13 07:47:47 +00:00
Christian Heimes c5f05e45cf Patch #2167 from calvin: Remove unused imports 2008-02-23 17:40:11 +00:00
Martin v. Löwis 3f767795f6 Patch #1359618: Speed-up charmap encoder. 2006-06-04 19:36:28 +00:00
Jack Diederich df676c5ffd when generating python code prefer to generate valid python code 2006-05-26 11:37:20 +00:00
Walter Dörwald 5d23f9a8a3 Don't add multiple empty lines at the end of the codec. With this a
regenerated codec should survive reindent.py unchanged.
2006-03-31 10:13:10 +00:00
Walter Dörwald cff22083f1 Whitespace for generated code. 2006-03-27 15:11:56 +00:00
Hye-Shik Chang e2ac4abd01 Patch #1443155: Add the incremental codecs support for CJK codecs.
(reviewed by Walter Dörwald)
2006-03-26 02:34:59 +00:00
Walter Dörwald abb02e5994 Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass
of tuple) that provides incremental decoders and encoders (a way to use
stateful codecs without the stream API). Functions
codecs.getincrementaldecoder() and codecs.getincrementalencoder() have
been added.
2006-03-15 11:35:15 +00:00
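An incremental decoder lets input arrive in arbitrary chunks, including chunks that split a multi-byte sequence. The short sketch below uses the documented `codecs.getincrementaldecoder()` and `codecs.lookup()` APIs and assumes Python 3. The chunk boundaries are made up for illustration:

```python
import codecs

# A UTF-8 byte stream split in the middle of the two-byte sequence for "é".
chunks = [b"caf", b"\xc3", b"\xa9!"]

decoder = codecs.getincrementaldecoder("utf-8")()
text = "".join(decoder.decode(chunk) for chunk in chunks)
text += decoder.decode(b"", final=True)  # flush; raises if bytes are left over
print(text)  # café!

# codecs.lookup() returns a CodecInfo, which is still a tuple subclass.
info = codecs.lookup("utf-8")
print(isinstance(info, tuple), info.name)  # True utf-8
```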
Martin v. Löwis 43179c8e6f Add changelog entry. 2006-03-11 12:43:44 +00:00
Tim Peters 88ca467ca4 Whitespace normalization. 2006-03-10 23:39:56 +00:00
Martin v. Löwis 480f1bb67b Update Unicode database to Unicode 4.1. 2006-03-09 23:38:20 +00:00
Tim Peters 536cf99536 Whitespace normalization. 2005-12-25 23:18:31 +00:00
Marc-André Lemburg 68b49ef8a1 Add Makefile which allows easily rebuilding the charmap codecs. 2005-10-25 11:55:01 +00:00
Marc-André Lemburg 89bbfd4a36 Add custom mapping files used for generating some of the charmap
codecs.
2005-10-25 11:54:04 +00:00
Marc-André Lemburg bd20ea55bc Apply some cosmetic fixes to the output of the script.
Only include the decoding map if no table can be generated.
2005-10-25 11:53:33 +00:00
Marc-André Lemburg 92b201debc Add two new tools to compare codecs and show differences and to
list all installed codecs.
2005-10-21 13:47:03 +00:00
Marc-André Lemburg c5694c8bf4 Moved gencodec.py to the Tools/unicode/ directory.
Added new support for decoding tables.

Cleaned up the implementation a bit.
2005-10-21 13:45:17 +00:00
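A decoding table is simply a 256-character string indexed by byte value. The sketch below follows the pattern used by the generated modules in Lib/encodings: decode with the table, and encode with a map built from it by `codecs.charmap_build()`. That helper is not part of the documented `codecs` API and is used here only because the generated modules rely on it. The charset is made up for illustration: Latin-1, except that byte 0xA4 maps to the euro sign, as in ISO-8859-15:

```python
import codecs

# Hypothetical 8-bit charset: identity for 0x00-0xFF except 0xA4 -> U+20AC.
decoding_table = "".join(chr(i) for i in range(256))
decoding_table = decoding_table[:0xA4] + "\u20ac" + decoding_table[0xA5:]

# Build the reverse (encoding) map from the decoding table.
encoding_map = codecs.charmap_build(decoding_table)

text, consumed = codecs.charmap_decode(b"\xa4 5", "strict", decoding_table)
data, _ = codecs.charmap_encode(text, "strict", encoding_map)
print(text, data)  # € 5 b'\xa4 5'
```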
Hye-Shik Chang e9ddfbb412 SF #989185: Drop unicode.iswide() and unicode.width() and add
unicodedata.east_asian_width().  You can still implement your own
simple width() function using it like this:
    import unicodedata

    def width(u):
        w = 0
        for c in unicodedata.normalize('NFC', u):
            cwidth = unicodedata.east_asian_width(c)
            if cwidth in ('W', 'F'): w += 2
            else: w += 1
        return w
2004-08-04 07:38:35 +00:00
Tim Peters 182b5aca27 Whitespace normalization, via reindent.py. 2004-07-18 06:16:08 +00:00
Hye-Shik Chang 974ed7cfa5 - SF #962502: Add two more methods for unicode type; width() and
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
2004-06-02 16:49:17 +00:00
Armin Rigo ba91b9fdda Applying SF patch #949329 on behalf of Raymond Hettinger. 2004-05-19 19:10:18 +00:00
Martin v. Löwis 2548c730c1 Implement IDNA (Internationalized Domain Names in Applications). 2003-04-18 10:39:54 +00:00
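Python's built-in "idna" codec implements IDNA 2003 (RFC 3490). It applies nameprep to each label and Punycode-encodes any non-ASCII label behind an "xn--" prefix. A minimal round trip, assuming Python 3. The domain name is only an example:

```python
# Encode a non-ASCII host name to its ASCII-compatible form and back.
ascii_name = "b\u00fccher.example".encode("idna")
print(ascii_name)                 # b'xn--bcher-kva.example'
print(ascii_name.decode("idna"))  # bücher.example
```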
Martin v. Löwis b5c980b802 Add unidata_version. Bump generator version number. 2002-11-25 09:13:37 +00:00
Martin v. Löwis 97225da29a Sort names independent of the Python version. Fix hex constant warning.
Include all First/Last blocks.
2002-11-24 23:05:09 +00:00
Martin v. Löwis 677bde2dd1 Patch #626485: Support Unicode normalization. 2002-11-23 22:08:15 +00:00
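The four normalization forms differ in two ways. They either compose or decompose, and they either apply compatibility mappings (such as ligatures) or leave them alone. The small comparison below assumes Python 3. The input string is made up to exercise both differences:

```python
import unicodedata

# "ﬁ" ligature (U+FB01) + "ance" + COMBINING ACUTE ACCENT on the final "e".
s = "\ufb01ance\u0301"

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    out = unicodedata.normalize(form, s)
    # Show code points, since several results look identical when printed.
    print(form, [f"U+{ord(c):04X}" for c in out])
```

NFC composes e + accent but keeps the ligature. NFKC also replaces the ligature with plain "fi".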
Martin v. Löwis 99ac3283e7 Verify that lower-higher case delta are 16-bit. 2002-10-18 17:34:18 +00:00
Martin v. Löwis 9def6a3a77 Update to Unicode 3.2 database. 2002-10-18 16:11:54 +00:00
Walter Dörwald aaab30e00c Apply diff2.txt from SF patch http://www.python.org/sf/572113
(with one small bugfix in bgen/bgen/scantools.py)

This replaces string module functions with string methods
for the stuff in the Tools directory. Several uses of
string.letters etc. are still remaining.
2002-09-11 20:36:02 +00:00
Fredrik Lundh b2dfd73bdc Unicode nits: Don't include unicodedatabase.h no more. And make sure
to build *all* tables in makeunicodedata.py.
2001-01-21 23:31:52 +00:00
Fredrik Lundh 7b7dd107b3 compress unicode decomposition tables (this saves another 55k) 2001-01-21 22:41:08 +00:00
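The decomposition data these tables store is exposed through `unicodedata.decomposition()`. It returns an optional `<tag>` followed by hexadecimal code points, or an empty string. In the sketch below, `parse_decomposition` is a hypothetical helper, not part of the module, and Python 3 is assumed:

```python
import unicodedata

def parse_decomposition(ch):
    """Split unicodedata.decomposition() output into (tag, code points)."""
    fields = unicodedata.decomposition(ch).split()
    tag = None
    if fields and fields[0].startswith("<"):
        tag = fields.pop(0)          # e.g. "<compat>", "<super>"
    return tag, [int(f, 16) for f in fields]

print(parse_decomposition("\u00e9"))  # (None, [101, 769])
print(parse_decomposition("\ufb01"))  # ('<compat>', [102, 105])
print(parse_decomposition("a"))       # (None, [])
```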
Fredrik Lundh 9e9bcda547 forgot to check in the new makeunicodedata.py script 2001-01-21 17:01:31 +00:00
Fredrik Lundh fad27aee11 Added 38,642 missing characters to the Unicode database (first-last
ranges) -- but thanks to the 2.0 compression scheme, this doesn't add
a single byte to the resulting binaries (!)

Closes bug #117524
2000-11-03 20:24:15 +00:00
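UnicodeData.txt lists large blocks such as the CJK ideographs as a pair of "<..., First>" and "<..., Last>" lines instead of one line per character, so a generator has to expand those pairs. The sketch below shows that expansion. The function name and the three sample lines are illustrative assumptions. It is not the code from makeunicodedata.py:

```python
def expand_ranges(lines):
    """Yield (code point, fields) for UnicodeData.txt-style lines, expanding
    "<..., First>"/"<..., Last>" pairs into every code point in between."""
    first = None
    for line in lines:
        fields = line.strip().split(";")
        cp = int(fields[0], 16)
        name = fields[1]
        if name.endswith(", First>"):
            first = cp                      # remember start of the range
        elif name.endswith(", Last>"):
            for c in range(first, cp + 1):  # every code point shares the fields
                yield c, fields
            first = None
        else:
            yield cp, fields

sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
    "9FA5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
]
entries = dict(expand_ranges(sample))
print(len(entries))  # 1 + (0x9FA5 - 0x4E00 + 1) = 20903
```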
Fred Drake 9c6850510c Remove bogus stdout redirection and use of sys.__stdout__; use
augmented print statement instead.
2000-10-26 03:56:46 +00:00
Fredrik Lundh 375732cd41 - don't set the titlecase flag for uppercase letters (sorry, tim) 2000-09-25 23:03:34 +00:00
Fredrik Lundh 0f8fad4969 unicode database compression, step 3:
- added decimal digit and digit properties to the unidb tables
2000-09-25 21:01:56 +00:00
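The decimal and digit properties are not interchangeable. `decimal()` is defined only for decimal digits (category Nd). `digit()` also covers digits such as superscripts that cannot be used in positional notation. `numeric()` additionally covers fractions and other numerals. A quick comparison, assuming Python 3:

```python
import unicodedata

# "7", ARABIC-INDIC DIGIT THREE, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF
for ch in ("7", "\u0663", "\u00b2", "\u00bd"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.decimal(ch, None),   # None when no decimal value
        unicodedata.digit(ch, None),     # None when no digit value
        unicodedata.numeric(ch, None),   # float, or None
    )
```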
Fredrik Lundh e9133f7e2e unicode database compression, step 3:
- use unidb compression for the unicodectype module.  smaller,
  faster, and slightly more portable...

- also mention the unicode directory in Tools/README
2000-09-25 17:59:57 +00:00
Fredrik Lundh cfcea49218 unicode database compression, step 2:
- fixed attributions
- moved decomposition data to a separate table, in preparation
  for step 3 (which won't happen before 2.0 final, promise!)
- use relative paths in the generator script

I have a lot more stuff in the works for 2.1, but let's leave
that for another day...
2000-09-25 08:07:06 +00:00
Tim Peters 2101348830 Fiddled w/ /F's cool new splitbins function: documented it, generalized it
a bit, sped it a lot primarily by removing the unused assumption that None was
a legit bin entry (the function doesn't really need to assume that there's
anything special about 0), added an optional "trace" argument, and in __debug__
mode added exhaustive verification that the decomposition is both correct and
doesn't overstep any array bounds (which wasn't obvious to me from staring at the
generated C code -- now I feel safe!).  Did not commit a new unicodedata_db.h, as
the one produced by this version is identical to the one already checked in.
2000-09-25 07:13:41 +00:00
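The idea behind splitbins is a two-level lookup table. The original table is cut into blocks of 2**shift entries, each distinct block is stored once in t2, and t1 records which block each position uses. The sketch below is a simplified re-implementation, not the function from makeunicodedata.py. It assumes the cost is measured in table entries rather than bytes, and it simply tries every shift:

```python
def splitbins(t):
    """Return (t1, t2, shift) such that, with mask = (1 << shift) - 1,
        t[i] == t2[(t1[i >> shift] << shift) + (i & mask)]
    choosing the shift that minimises len(t1) + len(t2)."""
    best = None
    for shift in range(len(t).bit_length() + 1):
        size = 1 << shift
        t1, t2, seen = [], [], {}
        for start in range(0, len(t), size):
            chunk = tuple(t[start:start + size])
            block = seen.get(chunk)
            if block is None:
                # t2 grows by whole blocks (only the final chunk may be
                # short), so its length is a multiple of size here.
                block = len(t2) >> shift
                seen[chunk] = block
                t2.extend(chunk)
            t1.append(block)
        cost = len(t1) + len(t2)
        if best is None or cost < best[0]:
            best = (cost, t1, t2, shift)
    _, t1, t2, shift = best
    # Exhaustive check, in the spirit of the verification described above;
    # like __debug__ code, these asserts are skipped under python -O.
    mask = (1 << shift) - 1
    for i, value in enumerate(t):
        assert t2[(t1[i >> shift] << shift) + (i & mask)] == value
    return t1, t2, shift

table = [0] * 4096 + [1, 2, 3, 4] * 16 + [0] * 4096
t1, t2, shift = splitbins(table)
print(len(table), len(t1) + len(t2), shift)
```

Long runs of identical blocks are what make this pay off. The sample table shrinks from 8256 entries to a few hundred.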
Fredrik Lundh f367cacb98 unicode database compression, step 1:
- use unidb compression for the unicodedata module.  on Windows,
  the new unidatabase module is 120k, down from nearly 600k.
2000-09-24 23:18:31 +00:00