Florent Xicluna
22b243809e
#7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14.
2010-03-30 08:24:06 +00:00
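The effect of the #7643 change can be observed from Python code. The following is a minimal sketch, assuming a Python 3 interpreter, where `str.splitlines()` treats VT and FF as line boundaries; the sample string is made up for illustration.

```python
# VT (U+000B) and FF (U+000C) count as line boundaries for str.splitlines().
text = "first\x0bsecond\x0cthird"  # hypothetical sample input
lines = text.splitlines()
print(lines)
```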
Florent Xicluna
2e0a53fdf6
Issue #8024: Update the Unicode database to 5.2
2010-03-18 21:50:06 +00:00
Florent Xicluna
dc36472472
Remove py3k deprecation warnings from these Unicode tools.
2010-03-15 14:00:58 +00:00
Benjamin Peterson
f4803aa623
set svn:eol-style on various files
2010-03-08 22:15:11 +00:00
Amaury Forgeot d'Arc
5c92d4301d
#7112: Fix compilation warning in unicodetype_db.h
makeunicodedata now generates double literals
2009-10-13 21:29:34 +00:00
Amaury Forgeot d'Arc
d0052d17b1
#1571184: makeunicodedata.py now generates the functions _PyUnicode_ToNumeric,
_PyUnicode_IsLinebreak and _PyUnicode_IsWhitespace.
It now also parses the Unihan.txt for numeric values.
2009-10-06 19:56:32 +00:00
Amaury Forgeot d'Arc
70dda76cde
#1616979: Add the cp720 (Arabic DOS) encoding.
Since there is no official mapping file from unicode.org,
the codec file is generated on Windows with the new genwincodec.py script.
2009-07-13 20:01:11 +00:00
Antoine Pitrou
e988e286b2
Issue #1734234: Massively speed up `unicodedata.normalize()` when the
string is already in normalized form, by performing a quick check beforehand.
Original patch by Rauli Ruohonen.
2009-04-27 21:53:26 +00:00
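The quick-check idea in #1734234 can be illustrated at the Python level. This is only a sketch: the commit performs the check in C inside `unicodedata.normalize()` itself, whereas the wrapper below (`normalize_with_quick_check`, a hypothetical name) relies on `unicodedata.is_normalized()`, which is assumed to be available (Python 3.8 or later).

```python
import unicodedata


def normalize_with_quick_check(form, text):
    """Return `text` in the given normalization form.

    Hypothetical helper: skip the full normalization pass when the
    string is already in the requested form.
    """
    if unicodedata.is_normalized(form, text):
        return text  # already normalized, nothing to do
    return unicodedata.normalize(form, text)


already_nfc = normalize_with_quick_check("NFC", "abc")
composed = normalize_with_quick_check("NFC", "e\u0301")  # e + combining acute
```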
Walter Dörwald
5d98ec76bb
Issue #5828 (Invalid behavior of unicode.lower): Fixed bogus logic in
makeunicodedata.py and regenerated the Unicode database (This fixes
u'\u1d79'.lower() == '\x00').
2009-04-25 14:03:16 +00:00
Martin v. Löwis
24329ba176
Issue #3811: The Unicode database was updated to 5.1.
Reviewed by Fredrik Lundh and Marc-Andre Lemburg.
2008-09-10 13:38:12 +00:00
Martin v. Löwis
111c180674
Make more symbols static.
2008-06-13 07:47:47 +00:00
Christian Heimes
c5f05e45cf
Patch #2167 from calvin: Remove unused imports
2008-02-23 17:40:11 +00:00
Martin v. Löwis
3f767795f6
Patch #1359618: Speed-up charmap encoder.
2006-06-04 19:36:28 +00:00
Jack Diederich
df676c5ffd
when generating python code prefer to generate valid python code
2006-05-26 11:37:20 +00:00
Walter Dörwald
5d23f9a8a3
Don't add multiple empty lines at the end of the codec. With this a
regenerated codec should survive reindent.py unchanged.
2006-03-31 10:13:10 +00:00
Walter Dörwald
cff22083f1
Whitespace for generated code.
2006-03-27 15:11:56 +00:00
Hye-Shik Chang
e2ac4abd01
Patch #1443155: Add the incremental codecs support for CJK codecs.
(reviewed by Walter Dörwald)
2006-03-26 02:34:59 +00:00
Walter Dörwald
abb02e5994
Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass
of tuple) that provides incremental decoders and encoders (a way to use
stateful codecs without the stream API). Functions
codecs.getincrementaldecoder() and codecs.getincrementalencoder() have
been added.
2006-03-15 11:35:15 +00:00
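The incremental API described in patch #1436130 can be exercised as follows. This is a minimal sketch assuming Python 3 and the built-in UTF-8 codec; the input bytes are made up to show a multi-byte character split across two chunks.

```python
import codecs

# codecs.lookup() returns a CodecInfo object, which is a tuple subclass.
info = codecs.lookup("utf-8")

# getincrementaldecoder() returns a decoder class; instantiate it to keep state.
decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

data = "\u00e9".encode("utf-8")  # two bytes for one character
first = decoder.decode(data[:1])               # incomplete sequence is buffered
second = decoder.decode(data[1:], final=True)  # sequence completes here
```

The decoder buffers the incomplete byte sequence between calls, which is what makes stateful decoding possible without the stream API.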
Martin v. Löwis
43179c8e6f
Add changelog entry.
2006-03-11 12:43:44 +00:00
Tim Peters
88ca467ca4
Whitespace normalization.
2006-03-10 23:39:56 +00:00
Martin v. Löwis
480f1bb67b
Update Unicode database to Unicode 4.1.
2006-03-09 23:38:20 +00:00
Tim Peters
536cf99536
Whitespace normalization.
2005-12-25 23:18:31 +00:00
Marc-André Lemburg
68b49ef8a1
Add Makefile which allows easily rebuilding the charmap codecs.
2005-10-25 11:55:01 +00:00
Marc-André Lemburg
89bbfd4a36
Add custom mapping files used for generating some of the charmap
codecs.
2005-10-25 11:54:04 +00:00
Marc-André Lemburg
bd20ea55bc
Apply some cosmetic fixes to the output of the script.
Only include the decoding map if no table can be generated.
2005-10-25 11:53:33 +00:00
Marc-André Lemburg
92b201debc
Add two new tools to compare codecs and show differences and to
list all installed codecs.
2005-10-21 13:47:03 +00:00
Marc-André Lemburg
c5694c8bf4
Moved gencodec.py to the Tools/unicode/ directory.
Added new support for decoding tables.
Cleaned up the implementation a bit.
2005-10-21 13:45:17 +00:00
Hye-Shik Chang
e9ddfbb412
SF #989185: Drop unicode.iswide() and unicode.width() and add
unicodedata.east_asian_width(). You can still implement your own
simple width() function using it like this:
    import unicodedata

    def width(u):
        w = 0
        for c in unicodedata.normalize('NFC', u):
            cwidth = unicodedata.east_asian_width(c)
            if cwidth in ('W', 'F'): w += 2
            else: w += 1
        return w
2004-08-04 07:38:35 +00:00
Tim Peters
182b5aca27
Whitespace normalization, via reindent.py.
2004-07-18 06:16:08 +00:00
Hye-Shik Chang
974ed7cfa5
- SF #962502: Add two more methods for unicode type; width() and
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
2004-06-02 16:49:17 +00:00
Armin Rigo
ba91b9fdda
Applying SF patch #949329 on behalf of Raymond Hettinger.
2004-05-19 19:10:18 +00:00
Martin v. Löwis
2548c730c1
Implement IDNA (Internationalized Domain Names in Applications).
2003-04-18 10:39:54 +00:00
Martin v. Löwis
b5c980b802
Add unidata_version. Bump generator version number.
2002-11-25 09:13:37 +00:00
Martin v. Löwis
97225da29a
Sort names independent of the Python version. Fix hex constant warning.
Include all First/Last blocks.
2002-11-24 23:05:09 +00:00
Martin v. Löwis
677bde2dd1
Patch #626485: Support Unicode normalization.
2002-11-23 22:08:15 +00:00
Martin v. Löwis
99ac3283e7
Verify that lower-higher case delta are 16-bit.
2002-10-18 17:34:18 +00:00
Martin v. Löwis
9def6a3a77
Update to Unicode 3.2 database.
2002-10-18 16:11:54 +00:00
Walter Dörwald
aaab30e00c
Apply diff2.txt from SF patch http://www.python.org/sf/572113
(with one small bugfix in bgen/bgen/scantools.py)
This replaces string module functions with string methods
for the stuff in the Tools directory. Several uses of
string.letters etc. are still remaining.
2002-09-11 20:36:02 +00:00
Fredrik Lundh
b2dfd73bdc
Unicode nits: Don't include unicodedatabase.h no more. And make sure
to build *all* tables in makeunicodedata.py.
2001-01-21 23:31:52 +00:00
Fredrik Lundh
7b7dd107b3
compress unicode decomposition tables (this saves another 55k)
2001-01-21 22:41:08 +00:00
Fredrik Lundh
9e9bcda547
forgot to check in the new makeunicodedata.py script
2001-01-21 17:01:31 +00:00
Fredrik Lundh
fad27aee11
Added 38,642 missing characters to the Unicode database (first-last
ranges) -- but thanks to the 2.0 compression scheme, this doesn't add
a single byte to the resulting binaries (!)
Closes bug #117524
2000-11-03 20:24:15 +00:00
Fred Drake
9c6850510c
Remove bogus stdout redirection and use of sys.__stdout__; use
augmented print statement instead.
2000-10-26 03:56:46 +00:00
Fredrik Lundh
375732cd41
- don't set the titlecase flag for uppercase letters (sorry, tim)
2000-09-25 23:03:34 +00:00
Fredrik Lundh
0f8fad4969
unicode database compression, step 3:
- added decimal digit and digit properties to the unidb tables
2000-09-25 21:01:56 +00:00
Fredrik Lundh
e9133f7e2e
unicode database compression, step 3:
- use unidb compression for the unicodectype module. smaller,
faster, and slightly more portable...
- also mention the unicode directory in Tools/README
2000-09-25 17:59:57 +00:00
Fredrik Lundh
cfcea49218
unicode database compression, step 2:
- fixed attributions
- moved decomposition data to a separate table, in preparation
for step 3 (which won't happen before 2.0 final, promise!)
- use relative paths in the generator script
I have a lot more stuff in the works for 2.1, but let's leave
that for another day...
2000-09-25 08:07:06 +00:00
Tim Peters
2101348830
Fiddled w/ /F's cool new splitbins function: documented it, generalized it
a bit, sped it a lot primarily by removing the unused assumption that None was
a legit bin entry (the function doesn't really need to assume that there's
anything special about 0), added an optional "trace" argument, and in __debug__
mode added exhaustive verification that the decomposition is both correct and
doesn't overstep any array bounds (which wasn't obvious to me from staring at the
generated C code -- now I feel safe!). Did not commit a new unicodedata_db.h, as
the one produced by this version is identical to the one already checked in.
2000-09-25 07:13:41 +00:00
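The kind of two-level table decomposition that the splitbins commit message describes can be sketched as below. This is not the actual `splitbins` code from makeunicodedata.py: `split_table` and `lookup` are hypothetical names, the shift is fixed by the caller rather than chosen to minimize the combined table size, and padding with 0 is an assumption. The final loop mirrors the exhaustive verification mentioned in the message.

```python
def split_table(table, shift):
    """Split `table` into (index, blocks), sharing identical blocks.

    Hypothetical sketch: each block holds 2**shift entries, and `index`
    maps a block number to the position of its (deduplicated) block.
    """
    size = 1 << shift
    # Pad with zeros so the table length is a multiple of the block size.
    padded = list(table) + [0] * (-len(table) % size)
    index, blocks, seen = [], [], {}
    for start in range(0, len(padded), size):
        block = tuple(padded[start:start + size])
        if block not in seen:
            seen[block] = len(blocks) // size  # position of the new block
            blocks.extend(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, shift, i):
    """Recover table[i] from the two-level representation."""
    mask = (1 << shift) - 1
    return blocks[(index[i >> shift] << shift) + (i & mask)]


# Made-up sparse table: mostly zeros with a few non-zero entries.
table = [0] * 64 + [1, 2, 3, 4] + [0] * 60
index, blocks = split_table(table, shift=4)

# Exhaustive check that the decomposition reproduces the original table.
for i, expected in enumerate(table):
    assert lookup(index, blocks, 4, i) == expected
```

With this sample input the 128-entry table shrinks to an 8-entry index plus two 16-entry blocks, since all the all-zero blocks are stored once.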
Fredrik Lundh
f367cacb98
unicode database compression, step 1:
- use unidb compression for the unicodedata module. on Windows,
the new unidatabase module is 120k, down from nearly 600k.
2000-09-24 23:18:31 +00:00