Commit Graph

21 Commits

Author SHA1 Message Date
Benjamin Peterson 71f660e00f update to Unicode 6.1 2012-02-20 22:24:29 -05:00
Martin v. Löwis baecd7243a Upgrade to Unicode 6.0.0.
makeunicodedata.py: download all data files from unicode.org,
  switch to extracting Unihan data from zip file.
  Read linebreakprops and derivednormalizationprops even for
  old versions, even though they are not used in delta records.
test:unicode.py: U+11000 is now assigned, use U+14000 instead.
2010-10-11 22:42:28 +00:00
Florent Xicluna faa663f03d Fixed a failure in test_bigmem.
Merged revision 79059 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r79059 | florent.xicluna | 2010-03-18 22:50:06 +0100 (jeu, 18 mar 2010) | 2 lines

  Issue #8024: Update the Unicode database to 5.2
........
2010-03-19 13:37:08 +00:00
Florent Xicluna f1789dee30 Revert Unicode UCD 5.2 upgrade in 3.x. It broke repr() for unicode objects, and gave failures in test_bigmem. Revert 79062, 79065 and 79083. 2010-03-19 01:17:46 +00:00
Florent Xicluna 657de43f97 Merged revisions 79059 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r79059 | florent.xicluna | 2010-03-18 22:50:06 +0100 (jeu, 18 mar 2010) | 2 lines

  Issue #8024: Update the Unicode database to 5.2
........
2010-03-18 22:11:01 +00:00
Amaury Forgeot d'Arc 7d52079395 Merged revisions 75272-75273 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r75272 | amaury.forgeotdarc | 2009-10-06 21:56:32 +0200 (mar., 06 oct. 2009) | 5 lines

  #1571184: makeunicodedata.py now generates the functions _PyUnicode_ToNumeric,
  _PyUnicode_IsLinebreak and _PyUnicode_IsWhitespace.

  It now also parses the Unihan.txt for numeric values.
........
  r75273 | amaury.forgeotdarc | 2009-10-06 22:02:09 +0200 (mar., 06 oct. 2009) | 2 lines

  Add Anders Chrigstrom to Misc/ACKS for his work on unicodedata.
........
2009-10-06 21:03:20 +00:00
Antoine Pitrou 7a0fedfd1d Merged revisions 72054 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r72054 | antoine.pitrou | 2009-04-27 23:53:26 +0200 (lun., 27 avril 2009) | 5 lines

  Issue #1734234: Massively speedup `unicodedata.normalize()` when the
  string is already in normalized form, by performing a quick check beforehand.
  Original patch by Rauli Ruohonen.
........
2009-04-27 22:31:40 +00:00
Martin v. Löwis 93cbca33f2 Merged revisions 66362 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r66362 | martin.v.loewis | 2008-09-10 15:38:12 +0200 (Mi, 10 Sep 2008) | 3 lines

  Issue #3811: The Unicode database was updated to 5.1.
  Reviewed by Fredrik Lundh and Marc-Andre Lemburg.
........
2008-09-10 14:08:48 +00:00
Martin v. Löwis 59683e8529 Merged revisions 64226 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r64226 | martin.v.loewis | 2008-06-13 09:47:47 +0200 (Fr, 13 Jun 2008) | 2 lines

  Make more symbols static.
........
2008-06-13 07:50:45 +00:00
Martin v. Löwis 480f1bb67b Update Unicode database to Unicode 4.1. 2006-03-09 23:38:20 +00:00
Hye-Shik Chang e9ddfbb412 SF #989185: Drop unicode.iswide() and unicode.width() and add
unicodedata.east_asian_width().  You can still implement your own
simple width() function using it like this:
    def width(u):
        w = 0
        for c in unicodedata.normalize('NFC', u):
            cwidth = unicodedata.east_asian_width(c)
            if cwidth in ('W', 'F'): w += 2
            else: w += 1
        return w
2004-08-04 07:38:35 +00:00
Hye-Shik Chang 974ed7cfa5 - SF #962502: Add two more methods for unicode type; width() and
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
2004-06-02 16:49:17 +00:00
Martin v. Löwis b5c980b802 Add unidata_version. Bump generator version number. 2002-11-25 09:13:37 +00:00
Martin v. Löwis d5169bad94 Regenerate from Unicode 3.2.0 to include all First/Last ranges. 2002-11-24 23:10:08 +00:00
Martin v. Löwis 677bde2dd1 Patch #626485: Support Unicode normalization. 2002-11-23 22:08:15 +00:00
Martin v. Löwis 9def6a3a77 Update to Unicode 3.2 database. 2002-10-18 16:11:54 +00:00
Fredrik Lundh 7b7dd107b3 compress unicode decomposition tables (this saves another 55k) 2001-01-21 22:41:08 +00:00
Fredrik Lundh 9e9bcda547 forgot to check in the new makeunicodedata.py script 2001-01-21 17:01:31 +00:00
Fredrik Lundh fad27aee11 Added 38,642 missing characters to the Unicode database (first-last
ranges) -- but thanks to the 2.0 compression scheme, this doesn't add
a single byte to the resulting binaries (!)

Closes bug #117524
2000-11-03 20:24:15 +00:00
Fredrik Lundh cfcea49218 unicode database compression, step 2:
- fixed attributions
- moved decomposition data to a separate table, in preparation
  for step 3 (which won't happen before 2.0 final, promise!)
- use relative paths in the generator script

I have a lot more stuff in the works for 2.1, but let's leave
that for another day...
2000-09-25 08:07:06 +00:00
Fredrik Lundh eedb5764a5 unicode database compression, step 1:
- use unidb compression for the unicodedata module.  on Windows,
  the new unidatabase module is 120k, down from nearly 600k.
2000-09-24 21:28:28 +00:00