Commit Graph

41 Commits

Author SHA1 Message Date
Inada Naoki 6fec905de5
bpo-36642: make unicodedata const (GH-12855) 2019-04-17 08:40:34 +09:00
Benjamin Peterson 738c19f4c5
closes bpo-33376: Update to Unicode 12.0.0. (GH-12256) 2019-03-09 16:25:55 -08:00
Benjamin Peterson 7c69c1c0fb
update to Unicode 11.0.0 (closes bpo-33778) (GH-7439)
Also, standardize indentation of generated tables.
2018-06-06 20:14:28 -07:00
Benjamin Peterson 279a96206f bpo-30736: upgrade to Unicode 10.0 (#2344)
Straightforward. While we're at it, though, strip trailing whitespace from generated tables.
2017-06-22 22:31:08 -07:00
Benjamin Peterson 6775231597 Unicode 9.0.0
Not completely mechanical since support for East Asian Width changes—emoji
codepoints became Wide—had to be added to unicodedata.
2016-09-14 23:53:47 -07:00
Benjamin Peterson 4801383c29 upgrade to Unicode 8.0.0 2015-06-27 15:45:56 -05:00
Benjamin Peterson 3032ed7cb1 upgrade to unicode 7.0.0 2014-07-06 13:04:20 -07:00
Benjamin Peterson 94d08d908b upgrade unicode db to 6.3.0 (closes #19221) 2013-10-10 17:24:45 -04:00
Antoine Pitrou 9ed5f27266 Issue #18722: Remove uses of the "register" keyword in C code. 2013-08-13 20:18:52 +02:00
Benjamin Peterson b8350f1c7d upgrade to UCD 6.2 2012-09-29 13:47:39 -04:00
Benjamin Peterson 71f660e00f update to Unicode 6.1 2012-02-20 22:24:29 -05:00
Benjamin Peterson ad9c569825 delta encoding of upper/lower/title makes a glorious return (#12736) 2012-01-15 21:19:20 -05:00
Benjamin Peterson d5890c8db5 add str.casefold() (closes #13752) 2012-01-14 13:23:30 -05:00
Benjamin Peterson b2bf01d824 use full unicode mappings for upper/lower/title case (#12736)
Also broaden the category of characters that count as lowercase/uppercase.
2012-01-11 18:17:06 -05:00
Martin v. Löwis baecd7243a Upgrade to Unicode 6.0.0.
makeunicodedata.py: download all data files from unicode.org,
  switch to extracting Unihan data from zip file.
  Read linebreakprops and derivednormalizationprops even for
  old versions, even though they are not used in delta records.
test:unicode.py: U+11000 is now assigned, use U+14000 instead.
2010-10-11 22:42:28 +00:00
Amaury Forgeot d'Arc feb7307db4 #9210: remove --with-wctype-functions configure option.
The internal unicode database is now always used.

(after 5 years: see
  http://mail.python.org/pipermail/python-dev/2004-December/050193.html
)
2010-09-12 22:42:57 +00:00
Amaury Forgeot d'Arc 324ac65ceb #5127: Even on narrow unicode builds, the C functions that access the Unicode
Database (Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others) now accept
and return characters from the full Unicode range (Py_UCS4).

The differences from Python code are few:
- unicodedata.numeric(), unicodedata.decimal() and unicodedata.digit()
  now return the correct value for large code points
- repr() may consider more characters as printable.
2010-08-18 20:44:58 +00:00
Florent Xicluna 806d8cf0e8 Merged revisions 79494,79496 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r79494 | florent.xicluna | 2010-03-30 10:24:06 +0200 (mar, 30 mar 2010) | 2 lines

  #7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14.
........
  r79496 | florent.xicluna | 2010-03-30 18:29:03 +0200 (mar, 30 mar 2010) | 2 lines

  Highlight the change of behavior related to r79494.  Now VT and FF are linebreaks.
........
2010-03-30 19:34:18 +00:00
Florent Xicluna faa663f03d Fixed a failure in test_bigmem.
Merged revision 79059 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r79059 | florent.xicluna | 2010-03-18 22:50:06 +0100 (jeu, 18 mar 2010) | 2 lines

  Issue #8024: Update the Unicode database to 5.2
........
2010-03-19 13:37:08 +00:00
Florent Xicluna f1789dee30 Revert Unicode UCD 5.2 upgrade in 3.x. It broke repr() for unicode objects, and gave failures in test_bigmem. Revert 79062, 79065 and 79083. 2010-03-19 01:17:46 +00:00
Florent Xicluna 657de43f97 Merged revisions 79059 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r79059 | florent.xicluna | 2010-03-18 22:50:06 +0100 (jeu, 18 mar 2010) | 2 lines

  Issue #8024: Update the Unicode database to 5.2
........
2010-03-18 22:11:01 +00:00
Amaury Forgeot d'Arc 919765a095 Merged revisions 75396 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r75396 | amaury.forgeotdarc | 2009-10-13 23:29:34 +0200 (mar., 13 oct. 2009) | 3 lines

  #7112: Fix compilation warning in unicodetype_db.h
  makeunicodedata now generates double literals
........
2009-10-13 23:18:53 +00:00
Amaury Forgeot d'Arc 7d52079395 Merged revisions 75272-75273 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r75272 | amaury.forgeotdarc | 2009-10-06 21:56:32 +0200 (mar., 06 oct. 2009) | 5 lines

  #1571184: makeunicodedata.py now generates the functions _PyUnicode_ToNumeric,
  _PyUnicode_IsLinebreak and _PyUnicode_IsWhitespace.

  It now also parses the Unihan.txt for numeric values.
........
  r75273 | amaury.forgeotdarc | 2009-10-06 22:02:09 +0200 (mar., 06 oct. 2009) | 2 lines

  Add Anders Chrigstrom to Misc/ACKS for his work on unicodedata.
........
2009-10-06 21:03:20 +00:00
Walter Dörwald 1b08b30743 Merged revisions 71894 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r71894 | walter.doerwald | 2009-04-25 16:03:16 +0200 (Sa, 25 Apr 2009) | 4 lines

  Issue #5828 (Invalid behavior of unicode.lower): Fixed bogus logic in
  makeunicodedata.py and regenerated the Unicode database (This fixes
  u'\u1d79'.lower() == '\x00').
........
2009-04-25 14:13:56 +00:00
Benjamin Peterson 09832740d1 fix isprintable() on space characters #5126 2009-03-26 17:15:46 +00:00
Martin v. Löwis 93cbca33f2 Merged revisions 66362 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r66362 | martin.v.loewis | 2008-09-10 15:38:12 +0200 (Mi, 10 Sep 2008) | 3 lines

  Issue #3811: The Unicode database was updated to 5.1.
  Reviewed by Fredrik Lundh and Marc-Andre Lemburg.
........
2008-09-10 14:08:48 +00:00
Georg Brandl d52429fb49 Issue #3282: str.isprintable() should return False for undefined Unicode characters. 2008-07-04 15:55:02 +00:00
Georg Brandl 559e5d7f4d #2630: Implement PEP 3138.
The repr() of a string now contains printable Unicode characters unescaped.
The new ascii() builtin can be used to get a repr() with only ASCII characters in it.

PEP and patch were written by Atsuo Ishimoto.
2008-06-11 18:37:52 +00:00
Georg Brandl a26f8ca668 Revert r63934 -- it was mixing two patches. 2008-06-04 13:01:30 +00:00
Georg Brandl f954c4b9fb Remove meaning of -ttt, but still accept -t option on cmdline for compatibility. 2008-06-04 11:41:32 +00:00
Martin v. Löwis 13c3e380d1 Add XID_Start and XID_Continue properties to unicodectype. 2007-08-14 22:37:03 +00:00
Martin v. Löwis 480f1bb67b Update Unicode database to Unicode 4.1. 2006-03-09 23:38:20 +00:00
Hye-Shik Chang e9ddfbb412 SF #989185: Drop unicode.iswide() and unicode.width() and add
unicodedata.east_asian_width().  You can still implement your own
simple width() function using it like this:
    def width(u):
        w = 0
        for c in unicodedata.normalize('NFC', u):
            cwidth = unicodedata.east_asian_width(c)
            if cwidth in ('W', 'F'): w += 2
            else: w += 1
        return w
2004-08-04 07:38:35 +00:00
Hye-Shik Chang 974ed7cfa5 - SF #962502: Add two more methods for unicode type; width() and
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
2004-06-02 16:49:17 +00:00
Martin v. Löwis b5c980b802 Add unidata_version. Bump generator version number. 2002-11-25 09:13:37 +00:00
Martin v. Löwis d5169bad94 Regenerate from Unicode 3.2.0 to include all First/Last ranges. 2002-11-24 23:10:08 +00:00
Martin v. Löwis 9def6a3a77 Update to Unicode 3.2 database. 2002-10-18 16:11:54 +00:00
Fredrik Lundh 9e9bcda547 forgot to check in the new makeunicodedata.py script 2001-01-21 17:01:31 +00:00
Fredrik Lundh fad27aee11 Added 38,642 missing characters to the Unicode database (first-last
ranges) -- but thanks to the 2.0 compression scheme, this doesn't add
a single byte to the resulting binaries (!)

Closes bug #117524
2000-11-03 20:24:15 +00:00
Fredrik Lundh 375732cd41 - don't set the titlecase flag for uppercase letters (sorry, tim) 2000-09-25 23:03:34 +00:00
Fredrik Lundh 69b58e2772 unicode database compression, step 3:
- use unidb compression for the unicodectype module.  smaller, faster,
  and slightly more portable...

(note: this commit doesn't include the unicodectype.c file itself; I'm
still waiting for the reviewers...)
2000-09-25 21:12:34 +00:00