
69 Commits

Author SHA1 Message Date
Benjamin Peterson b027c6cae0 fix possible overflow bugs in unicodedata (closes #23367) 2015-03-02 11:17:05 -05:00
Ezio Melotti 6d0f0f299b #18803: fix more typos. Patch by Févry Thibault. 2013-08-26 01:31:30 +03:00
Ezio Melotti 419e23cbb0 #18466: fix more typos. Patch by Févry Thibault. 2013-08-17 16:56:09 +03:00
Ezio Melotti 67c563e2f1 #16681: use "bidirectional class" instead of "bidirectional category" in the docstring too. 2012-12-14 20:12:25 +02:00
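The "bidirectional class" named in that docstring is the short Unicode Bidi_Class code that `unicodedata.bidirectional()` returns, such as `L`, `R`, `EN`, or `AN`. A minimal illustration, assuming a Python 3 interpreter; the sample characters are chosen only for demonstration:

```python
import unicodedata

# Each character maps to its Unicode Bidi_Class abbreviation.
samples = {
    "A": "Latin capital A",            # L  (left-to-right)
    "1": "European digit one",         # EN (European number)
    "\u05d0": "Hebrew letter alef",    # R  (right-to-left)
    "\u0661": "Arabic-Indic digit one" # AN (Arabic number)
}
for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {unicodedata.bidirectional(ch)}")
```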
Antoine Pitrou 44b3b5457a Remove all other uses of the C tolower()/toupper() which could break with a Turkish locale.
(except in the strop module, which is deprecated anyway)
2011-10-04 13:55:37 +02:00
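Under a Turkish locale the C library's `toupper('i')` may return a dotted capital I rather than ASCII `'I'`, which breaks case-insensitive matching of ASCII-only data such as Unicode character names. The usual remedy is an ASCII-only case mapping that ignores the locale. A Python sketch of that idea; `ascii_upper` is a hypothetical helper, not a CPython API:

```python
import unicodedata

def ascii_upper(s: str) -> str:
    """Uppercase only a-z and leave every other character untouched (no locale involved)."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)

# Character names are pure ASCII, so an ASCII-only fold is enough to normalize case.
name = ascii_upper("latin small letter i")
print(name)                      # LATIN SMALL LETTER I
print(unicodedata.lookup(name))  # i

# Non-ASCII letters such as dotless i (U+0131) are deliberately left alone,
# unlike str.upper(), which applies full Unicode case mapping.
print(ascii_upper("\u0131") == "\u0131", "\u0131".upper())
```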
Alexander Belopolsky dce6cf353c Merged revisions 87442 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r87442 | alexander.belopolsky | 2010-12-22 21:27:37 -0500 (Wed, 22 Dec 2010) | 1 line

  Issue #10254: Fixed a crash and a regression introduced by the implementation of PRI 29.
........
2010-12-28 15:47:56 +00:00
Martin v. Löwis e03c7787b9 Issue #10459: Update CJK character names to Unicode 5.2. 2010-11-22 10:53:46 +00:00
Antoine Pitrou c83ea137d7 Untabify C files. Will watch buildbots. 2010-05-09 14:46:46 +00:00
Larry Hastings 402b73fb8d Backported PyCapsule from 3.1, and converted most uses of
CObject to PyCapsule.
2010-03-25 00:54:54 +00:00
Ezio Melotti 0d0b80bc3e Link specifically to the UCD version 5.2.0. 2010-03-23 00:38:12 +00:00
Ezio Melotti ae735a763e Update the version number of the Unicode Database in a few more places. 2010-03-22 23:07:32 +00:00
Victor Stinner 7c924ec925 Issue #1054943: Fix unicodedata.normalize('NFC', text) for the Public Review
Issue #29.

PR #29 was released in February 2004!
2010-03-04 12:09:33 +00:00
Amaury Forgeot d'Arc d0052d17b1 #1571184: makeunicodedata.py now generates the functions _PyUnicode_ToNumeric,
_PyUnicode_IsLinebreak and _PyUnicode_IsWhitespace.

It now also parses the Unihan.txt for numeric values.
2009-10-06 19:56:32 +00:00
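At the Python level these numeric properties surface through `unicodedata.numeric()`, which returns the value as a float and now also covers CJK ideographs whose values come from Unihan. A short illustration, assuming Python 3; the sample characters are arbitrary:

```python
import unicodedata

# "7", VULGAR FRACTION ONE HALF, ROMAN NUMERAL EIGHT, and the ideograph for "three".
for ch in ["7", "\u00bd", "\u2167", "\u4e09"]:
    print(f"U+{ord(ch):04X}", unicodedata.numeric(ch))

# Characters without a numeric value raise ValueError unless a default is supplied.
print(unicodedata.numeric("x", None))  # None
```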
Antoine Pitrou e988e286b2 Issue #1734234: Massively speedup `unicodedata.normalize()` when the
string is already in normalized form, by performing a quick check beforehand.
Original patch by Rauli Ruohonen.
2009-04-27 21:53:26 +00:00
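The idea is to answer "is this already normalized?" cheaply and skip the full decompose/recompose pass when it is. Since this change the C implementation of `normalize()` does this internally, so callers need not; the sketch below only makes the two steps explicit. It assumes Python 3.8+ for `unicodedata.is_normalized()`, and the wrapper name `nfc` is hypothetical:

```python
import unicodedata

def nfc(text: str) -> str:
    # Fast path: input that is already in NFC is returned as the same object.
    if unicodedata.is_normalized("NFC", text):
        return text
    # Slow path: run the full normalization.
    return unicodedata.normalize("NFC", text)

already = "caf\u00e9"       # precomposed e-acute
needs_work = "cafe\u0301"   # e + COMBINING ACUTE ACCENT
print(nfc(already) is already, nfc(needs_work) == already)  # True True
```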
Martin v. Löwis 24329ba176 Issue #3811: The Unicode database was updated to 5.1.
Reviewed by Fredrik Lundh and Marc-Andre Lemburg.
2008-09-10 13:38:12 +00:00
Gregory P. Smith dd96db63f6 This reverts r63675 based on the discussion in this thread:
http://mail.python.org/pipermail/python-dev/2008-June/079988.html

Python 2.6 should stick with PyString_* in its codebase.  The PyBytes_* names
in the spirit of 3.0 are available via a #define only.  See the email thread.
2008-06-09 04:58:54 +00:00
Walter Dörwald a2a89a8712 Change all functions that expect one unicode character to accept a pair of
surrogates in narrow builds. Fixes issue #1706460.
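In a narrow (UTF-16) build a character outside the Basic Multilingual Plane arrives as two code units, so a function expecting "one character" has to combine the pair into a single code point first. Python 3.3+ strings hold full code points, so the sketch below only illustrates the arithmetic; `combine_surrogates` is a hypothetical helper, not part of `unicodedata`:

```python
import unicodedata

def combine_surrogates(high: str, low: str) -> str:
    """Join a UTF-16 surrogate pair into the single code point it encodes."""
    hi, lo = ord(high), ord(low)
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # 10 bits from each half, offset by the start of the supplementary planes.
    return chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))

ch = combine_surrogates("\ud834", "\udd1e")
print(f"U+{ord(ch):04X}", unicodedata.name(ch))  # U+1D11E MUSICAL SYMBOL G CLEF
```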
2008-06-02 20:36:03 +00:00
Christian Heimes 593daf545b Renamed PyString to PyBytes 2008-05-26 12:51:38 +00:00
Christian Heimes e93237dfcc #1629: Renamed Py_Size, Py_Type and Py_Refcnt to Py_SIZE, Py_TYPE and Py_REFCNT. Macros for b/w compatibility are available. 2007-12-19 02:37:44 +00:00
Martin v. Löwis f1e0b3f630 Bug #1704793: Return UTF-16 pair if unicodedata.lookup cannot
represent the result in a single character.
2007-07-28 07:03:05 +00:00
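The "UTF-16 pair" is the standard surrogate encoding of a code point above U+FFFF. A sketch of how that pair is computed, assuming Python 3 (where `lookup()` itself returns a single character); `to_utf16_units` is a hypothetical helper:

```python
import unicodedata
from typing import List

def to_utf16_units(ch: str) -> List[int]:
    """Return the UTF-16 code units a narrow build would use for one code point."""
    cp = ord(ch)
    if cp < 0x10000:
        return [cp]  # BMP characters fit in a single unit
    offset = cp - 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]

g_clef = unicodedata.lookup("MUSICAL SYMBOL G CLEF")
print([hex(u) for u in to_utf16_units(g_clef)])  # ['0xd834', '0xdd1e']
```

The result can be cross-checked against the codec: `g_clef.encode("utf-16-be")` yields the same four bytes.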
Martin v. Löwis 6819210b9e PEP 3123: Provide forward compatibility with Python 3.0, while keeping
backwards compatibility. Add Py_Refcnt, Py_Type, Py_Size, and
PyVarObject_HEAD_INIT.
2007-07-21 06:55:02 +00:00
Walter Dörwald 6fc2382883 Replace C++ comment with C comment (fixes SF bug #1593525). 2006-11-09 16:23:26 +00:00
Neal Norwitz b45f351832 I'm not sure why this code allocates this string for the error message.
I think it would be better to always use snprintf and have the format
limit the size of the name appropriately (like %.200s).

Klocwork #340
2006-08-12 01:57:47 +00:00
Martin v. Löwis 789c09d2cd Update dangling references to the 3.2 database to
mention that this is UCD 4.1 now.
2006-08-10 19:04:00 +00:00
Neal Norwitz 37f694f21b No functional change. Add comment and assert to describe why there cannot be overflow which was reported by Klocwork. Discussed on python-dev 2006-07-27 04:04:50 +00:00
Martin v. Löwis d004fc810a Patch 1494554: Update numeric properties to Unicode 4.1. 2006-05-27 08:36:52 +00:00
Neal Norwitz 88c97845c6 No reason to export get_decomp_record, make static 2006-04-17 00:36:29 +00:00
Martin v. Löwis 3c6e4188ed Support NFD of very long strings. 2006-04-13 06:36:31 +00:00
Neal Norwitz 65c05b20e9 Get rid of warnings about using chars as subscripts
on Alpha (and possibly other platforms) by using Py_CHARMASK().
2006-04-10 02:17:47 +00:00
Martin v. Löwis c350912990 Adjust CJK Ideograph range to Unicode 4.1. 2006-03-11 12:16:23 +00:00
Martin v. Löwis 0e2f9b2dfb Fix refcounting bug. 2006-03-10 11:29:32 +00:00
Martin v. Löwis 5bd7c02298 Avoid forward-declaring the methods array.
Rename unicodedata.db* to unicodedata.ucd*
2006-03-10 11:20:04 +00:00
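After the rename, the frozen Unicode 3.2.0 database is reachable as `unicodedata.ucd_3_2_0`, an object with the same query methods as the module, while the module-level functions use the current database (its version is reported by `unicodedata.unidata_version`). A short illustration, assuming a current Python 3; U+20B9 INDIAN RUPEE SIGN is used only because it was added well after Unicode 3.2:

```python
import unicodedata

print(unicodedata.unidata_version)            # current UCD version; varies by Python release
print(unicodedata.ucd_3_2_0.unidata_version)  # 3.2.0

rupee = "\u20b9"
print(unicodedata.name(rupee))                   # INDIAN RUPEE SIGN
print(unicodedata.ucd_3_2_0.name(rupee, None))   # None: unassigned in 3.2.0
print(unicodedata.ucd_3_2_0.category(rupee))     # Cn
```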
Martin v. Löwis 480f1bb67b Update Unicode database to Unicode 4.1. 2006-03-09 23:38:20 +00:00
Thomas Wouters 1e365b265a Remove gcc (4.0.x) warning about uninitialized value by explicitly setting
the sentinel value in the main function, rather than the helper. This
function could possibly do with an early-out if any of the helper calls ends
up with a len of 0, but I doubt it really matters (how common are malformed
hangul syllables, really?)
2006-03-01 21:58:30 +00:00
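Hangul syllables are composed and decomposed arithmetically rather than through table lookups, using the constants from the Unicode Standard's Hangul syllable algorithm. A Python sketch of that arithmetic; the function names are hypothetical, and raising `ValueError` on a malformed jamo sequence is this sketch's own choice, not CPython's behavior:

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables

def decompose_hangul(ch: str) -> str:
    """Split a precomposed syllable into leading, vowel, and optional trailing jamo."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed syllable; leave unchanged
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    return chr(l) + chr(v) + (chr(t) if t != T_BASE else "")

def compose_hangul(jamo: str) -> str:
    """Combine an L V or L V T jamo sequence into one syllable."""
    if len(jamo) not in (2, 3):
        raise ValueError("expected 2 or 3 jamo")
    l_index = ord(jamo[0]) - L_BASE
    v_index = ord(jamo[1]) - V_BASE
    t_index = ord(jamo[2]) - T_BASE if len(jamo) == 3 else 0
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("malformed Hangul jamo sequence")
    if len(jamo) == 3 and not 0 < t_index < T_COUNT:
        raise ValueError("malformed trailing consonant")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

han = "\ud55c"  # HANGUL SYLLABLE HAN
print([f"U+{ord(c):04X}" for c in decompose_hangul(han)])  # ['U+1112', 'U+1161', 'U+11AB']
```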
Martin v. Löwis 8b291e2d66 Patch #1213831: Fix typo in unicodedata._getcode.
Will backport to Python 2.4.
2005-09-18 08:17:56 +00:00
Hye-Shik Chang 4c560ea05b Correct URL to the official UnicodeData 3.2.0 resource. (Reported
by Darek Suchojad)
2005-06-04 07:31:48 +00:00
Hye-Shik Chang cf18a5d67b Fill docstrings for module and functions, extracted from the TeX
documentation.  (Patch #1173245, Contributed by Jeremy Yallop)
2005-04-04 16:32:07 +00:00
Hye-Shik Chang e9ddfbb412 SF #989185: Drop unicode.iswide() and unicode.width() and add
unicodedata.east_asian_width().  You can still implement your own
simple width() function using it like this:
    import unicodedata

    def width(u):
        w = 0
        for c in unicodedata.normalize('NFC', u):
            cwidth = unicodedata.east_asian_width(c)
            # Wide ('W') and Fullwidth ('F') characters take two columns.
            if cwidth in ('W', 'F'):
                w += 2
            else:
                w += 1
        return w
2004-08-04 07:38:35 +00:00
Hye-Shik Chang 69dc1c8f6a Fix typo. 2004-07-15 04:30:25 +00:00
Martin v. Löwis 61e40bd897 Special case normalization of empty strings. Fixes #924361.
Backported to 2.3.
2004-04-17 19:36:48 +00:00
Martin v. Löwis d2171d2ba4 Overallocate target buffer for normalization earlier. Fixes #834676.
Backported to 2.3.
2003-11-06 20:47:57 +00:00
Neal Norwitz e9c571f968 Fix SF bug #694816, remove comparison of unsigned value < 0 2003-02-28 03:14:37 +00:00
Martin v. Löwis 2fb661fb80 Remove C++ comment. 2002-12-07 14:56:36 +00:00
Martin v. Löwis b5c980b802 Add unidata_version. Bump generator version number. 2002-11-25 09:13:37 +00:00
Martin v. Löwis 8d93ca1383 Verify that the code in CJK UNIFIED IDEOGRAPH- actually denotes an ideograph. 2002-11-23 22:10:29 +00:00
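A name of the form `CJK UNIFIED IDEOGRAPH-XXXX` parses as hex for many code points that are not ideographs, so the parsed value must be checked. CPython does this against its own table of ideograph ranges; the Python sketch below swaps in a simpler round-trip through `unicodedata.name()` instead. `cjk_code_from_name` is a hypothetical helper, and unlike `unicodedata.lookup()` it is case-sensitive:

```python
import unicodedata
from typing import Optional

PREFIX = "CJK UNIFIED IDEOGRAPH-"

def cjk_code_from_name(name: str) -> Optional[int]:
    """Return the code point named by 'CJK UNIFIED IDEOGRAPH-XXXX', or None."""
    if not name.startswith(PREFIX):
        return None
    try:
        cp = int(name[len(PREFIX):], 16)
        ch = chr(cp)
    except (ValueError, OverflowError):
        return None  # not hex, negative, or beyond U+10FFFF
    # Reject codes that parse but do not name an assigned ideograph
    # (this also rejects odd spellings such as '+4E00' or '4E_00').
    if unicodedata.name(ch, None) != name:
        return None
    return cp

print(hex(cjk_code_from_name("CJK UNIFIED IDEOGRAPH-4E00")))  # 0x4e00
print(cjk_code_from_name("CJK UNIFIED IDEOGRAPH-0041"))       # None ('A' is not an ideograph)
```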
Martin v. Löwis 677bde2dd1 Patch #626485: Support Unicode normalization. 2002-11-23 22:08:15 +00:00
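The normalization support is exposed as `unicodedata.normalize(form, text)` with the four forms NFC, NFD, NFKC, and NFKD. A brief illustration, assuming Python 3; the sample strings are arbitrary:

```python
import unicodedata

composed = "\u00e9"     # e-acute as one code point
decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, [f"U+{ord(c):04X}" for c in unicodedata.normalize(form, composed)])

# Only the compatibility forms fold characters such as the "fi" ligature (U+FB01).
print(unicodedata.normalize("NFKC", "\ufb01"))  # fi
```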
Martin v. Löwis ef7fe2e813 Implement names for CJK unified ideographs. Add name to KeyError output.
Verify that the lookup for an existing name succeeds.
2002-11-23 18:01:32 +00:00
Martin v. Löwis 2f4be4e38a Fix off-by-one error. 2002-11-23 17:11:06 +00:00
Martin v. Löwis 7d41e29c58 Patch #626548: Support Hangul syllable names. 2002-11-23 12:22:32 +00:00
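Hangul syllable names are not stored one by one; they are derived from the syllable's position by concatenating the short names of its leading consonant, vowel, and optional trailing consonant. A Python sketch of that derivation, using the jamo short names from the UCD; `hangul_syllable_name` is a hypothetical helper, not CPython's implementation:

```python
import unicodedata

# Jamo short names (leading, vowel, trailing), in code point order.
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S", "SS",
          "", "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
          "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
          "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
          "K", "T", "P", "H"]
S_BASE = 0xAC00
N_COUNT = len(JAMO_V) * len(JAMO_T)  # 588
S_COUNT = len(JAMO_L) * N_COUNT      # 11172

def hangul_syllable_name(ch: str) -> str:
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l, rest = divmod(s_index, N_COUNT)
    v, t = divmod(rest, len(JAMO_T))
    return "HANGUL SYLLABLE " + JAMO_L[l] + JAMO_V[v] + JAMO_T[t]

print(hangul_syllable_name("\uac00"))  # HANGUL SYLLABLE GA
print(hangul_syllable_name("\ud55c"))  # HANGUL SYLLABLE HAN
```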
Martin v. Löwis 9def6a3a77 Update to Unicode 3.2 database. 2002-10-18 16:11:54 +00:00