cpython

Commit Graph

Author	SHA1	Message	Date
Marc-André Lemburg	2cb94aba12	Enhance the performance of two important Unicode character type lookups: whitespace and linebreak. These lookup tables are from the Python 1.6 version with the addition of the 205F code point which was added as whitespace code point to Unicode since then.	2005-10-20 19:06:35 +00:00
Hye-Shik Chang	e9ddfbb412	SF #989185 : Drop unicode.iswide() and unicode.width() and add unicodedata.east_asian_width(). You can still implement your own simple width() function using it like this: def width(u): w = 0 for c in unicodedata.normalize('NFC', u): cwidth = unicodedata.east_asian_width(c) if cwidth in ('W', 'F'): w += 2 else: w += 1 return w	2004-08-04 07:38:35 +00:00
Hye-Shik Chang	974ed7cfa5	- SF #962502 : Add two more methods for unicode type; width() and iswide() for east asian width manipulation. (Inspired by David Goodger, Reviewed by Martin v. Loewis) - Move _PyUnicode_TypeRecord.flags to the end of the struct so that no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)	2004-06-02 16:49:17 +00:00
Hye-Shik Chang	7db07e6972	Fix gcc 3.3 warnings related to Py_UNICODE_WIDE.	2003-12-29 01:36:01 +00:00
Martin v. Löwis	edf368c351	Make lower/upper/title work for non-BMP characters.	2002-10-18 16:40:36 +00:00
Martin v. Löwis	9def6a3a77	Update to Unicode 3.2 database.	2002-10-18 16:11:54 +00:00
Fredrik Lundh	72b068566a	removed "register const" from scalar arguments to the unicode predicates	2001-06-27 22:08:26 +00:00
Fredrik Lundh	8f4558583f	use Py_UNICODE_WIDE instead of USE_UCS4_STORAGE and Py_UNICODE_SIZE tests.	2001-06-27 18:59:43 +00:00
Martin v. Löwis	ce9b5a55e1	Encode surrogates in UTF-8 even for a wide Py_UNICODE. Implement sys.maxunicode. Explicitly wrap around upper/lower computations for wide Py_UNICODE. When decoding large characters with UTF-8, represent expected test results using the \U notation.	2001-06-27 06:28:56 +00:00
Fredrik Lundh	ee13dba1aa	more unicode tweaks: fix unicodectype for sizeof(Py_UNICODE) > sizeof(int)	2001-06-26 20:36:12 +00:00
Fredrik Lundh	9e7dd4c185	unicode database compression, step 3: - use unidb compression for the unicodectype module. smaller, faster, and slightly more portable...	2000-09-25 21:48:13 +00:00
Trent Mick	8a74e5fc2c	Add the current Win64 compiler to the list of those that need the huge switch statement broken up. This will probably not be necessary when the Win64 compiler matures.	2000-08-12 19:37:27 +00:00
Guido van Rossum	16b1ad9c7d	Changing the CNRI copyright notice according to CNRI's instructions. This is a notice without a date, which apparently is not a claim to copyright but only advice to the reader. IANAL. :-)	2000-08-03 16:24:25 +00:00
Jack Jansen	56cdce3070	Conditionally (currently on ifdef macintosh) break the large switch up into 1000-case smaller ones.	2000-07-06 13:57:38 +00:00
Marc-André Lemburg	f3938f55c7	Added new lookup API which matches all alphabetic Unicode characters, i.e the ones with category 'Ll','Lu','Lt','Lo','Lm'.	2000-07-05 09:48:59 +00:00
Guido van Rossum	dc742b3184	Marc-Andre Lemburg: Added a few missing whitespace Unicode char mappings. Thanks to Brian Hooper.	2000-04-11 15:39:02 +00:00
Guido van Rossum	603484d759	Unicode character type helpers, written by Marc-Andre Lemburg.	2000-03-10 22:52:46 +00:00

17 Commits
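
The message of commit e9ddfbb412 describes a user-level replacement for the removed unicode.width() method, but the snippet is flattened onto one line in the listing above. A minimal runnable sketch of that same function follows, assuming Python 3, where `str` takes the place of the old `unicode` type. The type hints and the example strings are illustrative additions, not part of the original commit.

```python
import unicodedata


def width(u: str) -> int:
    """Approximate display width: wide and fullwidth characters count as 2."""
    w = 0
    # Normalize first so that decomposed sequences with a precomposed
    # form collapse into a single character before counting.
    for c in unicodedata.normalize('NFC', u):
        cwidth = unicodedata.east_asian_width(c)
        if cwidth in ('W', 'F'):   # Wide or Fullwidth
            w += 2
        else:                      # every other East Asian Width class
            w += 1
    return w


if __name__ == "__main__":
    print(width("abc"))      # 3
    print(width("日本語"))    # 6
```

As in the commit message, every character outside the 'W' and 'F' classes counts as one column, so combining marks that survive normalization and ambiguous-width ('A') characters are each treated as width 1.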