cpython

Commit Graph

Author	SHA1	Message	Date
Florent Xicluna	22b243809e	#7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14.	2010-03-30 08:24:06 +00:00
Florent Xicluna	2e0a53fdf6	Issue #8024: Update the Unicode database to 5.2	2010-03-18 21:50:06 +00:00
Florent Xicluna	dc36472472	Remove py3k deprecation warnings from these Unicode tools.	2010-03-15 14:00:58 +00:00
Benjamin Peterson	f4803aa623	set svn:eol-style on various files	2010-03-08 22:15:11 +00:00
Amaury Forgeot d'Arc	5c92d4301d	#7112: Fix compilation warning in unicodetype_db.h. makeunicodedata now generates double literals	2009-10-13 21:29:34 +00:00
Amaury Forgeot d'Arc	d0052d17b1	#1571184: makeunicodedata.py now generates the functions _PyUnicode_ToNumeric, _PyUnicode_IsLinebreak and _PyUnicode_IsWhitespace. It now also parses the Unihan.txt for numeric values.	2009-10-06 19:56:32 +00:00
Amaury Forgeot d'Arc	70dda76cde	#1616979: Add the cp720 (Arabic DOS) encoding. Since there is no official mapping file from unicode.org, the codec file is generated on Windows with the new genwincodec.py script.	2009-07-13 20:01:11 +00:00
Antoine Pitrou	e988e286b2	Issue #1734234: Massively speedup `unicodedata.normalize()` when the string is already in normalized form, by performing a quick check beforehand. Original patch by Rauli Ruohonen.	2009-04-27 21:53:26 +00:00
Walter Dörwald	5d98ec76bb	Issue #5828 (Invalid behavior of unicode.lower): Fixed bogus logic in makeunicodedata.py and regenerated the Unicode database (This fixes u'\u1d79'.lower() == '\x00').	2009-04-25 14:03:16 +00:00
Martin v. Löwis	24329ba176	Issue #3811: The Unicode database was updated to 5.1. Reviewed by Fredrik Lundh and Marc-Andre Lemburg.	2008-09-10 13:38:12 +00:00
Martin v. Löwis	111c180674	Make more symbols static.	2008-06-13 07:47:47 +00:00
Christian Heimes	c5f05e45cf	Patch #2167 from calvin: Remove unused imports	2008-02-23 17:40:11 +00:00
Martin v. Löwis	3f767795f6	Patch #1359618: Speed-up charmap encoder.	2006-06-04 19:36:28 +00:00
Jack Diederich	df676c5ffd	when generating python code prefer to generate valid python code	2006-05-26 11:37:20 +00:00
Walter Dörwald	5d23f9a8a3	Don't add multiple empty lines at the end of the codec. With this a regenerated codec should survive reindent.py unchanged.	2006-03-31 10:13:10 +00:00
Walter Dörwald	cff22083f1	Whitespace for generated code.	2006-03-27 15:11:56 +00:00
Hye-Shik Chang	e2ac4abd01	Patch #1443155: Add the incremental codecs support for CJK codecs. (reviewed by Walter Dörwald)	2006-03-26 02:34:59 +00:00
Walter Dörwald	abb02e5994	Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass of tuple) that provides incremental decoders and encoders (a way to use stateful codecs without the stream API). Functions codecs.getincrementaldecoder() and codecs.getincrementalencoder() have been added.	2006-03-15 11:35:15 +00:00
Martin v. Löwis	43179c8e6f	Add changelog entry.	2006-03-11 12:43:44 +00:00
Tim Peters	88ca467ca4	Whitespace normalization.	2006-03-10 23:39:56 +00:00
Martin v. Löwis	480f1bb67b	Update Unicode database to Unicode 4.1.	2006-03-09 23:38:20 +00:00
Tim Peters	536cf99536	Whitespace normalization.	2005-12-25 23:18:31 +00:00
Marc-André Lemburg	68b49ef8a1	Add Makefile which allows easily rebuilding the charmap codecs.	2005-10-25 11:55:01 +00:00
Marc-André Lemburg	89bbfd4a36	Add custom mapping files used for generating some of the charmap codecs.	2005-10-25 11:54:04 +00:00
Marc-André Lemburg	bd20ea55bc	Apply some cosmetic fixes to the output of the script. Only include the decoding map if no table can be generated.	2005-10-25 11:53:33 +00:00
Marc-André Lemburg	92b201debc	Add two new tools to compare codecs and show differences and to list all installed codecs.	2005-10-21 13:47:03 +00:00
Marc-André Lemburg	c5694c8bf4	Moved gencodec.py to the Tools/unicode/ directory. Added new support for decoding tables. Cleaned up the implementation a bit.	2005-10-21 13:45:17 +00:00
Hye-Shik Chang	e9ddfbb412	SF #989185: Drop unicode.iswide() and unicode.width() and add unicodedata.east_asian_width(). You can still implement your own simple width() function using it like this: def width(u): w = 0 for c in unicodedata.normalize('NFC', u): cwidth = unicodedata.east_asian_width(c) if cwidth in ('W', 'F'): w += 2 else: w += 1 return w	2004-08-04 07:38:35 +00:00
Tim Peters	182b5aca27	Whitespace normalization, via reindent.py.	2004-07-18 06:16:08 +00:00
Hye-Shik Chang	974ed7cfa5	- SF #962502: Add two more methods for unicode type; width() and iswide() for east asian width manipulation. (Inspired by David Goodger, Reviewed by Martin v. Loewis) - Move _PyUnicode_TypeRecord.flags to the end of the struct so that no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)	2004-06-02 16:49:17 +00:00
Armin Rigo	ba91b9fdda	Applying SF patch #949329 on behalf of Raymond Hettinger.	2004-05-19 19:10:18 +00:00
Martin v. Löwis	2548c730c1	Implement IDNA (Internationalized Domain Names in Applications).	2003-04-18 10:39:54 +00:00
Martin v. Löwis	b5c980b802	Add unidata_version. Bump generator version number.	2002-11-25 09:13:37 +00:00
Martin v. Löwis	97225da29a	Sort names independent of the Python version. Fix hex constant warning. Include all First/Last blocks.	2002-11-24 23:05:09 +00:00
Martin v. Löwis	677bde2dd1	Patch #626485: Support Unicode normalization.	2002-11-23 22:08:15 +00:00
Martin v. Löwis	99ac3283e7	Verify that lower-higher case delta are 16-bit.	2002-10-18 17:34:18 +00:00
Martin v. Löwis	9def6a3a77	Update to Unicode 3.2 database.	2002-10-18 16:11:54 +00:00
Walter Dörwald	aaab30e00c	Apply diff2.txt from SF patch http://www.python.org/sf/572113 (with one small bugfix in bgen/bgen/scantools.py) This replaces string module functions with string methods for the stuff in the Tools directory. Several uses of string.letters etc. are still remaining.	2002-09-11 20:36:02 +00:00
Fredrik Lundh	b2dfd73bdc	Unicode nits: Don't include unicodedatabase.h no more. And make sure to build all tables in makeunicodedata.py.	2001-01-21 23:31:52 +00:00
Fredrik Lundh	7b7dd107b3	compress unicode decomposition tables (this saves another 55k)	2001-01-21 22:41:08 +00:00
Fredrik Lundh	9e9bcda547	forgot to check in the new makeunicodedata.py script	2001-01-21 17:01:31 +00:00
Fredrik Lundh	fad27aee11	Added 38,642 missing characters to the Unicode database (first-last ranges) -- but thanks to the 2.0 compression scheme, this doesn't add a single byte to the resulting binaries (!) Closes bug #117524	2000-11-03 20:24:15 +00:00
Fred Drake	9c6850510c	Remove bogus stdout redirection and use of sys.__stdout__; use augmented print statement instead.	2000-10-26 03:56:46 +00:00
Fredrik Lundh	375732cd41	- don't set the titlecase flag for uppercase letters (sorry, tim)	2000-09-25 23:03:34 +00:00
Fredrik Lundh	0f8fad4969	unicode database compression, step 3: - added decimal digit and digit properties to the unidb tables	2000-09-25 21:01:56 +00:00
Fredrik Lundh	e9133f7e2e	unicode database compression, step 3: - use unidb compression for the unicodectype module. smaller, faster, and slightly more portable... - also mention the unicode directory in Tools/README	2000-09-25 17:59:57 +00:00
Fredrik Lundh	cfcea49218	unicode database compression, step 2: - fixed attributions - moved decomposition data to a separate table, in preparation for step 3 (which won't happen before 2.0 final, promise!) - use relative paths in the generator script I have a lot more stuff in the works for 2.1, but let's leave that for another day...	2000-09-25 08:07:06 +00:00
Tim Peters	2101348830	Fiddled w/ /F's cool new splitbins function: documented it, generalized it a bit, sped it a lot primarily by removing the unused assumption that None was a legit bin entry (the function doesn't really need to assume that there's anything special about 0), added an optional "trace" argument, and in __debug__ mode added exhaustive verification that the decomposition is both correct and doesn't overstep any array bounds (which wasn't obvious to me from staring at the generated C code -- now I feel safe!). Did not commit a new unicodedata_db.h, as the one produced by this version is identical to the one already checked in.	2000-09-25 07:13:41 +00:00
Fredrik Lundh	f367cacb98	unicode database compression, step 1: - use unidb compression for the unicodedata module. on Windows, the new unidatabase module is 120k, down from nearly 600k.	2000-09-24 23:18:31 +00:00
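
The message for commit e9ddfbb412 carries a small width() helper that was flattened onto one line in this listing. Laid out as runnable code, it might look like the sketch below. This is a reconstruction for Python 3 (where the argument is a str rather than a unicode object), not a copy of anything in the repository, and it keeps the commit message's own simplification that every character not classed as wide ('W') or fullwidth ('F') occupies one column.

```python
import unicodedata


def width(u):
    """Approximate display width of a string in terminal columns.

    Characters whose East Asian Width property is 'W' (wide) or
    'F' (fullwidth) count as two columns; everything else counts as one.
    """
    w = 0
    # Normalize to NFC first so that precomposed and decomposed
    # spellings of the same text are measured the same way.
    for c in unicodedata.normalize('NFC', u):
        cwidth = unicodedata.east_asian_width(c)
        if cwidth in ('W', 'F'):
            w += 2
        else:
            w += 1
    return w
```

For example, width("abc") counts three narrow characters, while a string of three CJK ideographs such as "日本語" counts two columns per character. Combining marks that survive NFC and control characters are still counted as one column each, so this is only a rough estimate.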

49 Commits