cpython

Commit Graph

Author	SHA1	Message	Date
Thomas Wouters	a977329b6f	Merge part of the trunk changes into the p3yk branch. This merges from 43030 (branch-creation time) up to 43067. 43068 and 43069 contain a little swapping action between re.py and sre.py, and this mightily confuses svn merge, so later changes are going in separately. This merge should break no additional tests. The last-merged revision is going in a 'last_merge' property on '.' (the branch directory.) Arbitrarily chosen, really; if there's a BCP for this, I couldn't find it, but we can easily change it afterwards ;)	2006-04-21 09:43:23 +00:00
Guido van Rossum	45aecf451a	Checkpoint. 218 tests are okay; 53 are failing. Done so far: - all classes are new-style (but ripping out classobject.[ch] isn't done) - int/int -> float - all exceptions must derive from BaseException - absolute import - 'as' and 'with' are keywords	2006-03-15 04:58:47 +00:00
Martin v. Löwis	5bd7c02298	Avoid forward-declaring the methods array. Rename unicodedata.db* to unicodedata.ucd*	2006-03-10 11:20:04 +00:00
Martin v. Löwis	480f1bb67b	Update Unicode database to Unicode 4.1.	2006-03-09 23:38:20 +00:00
Marc-André Lemburg	fe4b34cc4b	Fix the encodings package codec search function to only search inside its own package. Fixes problem reported in patch #1433198. Add codec search function for codec test codec.	2006-02-19 15:22:22 +00:00
Martin v. Löwis	412ed3b8a7	Patch #1177307 : UTF-8-Sig codec.	2006-01-08 10:45:39 +00:00
Tim Peters	536cf99536	Whitespace normalization.	2005-12-25 23:18:31 +00:00
Marc-André Lemburg	d9cf593b49	Cosmetic change: make all hex literals use upper case hex so that they look more like the Unicode Consortium files. Add ending new-line to all source files.	2005-10-24 12:14:59 +00:00
Marc-André Lemburg	3c72ded23d	Removed the decoding_map from the codecs where this is possible. Replaced the tis_620, cp1140 and koi8_u codecs with new ones based on custom mapping files.	2005-10-24 12:07:49 +00:00
Marc-André Lemburg	0f00ba8bd8	Replace the old EBCDIC codecs with new ones using the decoding table.	2005-10-21 14:35:35 +00:00
Marc-André Lemburg	7797be7b3b	Alias iso8859_1 to latin_1 which is the same encoding, but has a much faster codec implementation.	2005-10-21 14:02:28 +00:00
Marc-André Lemburg	75c9e8392e	Add a few more Mac OS encodings. The mapping tables for these are available at ftp.unicode.org.	2005-10-21 13:58:32 +00:00
Marc-André Lemburg	a1129f4b9b	Replace the old charmap codecs with new ones generated from the current mapping tables available at ftp.unicode.org. These new codecs include and use character decoding tables which speeds up decoding by a few factors.	2005-10-21 13:49:12 +00:00
Walter Dörwald	007f8dfde2	Bug #1245379 : Add "unicode-1-1-utf-7" as an alias for "utf-7" as specified by RFC 1642.	2005-10-09 19:42:27 +00:00
Neal Norwitz	4ce69a5b06	No need to import exceptions, they are builtins	2005-09-01 00:45:28 +00:00
Martin v. Löwis	8b59514e57	Make IDNA return an empty string when the input is empty. Fixes #1163178 . Will backport to 2.4.	2005-08-25 11:03:38 +00:00
Walter Dörwald	729c31f5c3	Reset internal buffers when seek() is called. This fixes SF bug #1156259 .	2005-03-14 19:06:30 +00:00
Walter Dörwald	e1a0391b49	Fix wrong variable name.	2004-12-29 13:11:10 +00:00
Marc-André Lemburg	9ab8818c87	Rearranged mappings to value sorting order.	2004-12-10 21:54:35 +00:00
Walter Dörwald	69652035bc	SF patch #998993 : The UTF-8 and the UTF-16 stateful decoders now support decoding incomplete input (when the input stream is temporarily exhausted). codecs.StreamReader now implements buffering, which enables proper readline support for the UTF-16 decoders. codecs.StreamReader.read() has a new argument chars which specifies the number of characters to return. codecs.StreamReader.readline() and codecs.StreamReader.readlines() have a new argument keepends. Trailing "\n"s will be stripped from the lines if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and PyUnicode_DecodeUTF16Stateful.	2004-09-07 20:24:22 +00:00
Tim Peters	d1b7827216	Whitespace normalization.	2004-08-07 06:03:09 +00:00
Marc-André Lemburg	c759f070ef	Added new codecs and aliases for ISO_8859-11, ISO_8859-16 and TIS-620. Closes SF bug #1001895: Adding missing ISO 8859 codecs, especially Thai.	2004-08-05 12:43:30 +00:00
Tim Peters	c0cbc8611b	Whitespace normalization.	2004-07-31 21:17:37 +00:00
Marc-André Lemburg	17b6d28c64	New codec: [ 996067 ] hp-roman8 codec	2004-07-28 15:37:54 +00:00
Marc-André Lemburg	cd8a4cb3d3	Added new codec hp-roman8 submitted as patch [ 996067 ] hp-roman8 codec.	2004-07-28 15:35:29 +00:00
Hye-Shik Chang	2bb146f2f4	Bring CJKCodecs 1.1 into trunk. This completely reorganizes source and installed layouts to make maintenance simple and easy. And it also adds four new codecs; big5hkscs, euc-jis-2004, shift-jis-2004 and iso2022-jp-2004.	2004-07-18 03:06:29 +00:00
Tim Peters	4e0e1b6a54	Whitespace normalization.	2004-07-07 20:54:48 +00:00
Martin v. Löwis	708b4dacf4	Convert input to a string object. Fixes #909230 . Backported 2.3.	2004-03-23 23:40:36 +00:00
Hye-Shik Chang	5c5316f111	Add a new unicode codec: ptcp154 (Kazakh)	2004-03-19 08:06:07 +00:00
Marc-André Lemburg	361d66de5d	Fix wrong character mapping in koi8_u: SF bug #902501 .	2004-02-23 09:00:43 +00:00
Marc-André Lemburg	c83dddf7fe	Let the default encodings search function lookup aliases before trying the codec import. This allows applications to install codecs which override (non-special-cased) builtin codecs.	2004-01-20 09:40:14 +00:00
Marc-André Lemburg	5c94d33077	Add some more code page aliases needed for completeness.	2004-01-20 09:38:52 +00:00
Hye-Shik Chang	b619e4b36c	Fix a typo: s/iso_3022/iso2022/	2004-01-20 09:33:30 +00:00
Hye-Shik Chang	3e2a306920	Add CJK codecs support as discussed on python-dev. (SF #873597 ) Several style fixes are suggested by Martin v. Loewis and Marc-Andre Lemburg. Thanks!	2004-01-17 14:29:29 +00:00
Raymond Hettinger	0ad142aba0	Revert previous change. MAL preferred the old version.	2003-12-01 13:26:46 +00:00
Raymond Hettinger	a45517065a	Simplifed the code.	2003-12-01 10:41:02 +00:00
Raymond Hettinger	9edae346dd	Fix typo in the comments.	2003-09-24 03:57:36 +00:00
Raymond Hettinger	9a80c5dbc4	Added codec for bz2 compression.	2003-09-23 20:21:01 +00:00
Martin v. Löwis	0d8e16c7ad	Support trailing dots in DNS names. Fixes #782510 . Will backport to 2.3.	2003-08-05 06:19:47 +00:00
Skip Montanaro	5d6ceb4aae	more generic reference to python interpreter	2003-07-22 14:37:42 +00:00
Marc-André Lemburg	2820125935	Remove usage of re module from encodings package search function.	2003-05-16 17:07:51 +00:00
Tim Peters	0eadaac7dc	Whitespace normalization.	2003-04-24 16:02:54 +00:00
Martin v. Löwis	2548c730c1	Implement IDNA (Internationalized Domain Names in Applications).	2003-04-18 10:39:54 +00:00
Martin v. Löwis	7fb697b5d2	Revert Patch #670715 : iconv support.	2003-04-03 04:49:12 +00:00
Neal Norwitz	6156a2d07c	Handle iconv initialization erorrs	2003-02-28 20:00:42 +00:00
Martin v. Löwis	9789aefa61	Patch #670715 : Universal Unicode Codec for POSIX iconv.	2003-01-26 11:30:36 +00:00
Tim Peters	6578dc925f	Whitespace normalization.	2002-12-24 18:31:27 +00:00
Neal Norwitz	d8407a7031	Add new encoding for Ukrainian Cyrillic	2002-10-17 22:15:33 +00:00
Guido van Rossum	c8c6065231	When looking for an alias, first look for the normalized name (which still may contain dots), then if that doesn't exist look for the name with dots replaced by underscores. This is a little more forgiving.	2002-10-04 20:49:05 +00:00
Marc-André Lemburg	8dc5ff2e5a	Undo the removal. Guido mentioned that the encoding name is in active by some email headers.	2002-10-04 16:30:42 +00:00
Marc-André Lemburg	68fc27385d	Remove unneeded alias.	2002-10-04 15:57:03 +00:00
Marc-André Lemburg	a40ea75625	Fix doc-string.	2002-10-04 11:58:24 +00:00
Marc-André Lemburg	9d158bb66f	Adapt lookup names to new more general encoding name normalization scheme.	2002-10-04 11:51:39 +00:00
Marc-André Lemburg	7012673d67	Extending the encoding name normalization to handle more non-alphanumeric characters.	2002-10-04 11:45:38 +00:00
Guido van Rossum	479f3d3d2a	Oops, must convert hyphens to underscores in keys of aliases dict.	2002-09-26 20:08:23 +00:00
Guido van Rossum	b7a88e533d	Add yet another alias for ASCII found in the field. Will backport to 2.2.2.	2002-09-25 16:44:34 +00:00
Tim Peters	280488b9a3	Whitespace normalization.	2002-08-23 18:19:30 +00:00
Martin v. Löwis	8a8da798a5	Patch #505705 : Remove eval in pickle and cPickle.	2002-08-14 07:46:28 +00:00
Tim Peters	469cdad822	Whitespace normalization.	2002-08-08 20:19:19 +00:00
Martin v. Löwis	b9e0764d8b	Revert #571603 since it is ok to import codecs that are not subdirectories of encodings. Skip modules that don't have a getregentry function.	2002-07-29 14:05:24 +00:00
Martin v. Löwis	fc4c24c142	Patch #571603 : Refer to encodings package explicitly.	2002-07-28 11:31:33 +00:00
Marc-André Lemburg	a83ffa89f2	Palm OS encoding from Sjoerd Mullender	2002-07-12 14:36:22 +00:00
Marc-André Lemburg	3ccb09cba3	Fix for bug #222395 : UTF-16 et al. don't handle .readline(). They now raise an NotImplementedError to hint to the truth ;-)	2002-04-05 12:12:00 +00:00
Marc-André Lemburg	a0af63b242	Corrected import behaviour for codecs which live outside the encodings package.	2002-02-11 17:43:46 +00:00
Marc-André Lemburg	462004e90a	Add IANA character set aliases to the encodings alias dictionary and make alias lookup lazy. Note that only those IANA character set aliases were added for which we actually have codecs in the encodings package.	2002-02-10 21:36:20 +00:00
Martin v. Löwis	79d802d58c	Patch #487275 : Add windows-1251 charset alias.	2001-12-02 12:24:19 +00:00
Marc-André Lemburg	35b0cb09d7	Python part of the UTF-7 codec by Brian Quinlan.	2001-09-20 12:56:14 +00:00
Marc-André Lemburg	c60e6f7771	Patch #435971 : UTF-7 codec by Brian Quinlan.	2001-09-20 10:35:46 +00:00
Marc-André Lemburg	26e3b681b2	Patch #462635 by Andrew Kuchling correcting bugs in the new codecs -- the self argument does matter for Python functions (it does not for C functions which most other codecs use).	2001-09-20 10:33:38 +00:00
Marc-André Lemburg	816a1b75b7	Fixed search function error reporting in the encodings package __init__.py module to raise errors which can be catched as LookupErrors as well as SystemErrors. Modified the error messages to include more information about the failing module.	2001-09-19 11:52:07 +00:00
Andrew M. Kuchling	fd6608bcea	Fix typo (PyChecker)	2001-08-13 13:48:55 +00:00
Martin v. Löwis	9b75dca192	Expose nl_langinfo through locale where available.	2001-08-10 13:58:50 +00:00
Marc-André Lemburg	92b550cdd8	This patch by Martin v. Loewis changes the UTF-16 codec to only write a BOM at the start of the stream and also to only read it as BOM at the start of a stream. Subsequent reading/writing of BOMs will read/write the BOM as ZWNBSP character. This is in sync with the Unicode specifications. Note that UTF-16 files will now have to start with a BOM mark in order to be readable by the codec.	2001-06-19 20:07:51 +00:00
Martin v. Löwis	13b8bc5478	Patch #429957 : Add support for cp1140, which is identical to cp037, with the addition of the euro character. Also added a few EDBDIC aliases.	2001-06-07 19:39:25 +00:00
Mark Hammond	194bfb2805	Add some useful Windows encodings - patch #423221 .	2001-06-04 02:31:23 +00:00
Marc-André Lemburg	716cf91839	Moved the encoding map building logic from the individual mapping codec files to codecs.py and added logic so that multi mappings in the decoding maps now result in mappings to None (undefined mapping) in the encoding maps.	2001-05-16 09:41:45 +00:00
Guido van Rossum	acfdf156aa	Add quoted-printable codec	2001-05-15 15:34:07 +00:00
Marc-André Lemburg	2d9204199f	This patch changes the way the string .encode() method works slightly and introduces a new method .decode(). The major change is that strg.encode() will no longer try to convert Unicode returns from the codec into a string, but instead pass along the Unicode object as-is. The same is now true for all other codec return types. The underlying C APIs were changed accordingly. Note that even though this does have the potential of breaking existing code, the chances are low since conversion from Unicode previously took place using the default encoding which is normally set to ASCII rendering this auto-conversion mechanism useless for most Unicode encodings. The good news is that you can now use .encode() and .decode() with much greater ease and that the door was opened for better accessibility of the builtin codecs. As demonstration of the new feature, the patch includes a few new codecs which allow string to string encoding and decoding (rot13, hex, zip, uu, base64). Written by Marc-Andre Lemburg. Copyright assigned to the PSF.	2001-05-15 12:00:02 +00:00
Marc-André Lemburg	a866df806d	This patch changes the default behaviour of the builtin charmap codec to not apply Latin-1 mappings for keys which are not found in the mapping dictionaries, but instead treat them as undefined mappings. The patch was originally written by Martin v. Loewis with some additional (cosmetic) changes and an updated test script by Marc-Andre Lemburg. The standard codecs were recreated from the most current files available at the Unicode.org site using the Tools/scripts/gencodec.py tool. This patch closes the bugs #116285 and #119960.	2001-01-03 21:29:14 +00:00
Marc-André Lemburg	988ad2bdff	Changed .getaliases() support to register the new aliases in the encodings package aliases mapping dictionary rather than in the internal cache used by the search function. This enables aliases to take advantage of the full normalization process applied to encoding names which was previously not available. The patch restricts alias registration to new aliases. Existing aliases cannot be overridden anymore.	2000-12-12 14:45:35 +00:00
Thomas Wouters	7e47402264	Spelling fixes supplied by Rob W. W. Hooft. All these are fixes in either comments, docstrings or error messages. I fixed two minor things in test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't"). There is a minor style issue involved: Guido seems to have preferred English grammar (behaviour, honour) in a couple places. This patch changes that to American, which is the more prominent style in the source. I prefer English myself, so if English is preferred, I'd be happy to supply a patch myself ;)	2000-07-16 12:04:32 +00:00
Marc-André Lemburg	7ebb92ea66	Marc-Andre Lemburg <mal@lemburg.com>: Removed import of string module -- use string methods directly. Thanks to Finn Bock.	2000-06-13 12:04:05 +00:00
Marc-André Lemburg	4fd73f0465	Marc-Andre Lemburg <mal@lemburg.com>: Added some more codec aliases. Some of them are needed by the new locale.py encoding support.	2000-06-07 09:12:30 +00:00
Marc-André Lemburg	54480d300a	New codec which always raises an exception when used. This codec can be used to effectively switch off string coercion to Unicode.	2000-06-07 09:04:05 +00:00
Guido van Rossum	9e896b37c7	Marc-Andre's third try at this bulk patch seems to work (except that his copy of test_contains.py seems to be broken -- the lines he deleted were already absent). Checkin messages: New Unicode support for int(), float(), complex() and long(). - new APIs PyInt_FromUnicode() and PyLong_FromUnicode() - added support for Unicode to PyFloat_FromString() - new encoding API PyUnicode_EncodeDecimal() which converts Unicode to a decimal char* string (used in the above new APIs) - shortcuts for calls like int(<int object>) and float(<float obj>) - tests for all of the above Unicode compares and contains checks: - comparing Unicode and non-string types now works; TypeErrors are masked, all other errors such as ValueError during Unicode coercion are passed through (note that PyUnicode_Compare does not implement the masking -- PyObject_Compare does this) - contains now works for non-string types too; TypeErrors are masked and 0 returned; all other errors are passed through Better testing support for the standard codecs. Misc minor enhancements, such as an alias dbcs for the mbcs codec. Changes: - PyLong_FromString() now applies the same error checks as does PyInt_FromString(): trailing garbage is reported as error and not longer silently ignored. The only characters which may be trailing the digits are 'L' and 'l' -- these are still silently ignored. - string.ato?() now directly interface to int(), long() and float(). The error strings are now a little different, but the type still remains the same. These functions are now ready to get declared obsolete ;-) - PyNumber_Int() now also does a check for embedded NULL chars in the input string; PyNumber_Long() already did this (and still does) Followed by: Looks like I've gone a step too far there... (and test_contains.py seem to have a bug too). I've changed back to reporting all errors in PyUnicode_Contains() and added a few more test cases to test_contains.py (plus corrected the join() NameError).	2000-04-05 20:11:21 +00:00
Guido van Rossum	68895ed70c	Marc-Andre Lemburg: use all lowercase names.	2000-03-31 17:23:18 +00:00
Guido van Rossum	24bdb0474f	Marc-Andre Lemburg: The attached patch set includes a workaround to get Python with Unicode compile on BSDI 4.x (courtesy Thomas Wouters; the cause is a bug in the BSDI wchar.h header file) and Python interfaces for the MBCS codec donated by Mark Hammond. Also included are some minor corrections w/r to the docs of the new "es" and "es#" parser markers (use PyMem_Free() instead of free(); thanks to Mark Hammond for finding these). The unicodedata tests are now in a separate file (test_unicodedata.py) to avoid problems if the module cannot be found.	2000-03-28 20:29:59 +00:00
Guido van Rossum	1abd82c07d	MBCS codecs for Windows. Contributed by Mark Hammond.	2000-03-28 01:58:50 +00:00
Barry Warsaw	51ac58039f	On 17-Mar-2000, Marc-Andre Lemburg said: Attached you find an update of the Unicode implementation. The patch is against the current CVS version. I would appreciate if someone with CVS checkin permissions could check the changes in. The patch contains all bugs and patches sent this week and also fixes a leak in the codecs code and a bug in the free list code for Unicode objects (which only shows up when compiling Python with Py_DEBUG; thanks to MarkH for spotting this one).	2000-03-20 16:36:48 +00:00
Guido van Rossum	0229bf6001	Marc-Andre Lemburg: Unicode encodings.	2000-03-10 23:17:24 +00:00

190 Commits