Commit Graph

214 Commits

Author SHA1 Message Date
Marc-André Lemburg 5a5c81a0e9 Added new API PyUnicode_FromEncodedObject() which supports decoding
objects including instance objects.

The old API PyUnicode_FromObject() is still available as shortcut.
2000-07-07 13:46:42 +00:00
Marc-André Lemburg 43279100f4 Bill Tutt: Added Py_UCS4 typedef to hold UCS4 values (these need
at least 32 bits as opposed to Py_UNICODE which rely on having
16 bits).
2000-07-07 09:01:41 +00:00
Marc-André Lemburg f03e74126e Modified the ISALPHA and ISALNUM macros to use the new lookup APIs
from unicodectype.c
2000-07-05 09:45:59 +00:00
Marc-André Lemburg a9c103bc09 Added new Py_UNICODE_ISALPHA() and Py_UNICODE_ISALNUM() macros
which are true for alphabetic and alphanumeric characters resp.

The macros are currently implemented using the existing is* tables
but will have to be updated to meet the Unicode standard definitions
(add tables for non-cased letters and letter modifiers).
2000-07-03 10:52:13 +00:00
Marc-André Lemburg 2f4d0e9bb6 Marc-Andre Lemburg <mal@lemburg.com>:
Added optimization proposed by Andrew Kuchling to the Unicode
matching macro.
2000-06-18 22:22:27 +00:00
Fred Drake cb093fe890 M.-A. Lemburg <mal@lemburg.com>:
Added PyUnicode_GetDefaultEncoding() and
PyUnicode_GetDefaultEncoding() APIs.
2000-05-09 19:51:53 +00:00
Guido van Rossum 004d64f362 Marc-Andre Lemburg:
Changed PyUnicode_Splitlines() maxsplit argument to keepends.
The maxsplit functionality was replaced by the keepends
functionality which allows keeping the line end markers together
with the string.
2000-04-11 15:39:46 +00:00
Guido van Rossum 52c2359a59 Marc-Andre Lemburg: New exported API PyUnicode_Resize(). 2000-04-10 13:41:41 +00:00
Guido van Rossum 9e896b37c7 Marc-Andre's third try at this bulk patch seems to work (except that
his copy of test_contains.py seems to be broken -- the lines he
deleted were already absent).  Checkin messages:


New Unicode support for int(), float(), complex() and long().

- new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
- added support for Unicode to PyFloat_FromString()
- new encoding API PyUnicode_EncodeDecimal() which converts
  Unicode to a decimal char* string (used in the above new
  APIs)
- shortcuts for calls like int(<int object>) and float(<float obj>)
- tests for all of the above

Unicode compares and contains checks:
- comparing Unicode and non-string types now works; TypeErrors
  are masked, all other errors such as ValueError during
  Unicode coercion are passed through (note that PyUnicode_Compare
  does not implement the masking -- PyObject_Compare does this)
- contains now works for non-string types too; TypeErrors are
  masked and 0 returned; all other errors are passed through

Better testing support for the standard codecs.

Misc minor enhancements, such as an alias dbcs for the mbcs codec.

Changes:
- PyLong_FromString() now applies the same error checks as
  does PyInt_FromString(): trailing garbage is reported
  as error and not longer silently ignored. The only characters
  which may be trailing the digits are 'L' and 'l' -- these
  are still silently ignored.
- string.ato?() now directly interface to int(), long() and
  float(). The error strings are now a little different, but
  the type still remains the same. These functions are now
  ready to get declared obsolete ;-)
- PyNumber_Int() now also does a check for embedded NULL chars
  in the input string; PyNumber_Long() already did this (and
  still does)

Followed by:

Looks like I've gone a step too far there... (and test_contains.py
seem to have a bug too).

I've changed back to reporting all errors in PyUnicode_Contains()
and added a few more test cases to test_contains.py (plus corrected
the join() NameError).
2000-04-05 20:11:21 +00:00
Guido van Rossum 24bdb0474f Marc-Andre Lemburg:
The attached patch set includes a workaround to get Python with
Unicode compile on BSDI 4.x (courtesy Thomas Wouters; the cause
is a bug in the BSDI wchar.h header file) and Python interfaces
for the MBCS codec donated by Mark Hammond.

Also included are some minor corrections w/r to the docs of
the new "es" and "es#" parser markers (use PyMem_Free() instead
of free(); thanks to Mark Hammond for finding these).

The unicodedata tests are now in a separate file
(test_unicodedata.py) to avoid problems if the module cannot
be found.
2000-03-28 20:29:59 +00:00
Guido van Rossum efec1158c1 Prototypes added for MBCS codecs. (Win32 only.) 2000-03-28 02:01:15 +00:00
Barry Warsaw 51ac58039f On 17-Mar-2000, Marc-Andre Lemburg said:
Attached you find an update of the Unicode implementation.

    The patch is against the current CVS version. I would appreciate
    if someone with CVS checkin permissions could check the changes
    in.

    The patch contains all bugs and patches sent this week and also
    fixes a leak in the codecs code and a bug in the free list code
    for Unicode objects (which only shows up when compiling Python
    with Py_DEBUG; thanks to MarkH for spotting this one).
2000-03-20 16:36:48 +00:00
Guido van Rossum d0d366b5e6 Marc-Andre Lemburg: add declaration for PyUnicode_Contains(). 2000-03-13 23:22:24 +00:00
Guido van Rossum d822518fa8 Unicode implementation by Marc-Andre Lemburg based on original code by Fredrik Lundh. 2000-03-10 22:33:05 +00:00