cpython/Objects/stringlib
Victor Stinner 0518edc170
gh-119396: Optimize unicode_repr() (#119617)
Use stringlib to specialize unicode_repr() for each string kind
(UCS1, UCS2, UCS4).

Benchmark:

+-------------------------------------+---------+----------------------+
| Benchmark                           | ref     | change2              |
+=====================================+=========+======================+
| repr('abc')                         | 100 ns  | 103 ns: 1.02x slower |
+-------------------------------------+---------+----------------------+
| repr('a' * 100)                     | 369 ns  | 369 ns: 1.00x slower |
+-------------------------------------+---------+----------------------+
| repr(('a' + squote) * 100)          | 1.21 us | 946 ns: 1.27x faster |
+-------------------------------------+---------+----------------------+
| repr(('a' + nl) * 100)              | 1.23 us | 907 ns: 1.36x faster |
+-------------------------------------+---------+----------------------+
| repr(dquote + ('a' + squote) * 100) | 1.08 us | 858 ns: 1.25x faster |
+-------------------------------------+---------+----------------------+
| Geometric mean                      | (ref)   | 1.16x faster         |
+-------------------------------------+---------+----------------------+
2024-05-28 18:05:20 +02:00
clinic gh-117557: Improve error messages when a string, bytes or bytearray of length 1 are expected (GH-117631) 2024-05-28 12:01:37 +03:00
README.txt gh-105156: Cleanup usage of old Py_UNICODE type (#105158) 2023-06-01 07:18:09 +00:00
asciilib.h gh-93033: Use wmemchr in stringlib (GH-93034) 2022-05-24 10:45:31 +09:00
codecs.h gh-92536: Remove PyUnicode_READY() calls (#105210) 2023-06-02 01:33:17 +02:00
count.h gh-97982: Remove asciilib_count() (#98164) 2022-10-11 17:59:58 +02:00
ctype.h bpo-35081: Move bytes_methods.h to the internal C API (GH-18492) 2020-02-12 22:32:34 +01:00
eq.h gh-89653: PEP 670: Convert PyUnicode_KIND() macro to function (#92705) 2022-05-13 11:49:56 +02:00
fastsearch.h gh-94808: improve comments and coverage of fastsearch.h (GH-96760) 2022-09-13 14:25:10 -04:00
find.h gh-117431: Adapt bytes and bytearray .find() and friends to Argument Clinic (#117502) 2024-04-12 07:40:55 +00:00
find_max_char.h bpo-43179: Generalise alignment for optimised string routines (GH-24624) 2021-03-31 12:12:39 +02:00
join.h gh-99300: Use Py_NewRef() in Objects/ directory (#99354) 2022-11-10 23:58:07 +01:00
localeutil.h gh-89653: Use int type for Unicode kind (#92704) 2022-05-13 12:41:05 +02:00
partition.h bpo-40521: Make empty Unicode string per interpreter (GH-21096) 2020-06-24 00:10:40 +02:00
replace.h gh-93033: Use wmemchr in stringlib (GH-93034) 2022-05-24 10:45:31 +09:00
repr.h gh-119396: Optimize unicode_repr() (#119617) 2024-05-28 18:05:20 +02:00
split.h bpo-46670: Define all macros for stringlib (GH-31176) 2022-02-07 01:26:58 +01:00
stringdefs.h gh-93033: Use wmemchr in stringlib (GH-93034) 2022-05-24 10:45:31 +09:00
stringlib_find_two_way_notes.txt gh-94808: improve comments and coverage of fastsearch.h (GH-96760) 2022-09-13 14:25:10 -04:00
transmogrify.h gh-99300: Use Py_NewRef() in Objects/ directory (#99354) 2022-11-10 23:58:07 +01:00
ucs1lib.h gh-93033: Use wmemchr in stringlib (GH-93034) 2022-05-24 10:45:31 +09:00
ucs2lib.h gh-93033: Use wmemchr in stringlib (GH-93034) 2022-05-24 10:45:31 +09:00
ucs4lib.h gh-93033: Use wmemchr in stringlib (GH-93034) 2022-05-24 10:45:31 +09:00
undef.h gh-93033: Use wmemchr in stringlib (GH-93034) 2022-05-24 10:45:31 +09:00
unicode_format.h gh-106320: Add pycore_complexobject.h header file (#106339) 2023-07-02 21:19:59 +00:00

README.txt

bits shared by the bytesobject and unicodeobject implementations (and
possibly other modules, in a not too distant future).

the stuff in here is included into relevant places; see the individual
source files for details.
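
as an illustration of the "included into relevant places" idea, here is a
minimal sketch of one algorithm body being expanded once per character
width. it is not taken from the stringlib sources: real stringlib
re-includes the same header after redefining STRINGLIB_CHAR and friends,
whereas this sketch uses a single generator macro so that it fits in one
file. the names DEFINE_COUNT_CHAR, ucs1demo, ucs2demo and ucs4demo are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a stringlib template header: the macro
   expands to one counting function per character type.  Real stringlib
   instead re-includes the same .h file with different STRINGLIB_* defines. */
#define DEFINE_COUNT_CHAR(PREFIX, CHAR_T)                          \
    static ptrdiff_t                                               \
    PREFIX##_count_char(const CHAR_T *s, ptrdiff_t n, CHAR_T ch)   \
    {                                                              \
        ptrdiff_t count = 0;                                       \
        for (ptrdiff_t i = 0; i < n; i++) {                        \
            if (s[i] == ch) {                                      \
                count++;                                           \
            }                                                      \
        }                                                          \
        return count;                                              \
    }

/* One specialization per character width (assumed names). */
DEFINE_COUNT_CHAR(ucs1demo, uint8_t)
DEFINE_COUNT_CHAR(ucs2demo, uint16_t)
DEFINE_COUNT_CHAR(ucs4demo, uint32_t)
```

each expansion yields a separately compiled function (for example
ucs2demo_count_char), so the inner loop works on a fixed element size
instead of checking the character width at run time.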

--------------------------------------------------------------------
the following defines are used by the different modules:

STRINGLIB_CHAR

    the type used to hold a character (char, Py_UCS1, Py_UCS2 or Py_UCS4)

STRINGLIB_GET_EMPTY()

    returns a PyObject representing the empty string, only to be used if
    STRINGLIB_MUTABLE is 0. It must not be NULL.

Py_ssize_t STRINGLIB_LEN(PyObject*)

    returns the length of the given string object (which must be of the
    right type)

PyObject* STRINGLIB_NEW(STRINGLIB_CHAR*, Py_ssize_t)

    creates a new string object

STRINGLIB_CHAR* STRINGLIB_STR(PyObject*)

    returns the pointer to the character data for the given string
    object (which must be of the right type)

int STRINGLIB_CHECK_EXACT(PyObject *)

    returns true if the object is an instance of our type, not a subclass

STRINGLIB_MUTABLE

    must be 0 or 1 to tell the C preprocessor macros in stringlib code
    whether the object being operated on is mutable
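
to make the contract above concrete, the following is a rough sketch of
how a client might satisfy these defines, and how template code might
consume them. it does not use the Python C API: ToyStr, toy_new,
toy_empty and toy_suffix are hypothetical names, ptrdiff_t stands in for
Py_ssize_t, and ToyStr* stands in for PyObject*. the sharing behaviour
under "#if !STRINGLIB_MUTABLE" is an assumption about how such a flag
would typically be used, not a copy of any stringlib function.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy object; real code would operate on PyObject*. */
typedef struct {
    ptrdiff_t len;
    char *data;
} ToyStr;

/* Shared empty-string singleton (never NULL, never freed). */
static ToyStr toy_empty = {0, ""};

/* Allocate a new ToyStr holding a copy of s[0..n). Returns NULL on failure. */
static ToyStr *
toy_new(const char *s, ptrdiff_t n)
{
    ToyStr *obj = malloc(sizeof(ToyStr));
    if (obj == NULL) {
        return NULL;
    }
    obj->data = malloc((size_t)n + 1);
    if (obj->data == NULL) {
        free(obj);
        return NULL;
    }
    if (n > 0) {
        memcpy(obj->data, s, (size_t)n);
    }
    obj->data[n] = '\0';
    obj->len = n;
    return obj;
}

/* The client supplies the defines described in this README. */
#define STRINGLIB_CHAR            char
#define STRINGLIB_MUTABLE         0
#define STRINGLIB_LEN(o)          ((o)->len)
#define STRINGLIB_STR(o)          ((o)->data)
#define STRINGLIB_NEW(s, n)       toy_new((s), (n))
#define STRINGLIB_GET_EMPTY()     (&toy_empty)
#define STRINGLIB_CHECK_EXACT(o)  1   /* the toy type has no subclasses */

/* Template-style code: written only in terms of the defines above.
   Returns the part of `self` from index `start` onward. */
static ToyStr *
toy_suffix(ToyStr *self, ptrdiff_t start)
{
    ptrdiff_t len = STRINGLIB_LEN(self);
    if (start < 0) {
        start = 0;
    }
    if (start > len) {
        start = len;
    }
#if !STRINGLIB_MUTABLE
    /* Immutable objects can be shared instead of copied. */
    if (start == 0 && STRINGLIB_CHECK_EXACT(self)) {
        return self;
    }
    if (start == len) {
        return STRINGLIB_GET_EMPTY();
    }
#endif
    const STRINGLIB_CHAR *p = STRINGLIB_STR(self);
    return STRINGLIB_NEW(p + start, len - start);
}
```

with STRINGLIB_MUTABLE set to 1, the shortcuts would be compiled out and
every call would return a fresh copy, which is what a mutable type needs
and is why STRINGLIB_GET_EMPTY() is only meaningful in the immutable case.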