Commit Graph

278 Commits

Author SHA1 Message Date
Dennis Sweeney 69d9a08099
gh-94808: improve comments and coverage of fastsearch.h (GH-96760) 2022-09-13 14:25:10 -04:00
Erlend E. Aasland f07adf82f3
gh-90928: Improve static initialization of keywords tuple in AC (#95907) 2022-08-13 12:09:40 +02:00
Eric Snow 6f6a4e6cc5
gh-90928: Statically Initialize the Keywords Tuple in Clinic-Generated Code (gh-95860)
We only statically initialize for core code and builtin modules.  Extension modules still create
the tuple at runtime.  We'll solve that part of interpreter isolation separately.

This change includes generated code. The non-generated changes are in:

* Tools/clinic/clinic.py
* Python/getargs.c
* Include/cpython/modsupport.h
* Makefile.pre.in (re-generate global strings after running clinic)
* very minor tweaks to Modules/_codecsmodule.c and Python/Python-tokenize.c

All other changes are generated code (clinic, global strings).
2022-08-11 15:25:49 -06:00
Christian Heimes 774ef28814
gh-84461: Silence some compiler warnings on WASM (GH-93978) 2022-06-20 13:34:40 +02:00
goldsteinn 7108bdf27c
gh-93033: Use wmemchr in stringlib (GH-93034)
Generally comparable perf for the "good" case where memchr doesn't
return any collisions (false matches on lower byte) but clearly faster
with collisions.
2022-05-24 10:45:31 +09:00
Victor Stinner f62ad4f2c4
gh-89653: Use int type for Unicode kind (#92704)
Use the same type that PyUnicode_FromKindAndData() kind parameter
type (public C API): int.
2022-05-13 12:41:05 +02:00
Victor Stinner db388df1d9
gh-89653: PEP 670: Convert PyUnicode_KIND() macro to function (#92705)
In the limited C API version 3.12, PyUnicode_KIND() is now
implemented as a static inline function. Keep the macro for the
regular C API and for the limited C API version 3.11 and older to
prevent introducing new compiler warnings.

Update _decimal.c and stringlib/eq.h for PyUnicode_KIND().
2022-05-13 11:49:56 +02:00
Inada Naoki f9c9354a7a
gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537) 2022-05-12 14:48:38 +09:00
Victor Stinner b270b82f11
gh-91320: Argument Clinic uses _PyCFunction_CAST() (#32210)
Replace "(PyCFunction)(void(*)(void))func" cast with
_PyCFunction_CAST(func).
2022-05-03 20:25:41 +02:00
Serhiy Storchaka 18b07d773e
bpo-36819: Fix crashes in built-in encoders with weird error handlers (GH-28593)
If the error handler returns position less or equal than the starting
position of non-encodable characters, most of built-in encoders didn't
properly re-size the output buffer. This led to out-of-bounds writes,
and segfaults.
2022-05-02 12:37:48 +03:00
Victor Stinner 097f74a5a3
bpo-46670: Define all macros for stringlib (GH-31176)
bytesobject.c, bytearrayobject.c and unicodeobject.c now define all
macros used by stringlib, to avoid using undefined macros.
Fix "gcc -Wundef" warnings.
2022-02-07 01:26:58 +01:00
Victor Stinner 0a883a76cd
bpo-35134: Add Include/cpython/floatobject.h (GH-28957)
Split Include/floatobject.h into sub-files: add
Include/cpython/floatobject.h and
Include/internal/pycore_floatobject.h.
2021-10-14 23:41:06 +02:00
Dennis Sweeney d01dceb88b
bpo-41972: Tweak fastsearch.h string search algorithms (GH-27091) 2021-07-19 12:58:32 +02:00
E-Paine e9f66aedf4
Remove effbot urls (GH-26308) 2021-05-22 14:09:54 +02:00
Jessica Clarke dec0757549
bpo-43179: Generalise alignment for optimised string routines (GH-24624)
* Remove m68k-specific hack from ascii_decode

On m68k, alignments of primitives is more relaxed, with 4-byte and
8-byte types only requiring 2-byte alignment, thus using sizeof(size_t)
does not work. Instead, use the portable alternative.

Note that this is a minimal fix that only relaxes the assertion and the
condition for when to use the optimised version remains overly strict.
Such issues will be fixed tree-wide in the next commit.

NB: In C11 we could use _Alignof(size_t) instead, but for compatibility
we use autoconf.

* Optimise string routines for architectures with non-natural alignment

C only requires that sizeof(x) is a multiple of alignof(x), not that the
two are equal. Thus anywhere where we optimise based on alignment we
should be using alignof(x) not sizeof(x).

This is more annoying than it would be in C11 where we could just use
_Alignof(x) (and alignof(x) in C++11), but since we still require only
C99 we must plumb the information all the way from autoconf through the
various typedefs and defines.
2021-03-31 12:12:39 +02:00
Dennis Sweeney 73a85c4e1d
bpo-41972: Use the two-way algorithm for string searching (GH-22904)
Implement an enhanced variant of Crochemore and Perrin's Two-Way string searching algorithm, which reduces worst-case time from quadratic (the product of the string and pattern lengths) to linear. This applies to forward searches (like``find``, ``index``, ``replace``); the algorithm for reverse searches (like ``rfind``) is not changed.

Co-authored-by: Tim Peters <tim.peters@gmail.com>
2021-02-28 12:20:50 -06:00
Victor Stinner 32bd68c839
bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587)
No longer use deprecated aliases to functions:

* Replace PyObject_MALLOC() with PyObject_Malloc()
* Replace PyObject_REALLOC() with PyObject_Realloc()
* Replace PyObject_FREE() with PyObject_Free()
* Replace PyObject_Del() with PyObject_Free()
* Replace PyObject_DEL() with PyObject_Free()
2020-12-01 10:37:39 +01:00
Victor Stinner 00d7abd7ef
bpo-42519: Replace PyMem_MALLOC() with PyMem_Malloc() (GH-23586)
No longer use deprecated aliases to functions:

* Replace PyMem_MALLOC() with PyMem_Malloc()
* Replace PyMem_REALLOC() with PyMem_Realloc()
* Replace PyMem_FREE() with PyMem_Free()
* Replace PyMem_Del() with PyMem_Free()
* Replace PyMem_DEL() with PyMem_Free()

Modify also the PyMem_DEL() macro to use directly PyMem_Free().
2020-12-01 09:56:42 +01:00
Ma Lin a0c603cb9d
bpo-38252: Use 8-byte step to detect ASCII sequence in 64bit Windows build (GH-16334) 2020-10-18 17:48:38 +03:00
Victor Stinner f363d0a6e9
bpo-40521: Make empty Unicode string per interpreter (GH-21096)
Each interpreter now has its own empty Unicode string singleton.
2020-06-24 00:10:40 +02:00
Victor Stinner c41eed1a87
bpo-40521: Make bytes singletons per interpreter (GH-21074)
Each interpreter now has its own empty bytes string and single byte
character singletons.

Replace STRINGLIB_EMPTY macro with STRINGLIB_GET_EMPTY() macro.
2020-06-23 15:54:35 +02:00
Victor Stinner c6b292cdee
bpo-29882: Add _Py_popcount32() function (GH-20518)
* Rename pycore_byteswap.h to pycore_bitutils.h.
* Move popcount_digit() to pycore_bitutils.h as _Py_popcount32().
* _Py_popcount32() uses GCC and clang builtin function if available.
* Add unit tests to _Py_popcount32().
2020-06-08 16:30:33 +02:00
Serhiy Storchaka 5f4b229df7
bpo-40792: Make the result of PyNumber_Index() always having exact type int. (GH-20443)
Previously, the result could have been an instance of a subclass of int.

Also revert bpo-26202 and make attributes start, stop and step of the range
object having exact type int.

Add private function _PyNumber_Index() which preserves the old behavior
of PyNumber_Index() for performance to use it in the conversion functions
like PyLong_AsLong().
2020-05-28 10:33:45 +03:00
Serhiy Storchaka 578c3955e0
bpo-37999: No longer use __int__ in implicit integer conversions. (GH-15636)
Only __index__ should be used to make integer conversions lossless.
2020-05-26 18:43:38 +03:00
Victor Stinner d7c657d4b1
bpo-40302: UTF-32 encoder SWAB4() macro use a|b rather than a+b (GH-19572) 2020-04-17 19:13:34 +02:00
Victor Stinner 1ae035b7e8
bpo-40302: Add pycore_byteswap.h header file (GH-19552)
Add a new internal pycore_byteswap.h header file with the following
functions:

* _Py_bswap16()
* _Py_bswap32()
* _Py_bswap64()

Use these functions in _ctypes, sha256 and sha512 modules,
and also use in the UTF-32 encoder.

sha256, sha512 and _ctypes modules are now built with the internal
C API.
2020-04-17 17:47:20 +02:00
Serhiy Storchaka 8f87eefe7f
bpo-39943: Add the const qualifier to pointers on non-mutable PyBytes data. (GH-19472) 2020-04-12 14:58:27 +03:00
Serhiy Storchaka cd8295ff75
bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode data. (GH-19345) 2020-04-11 10:48:40 +03:00
Benjamin Peterson 51796e5d26
Update some www.unicode.org URLs to use HTTPS. (GH-18912) 2020-03-10 21:10:59 -07:00
Serhiy Storchaka eebaa9bfc5
bpo-38249: Expand Py_UNREACHABLE() to __builtin_unreachable() in the release mode. (GH-16329)
Co-authored-by: Victor Stinner <vstinner@python.org>
2020-03-09 20:49:52 +02:00
Inada Naoki 02a4d57263
bpo-39087: Optimize PyUnicode_AsUTF8AndSize() (GH-18327)
Avoid using temporary bytes object.
2020-02-27 13:48:59 +09:00
Victor Stinner 45876a90e2
bpo-35081: Move bytes_methods.h to the internal C API (GH-18492)
Move the bytes_methods.h header file to the internal C API as
pycore_bytes_methods.h: it only contains private symbols (prefixed by
"_Py"), except of the PyDoc_STRVAR_shared() macro.
2020-02-12 22:32:34 +01:00
Andy Lester e6be9b59a9
closes bpo-39605: Fix some casts to not cast away const. (GH-18453)
gcc -Wcast-qual turns up a number of instances of casting away constness of pointers. Some of these can be safely modified, by either:

Adding the const to the type cast, as in:

-    return _PyUnicode_FromUCS1((unsigned char*)s, size);
+    return _PyUnicode_FromUCS1((const unsigned char*)s, size);

or, Removing the cast entirely, because it's not necessary (but probably was at one time), as in:

-    PyDTrace_FUNCTION_ENTRY((char *)filename, (char *)funcname, lineno);
+    PyDTrace_FUNCTION_ENTRY(filename, funcname, lineno);

These changes will not change code, but they will make it much easier to check for errors in consts
2020-02-11 18:28:35 -08:00
Victor Stinner 60ac6ed557
bpo-39573: Use Py_SET_SIZE() function (GH-18402)
Replace direct acccess to PyVarObject.ob_size with usage of
the Py_SET_SIZE() function.
2020-02-07 23:18:08 +01:00
Inada Naoki 869c0c99b9
bpo-36051: Fix compiler warning. (GH-18325) 2020-02-03 19:03:34 +09:00
Bruce Merry d07d9f4c43
bpo-36051: Drop GIL during large bytes.join() (GH-17757)
Improve multi-threaded performance by dropping the GIL in the fast path
of bytes.join. To avoid increasing overhead for small joins, it is only
done if the output size exceeds a threshold.
2020-01-29 16:09:24 +09:00
Pablo Galindo cd7db76a63
bpo-39372: Clean header files of declared interfaces with no implementations (GH-18037)
The public API symbols being removed are:

_PyBytes_InsertThousandsGroupingLocale, _PyBytes_InsertThousandsGrouping, _Py_InitializeFromArgs, _Py_InitializeFromWideArgs, _PyFloat_Repr, _PyFloat_Digits,
_PyFloat_DigitsInit, PyFrame_ExtendStack, _PyAIterWrapper_Type, PyNullImporter_Type, PyCmpWrapper_Type, PySortWrapper_Type, PyNoArgsFunction.
2020-01-18 03:14:59 +00:00
Serhiy Storchaka 865c3b257f
bpo-28029: Make "".replace("", s, n) returning s for any n != 0. (GH-16981) 2019-10-30 12:03:53 +02:00
Valentin Haenel 60bba83b5d Doc: Fix typo in fastsearch comments (GH-14608) 2019-09-11 14:43:29 +02:00
Rémi Lapeyre 4901fe274b bpo-37034: Display argument name on errors with keyword arguments with Argument Clinic. (GH-13593) 2019-08-29 17:49:08 +03:00
Min ho Kim c4cacc8c5e Fix typos in comments, docs and test names (#15018)
* Fix typos in comments, docs and test names

* Update test_pyparse.py

account for change in string length

* Apply suggestion: splitable -> splittable

Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu>

* Apply suggestion: splitable -> splittable

Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu>

* Apply suggestion: Dealloccte -> Deallocate

Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu>

* Update posixmodule checksum.

* Reverse idlelib changes.
2019-07-30 18:16:13 -04:00
Serhiy Storchaka 894263ba80
bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)
* The UTF-8 incremental decoders fails now fast if encounter
  a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
  handler decodes now a lone low surrogate with final=False.
2019-06-25 11:54:18 +03:00
Francisco Couzo 9843bc110d Improve exception message for str.format (GH-12675) 2019-06-01 10:14:00 -07:00
Jeroen Demeyer 530f506ac9 bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async (GH-13464)
Automatically replace
tp_print -> tp_vectorcall_offset
tp_compare -> tp_as_async
tp_reserved -> tp_as_async
2019-05-30 19:13:39 -07:00
David Carlier 27ee0f8551 Fix couple of dead code paths (GH-7418) 2019-05-17 19:46:22 -04:00
Victor Stinner 709d23dee6
bpo-36775: _PyCoreConfig only uses wchar_t* (GH-13062)
_PyCoreConfig: Change filesystem_encoding, filesystem_errors,
stdio_encoding and stdio_errors fields type from char* to wchar_t*.

Changes:

* PyInterpreterState: replace fscodec_initialized (int) with fs_codec
  structure.
* Add get_error_handler_wide() and unicode_encode_utf8() helper
  functions.
* Add error_handler parameter to unicode_encode_locale()
  and unicode_decode_locale().
* Remove _PyCoreConfig_SetString().
* Rename _PyCoreConfig_SetWideString() to _PyCoreConfig_SetString().
* Rename _PyCoreConfig_SetWideStringFromString()
  to _PyCoreConfig_DecodeLocale().
2019-05-02 14:56:30 -04:00
Serhiy Storchaka 3191391515
bpo-36127: Argument Clinic: inline parsing code for keyword parameters. (GH-12058) 2019-03-14 10:32:22 +02:00
Serhiy Storchaka 4fa9591025
bpo-35582: Argument Clinic: inline parsing code for positional parameters. (GH-11313) 2019-01-11 16:01:14 +02:00
Serhiy Storchaka 32d96a2b5b
bpo-23867: Argument Clinic: inline parsing code for a single positional parameter. (GH-9689) 2018-12-25 13:23:47 +02:00
Serhiy Storchaka 4a934d490f
bpo-33012: Fix invalid function cast warnings with gcc 8 in Argument Clinic. (GH-6748)
Fix invalid function cast warnings with gcc 8
for method conventions different from METH_NOARGS, METH_O and
METH_VARARGS in Argument Clinic generated code.
2018-11-27 11:27:36 +02:00