cpython

Commit Graph

Author	SHA1	Message	Date
Cody Maloney	06a1c3fb24	gh-120754: Update estimated_size in C truncate (#121357) Sometimes a large file is truncated (test_largefile). While estimated_size is used as an estimate (the read will still get the number of bytes in the file), an estimate much larger than the actual size of the data can result in significant over-allocation and sometimes lead to a MemoryError / running out of memory. This brings the C implementation to match the Python _pyio implementation.	2024-07-04 12:59:18 +00:00
Cody Maloney	2f5f19e783	gh-120754: Reduce system calls in full-file FileIO.readall() case (#120755) This reduces the system call count of a simple program[0] that reads all the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my linux system, 5813 -> 4875 on my macOS). This always reduces the number of `fstat()` calls, and reduces seek calls most of the time. Stat was always called twice, once at open (to error early on directories), and a second time to get the size of the file to be able to read the whole file in one read. Now the size is cached with the first call. The code keeps an optimization that if the user had previously read a lot of data, the current position is subtracted from the number of bytes to read. That is somewhat expensive so only do it on larger files, otherwise just try and read the extra bytes and resize the PyBytes as needed. I built a little test program to validate the behavior + assumptions around relative costs and then ran it under `strace` to get a log of the system calls. Full samples below[1]. After the changes, this is everything in one `filename.read_text()`: ```python3 openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343 read(3, "", 1) = 0 close(3) = 0 ``` This does make some tradeoffs: 1. If the file size changes between open() and readall(), this will still get all the data but might have more read calls. 2. I experimented with avoiding the stat + cached result for small files in general, but on my dev workstation at least that tended to reduce performance compared to using the fstat().
[0] ```python3 from pathlib import Path nlines = [] for filename in Path("cpython/Doc").glob("*/.rst"): nlines.append(len(filename.read_text())) ``` [1] Before small file: ``` openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343 read(3, "", 1) = 0 close(3) = 0 ``` After small file: ``` openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343 read(3, "", 1) = 0 close(3) = 0 ``` Before large file: ``` openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0 ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0 read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104 read(3, "", 1) = 0 close(3) = 0 ``` After large file: ``` openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0 ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104 read(3, "", 1) = 0 close(3) = 0 ``` Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com> Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com> Co-authored-by: Victor Stinner <vstinner@python.org>	2024-07-04 09:17:00 +02:00
Steve Dower	e731554337	Fixes loop variables to be the same types as their limit (GH-120958)	2024-06-24 17:11:47 +01:00
Serhiy Storchaka	02df679574	Use _PyLong_IsNegative instead of _PyLong_Sign if appropriate. (GH-120493) It is faster and more obvious.	2024-06-24 09:49:01 +03:00
Petr Viktorin	6f1d448bc1	gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520) * Add an InternalDocs file describing how interning should work and how to use it. * Add internal functions to explicitly request what kind of interning is done: - `_PyUnicode_InternMortal` - `_PyUnicode_InternImmortal` - `_PyUnicode_InternStatic` * Switch uses of `PyUnicode_InternInPlace` to those. * Disallow using `_Py_SetImmortal` on strings directly. You should use `_PyUnicode_InternImmortal` instead: - Strings should be interned before immortalization, otherwise you're possibly interning a immortalizing copy. - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in backports, as they are now part of public API and version-specific ABI. * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery. * Make sure the statically allocated string singletons are unique. This means these sets are now disjoint: - `_Py_ID` - `_Py_STR` (including the empty string) - one-character latin-1 singletons Now, when you intern a singleton, that exact singleton will be interned. * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic). * Intern `_Py_STR` singletons at startup. * For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup. * Beef up the tests. Cover internal details (marked with `@cpython_only`). * Add lots of assertions Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>	2024-06-21 17:19:31 +02:00
Radislav Chugunov	52586f930f	gh-119506: fix `_io.TextIOWrapper.write()` write during flush (#119507) Co-authored-by: Inada Naoki <songofacandy@gmail.com>	2024-06-03 16:47:36 +09:00
Victor Stinner	7ca74a760a	gh-119661: Add _Py_SINGLETON() include in Argument Clinic (#119712) When _Py_SINGLETON() is used, Argument Clinic now adds an explicit "pycore_runtime.h" include to get the macro. Previously, the macro may or may not have been included indirectly by another include.	2024-05-29 11:37:04 +02:00
Brett Simmers	c2627d6eea	gh-116322: Add Py_mod_gil module slot (#116882) This PR adds the ability to enable the GIL if it was disabled at interpreter startup, and modifies the multi-phase module initialization path to enable the GIL when loading a module, unless that module's spec includes a slot indicating it can run safely without the GIL. PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148. A warning will be issued up to once per interpreter for the first GIL-using module that is loaded. If `-v` is given, a shorter message will be printed to stderr every time a GIL-using module is loaded (including the first one that issues a warning).	2024-05-03 11:30:55 -04:00
morotti	8fa1248685	gh-117151: optimize BufferedWriter(), do not buffer writes that are the buffer size (GH-118037) BufferedWriter() was buffering writes that are exactly the same size as the buffer. It is very common to read/write in blocks of exactly the buffer size. Copying a full buffer is pointless: it costs an extra memory copy, and the full buffer has to be written in the next call anyway. Co-authored-by: rmorotti <romain.morotti@man.com>	2024-04-23 18:51:20 +03:00
Serhiy Storchaka	35f60c3def	gh-117764: Add signatures for __reduce__ and __reduce_ex__ in the _io module (GH-117773) __reduce__() does not have parameters, __reduce_ex__() has a single parameter.	2024-04-12 12:22:17 +03:00
NGRsoftlab	63d6f2623e	gh-117068: Remove useless code in bytesio.c:resize_buffer() (GH-117069) Co-authored-by: i.khabibulin <i.khabibulin@ngrsoftlab.ru>	2024-03-22 11:25:38 +00:00
AN Long	cd2ed91780	gh-115538: Emit a warning when a bool is used as fd in _io.WindowsConsoleIO (GH-116925)	2024-03-18 11:48:50 +00:00
6t8k	26800cf25a	gh-95782: Fix io.BufferedReader.tell() etc. being able to return offsets < 0 (GH-99709) lseek() always returns 0 for character pseudo-devices like `/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it always returns -1, to which CPython reacts by raising appropriate exceptions). They are thus technically seekable despite not having seek semantics. When calling read() on e.g. an instance of `io.BufferedReader` that wraps such a file, `BufferedReader` reads ahead, filling its buffer, creating a discrepancy between the number of bytes read and the internal `tell()` always returning 0, which previously resulted in e.g. `BufferedReader.tell()` or `BufferedReader.seek()` being able to return positions < 0 even though these are supposed to be always >= 0. Invariably keep the return value non-negative by returning max(former_return_value, 0) instead, and add some corresponding tests.	2024-02-17 11:16:06 +00:00
Steve Dower	7861dfd26a	gh-111140: Adds PyLong_AsNativeBytes and PyLong_FromNative[Unsigned]Bytes functions (GH-114886)	2024-02-12 20:13:13 +00:00
Serhiy Storchaka	846fd721d5	gh-115059: Flush the underlying write buffer in io.BufferedRandom.read1() (GH-115163)	2024-02-09 12:36:12 +02:00
Serhiy Storchaka	652fbf88c4	gh-82626: Emit a warning when bool is used as a file descriptor (GH-111275)	2024-02-05 22:51:11 +02:00
Erlend E. Aasland	09096a1647	gh-115015: Argument Clinic: fix generated code for METH_METHOD methods without params (#115016)	2024-02-05 21:49:17 +01:00
Nikita Sobolev	05e47202a3	gh-114286: Fix `maybe-uninitialized` warning in `Modules/_io/fileio.c` (GH-114287)	2024-01-19 10:25:05 +00:00
Jonathon Reinhart	e454f9383c	Fix an incorrect comment in iobase_is_closed (GH-102952) This comment appears to have been mistakenly copied from what is now called iobase_check_closed() in commit `4d9aec0220`. Also unite the iobase_check_closed() code with the relevant comment. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2024-01-16 18:27:17 +02:00
Victor Stinner	1d75fa43a2	gh-77046: os.pipe() sets _O_NOINHERIT flag on fds (#113817) On Windows, set _O_NOINHERIT flag on file descriptors created by os.pipe() and io.WindowsConsoleIO. Add test_pipe_spawnl() to test_os. Co-authored-by: Zackery Spytz <zspytz@gmail.com>	2024-01-10 23:02:17 +01:00
AN Long	623b338adf	gh-66060: Use actual class name in _io types' __repr__ (#30824) Use the object's actual class name in the __repr__ of the following _io types: - FileIO - TextIOWrapper - _WindowsConsoleIO	2024-01-09 21:39:36 +01:00
Zackery Spytz	73c9326563	gh-80109: Fix io.TextIOWrapper dropping the internal buffer during write() (GH-22535) io.TextIOWrapper was dropping the internal decoding buffer during read() and write() calls.	2024-01-08 12:33:34 +02:00
Donghee Na	57b7e52790	gh-112205: Support docstring for `@getter` (#113160) --------- Co-authored-by: Erlend E. Aasland <erlend@python.org>	2023-12-20 21:52:12 +09:00
Donghee Na	23a5711100	gh-112205: Update textio module to use `@getter` where possible. (gh-113095)	2023-12-14 10:26:46 +00:00
Serhiy Storchaka	bb36f72efc	gh-111049: Fix crash during garbage collection of the BytesIO buffer object (GH-111221)	2023-12-14 10:04:23 +00:00
Donghee Na	498a096a51	gh-112205: Support `@setter` annotation from AC (gh-112922) --------- Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com> Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>	2023-12-13 14:00:34 +00:00
Donghee Na	5b0629966f	gh-112205: Update stringio module to use AC for thread safety (gh-112549)	2023-12-01 08:37:30 +09:00
Donghee Na	7eeea13403	gh-112205: Support @getter annotation from AC (gh-112396)	2023-11-30 19:40:53 +09:00
Mayuresh Kedari	fef6fb8762	gh-111965: Use critical sections to make io.BufferedIOBase and its related classes thread safe (gh-112298)	2023-11-22 20:25:07 +09:00
AN Long	77d9f1e6d9	gh-111965: Using critical sections to make ``io.StringIO`` thread safe. (gh-112116)	2023-11-19 21:34:40 +09:00
Donghee Na	b8c952af72	gh-111903: Update AC to support "pycore_critical_section.h" header (gh-112251)	2023-11-19 10:13:58 +09:00
AN Long	1a969b4f55	gh-111965: Use critical sections to make io.TextIOWrapper thread safe (gh-112193)	2023-11-19 08:21:04 +09:00
Serhiy Storchaka	9302f05f9a	gh-111942: Fix SystemError in the TextIOWrapper constructor (#112061) In non-debug mode the check for the "errors" argument is skipped, and then PyUnicode_AsUTF8() can fail, but its result was not checked. Co-authored-by: Victor Stinner <vstinner@python.org>	2023-11-14 20:02:28 +00:00
Serhiy Storchaka	ee06fffd38	gh-111942: Fix crashes in TextIOWrapper.reconfigure() (GH-111976) * Fix crash when encoding is not string or None. * Fix crash when both line_buffering and write_through raise an exception when converted to int. * Add a number of tests for constructor and reconfigure() method with invalid arguments.	2023-11-14 17:37:56 +02:00
Sam Gross	324531df90	gh-111903: Add `@critical_section` directive to Argument Clinic. (#111904) The `@critical_section` directive instructs Argument Clinic to generate calls to `Py_BEGIN_CRITICAL_SECTION()` and `Py_END_CRITICAL_SECTION()` around the bound function. In `--disable-gil` builds, these calls will lock and unlock the `self` object. They are no-ops in the default build. This is used in one place (`_io._Buffered.close`) as a demonstration. Subsequent PRs will use it more widely in the `_io.Buffered` bindings.	2023-11-14 10:47:46 +00:00
Serhiy Storchaka	771bd3c94a	Add private _PyUnicode_AsUTF8NoNUL() function (GH-111957) Like PyUnicode_AsUTF8(), but check for embedded null characters.	2023-11-10 21:31:36 +02:00
Victor Stinner	11e83488c5	gh-111089: Revert PyUnicode_AsUTF8() changes (#111833) * Revert "gh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (#111585)" This reverts commit `d9b606b3d0`. * Revert "gh-111089: Use PyUnicode_AsUTF8() in getargs.c (#111620)" This reverts commit `cde1071b2a`. * Revert "gh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (#111091)" This reverts commit `d731579bfb`. * Revert "gh-111089: Add PyUnicode_AsUTF8() to the limited C API (#111121)" This reverts commit `d8f32be5b6`. * Revert "gh-111089: Use PyUnicode_AsUTF8() in sqlite3 (#111122)" This reverts commit `37e4e20eaa`.	2023-11-07 22:36:13 +00:00
Victor Stinner	d9b606b3d0	gh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (#111585) Replace PyUnicode_AsUTF8AndSize() with PyUnicode_AsUTF8() to remove the explicit check for embedded null characters. The change avoids having to explicitly include <string.h> to get the strlen() function when using a recent version of the limited C API.	2023-11-01 16:34:42 +01:00
Serhiy Storchaka	9da98c0d9a	gh-111174: Fix crash in getbuffer() called repeatedly for empty BytesIO (GH-111210)	2023-10-25 13:50:16 +03:00
Furkan Onder	32c37fe1ba	gh-67565: Remove redundant C-contiguity checks (GH-105521) Co-authored-by: Stefan Krah <skrah@bytereef.org>	2023-10-23 12:54:46 +03:00
Tamás Hegedűs	11312eae6e	gh-110913: Fix WindowsConsoleIO chunking of UTF-8 text (GH-111007)	2023-10-20 12:52:31 +01:00
Victor Stinner	be5e8a0103	gh-110964: Remove private _PyArg functions (#110966) Move the following private functions and structures to pycore_modsupport.h internal C API: * _PyArg_BadArgument() * _PyArg_CheckPositional() * _PyArg_NoKeywords() * _PyArg_NoPositional() * _PyArg_ParseStack() * _PyArg_ParseStackAndKeywords() * _PyArg_Parser structure * _PyArg_UnpackKeywords() * _PyArg_UnpackKeywordsWithVararg() * _PyArg_UnpackStack() * _Py_ANY_VARARGS() Changes: * Python/getargs.h now includes pycore_modsupport.h to export functions. * clinic.py now adds pycore_modsupport.h when one of these functions is used. * Add pycore_modsupport.h includes when a C extension uses one of these functions. * Define Py_BUILD_CORE_MODULE in C extensions which now include directly or indirectly (via code generated by Argument Clinic) pycore_modsupport.h: * _csv * _curses_panel * _dbm * _gdbm * _multiprocessing.posixshmem * _sqlite.row * _statistics * grp * resource * syslog * _testcapi: bad_get() no longer uses METH_FASTCALL calling convention but METH_VARARGS. Replace _PyArg_UnpackStack() with PyArg_ParseTuple(). * _testcapi: add PYTESTCAPI_NEED_INTERNAL_API macro which is defined by _testcapi sub-modules which need the internal C API (pycore_modsupport.h): exceptions.c, float.c, vectorcall.c, watchers.c. * Remove Include/cpython/modsupport.h header file. Include/modsupport.h no longer includes the removed header file. * Fix mypy clinic.py	2023-10-17 14:30:31 +02:00
Victor Stinner	7513994c92	gh-110014: Include explicitly <unistd.h> header (#110155) * Remove unused <locale.h> includes. * Remove unused <fcntl.h> include in traceback.h. * Remove redundant <assert.h> and <stddef.h> includes. They are already included by "Python.h". * Remove <object.h> include in faulthandler.c. Python.h already includes it. * Add missing <stdbool.h> in pycore_pythread.h if HAVE_PTHREAD_STUBS is defined. * Fix also warnings in pthread_stubs.h: don't redefine macros if they are already defined, like the __NEED_pthread_t macro.	2023-09-30 20:06:45 +00:00
Serhiy Storchaka	b8d1744e7b	gh-109611: Add convenient C API function _PyFile_Flush() (GH-109612)	2023-09-23 09:35:30 +03:00
Serhiy Storchaka	add16f1a5e	gh-108511: Add C API functions which do not silently ignore errors (GH-109025) Add the following functions: * PyObject_HasAttrWithError() * PyObject_HasAttrStringWithError() * PyMapping_HasKeyWithError() * PyMapping_HasKeyStringWithError()	2023-09-17 14:23:31 +03:00
Serhiy Storchaka	1796c191b4	gh-108494: Argument Clinic: inline parsing code for positional-only parameters in the limited C API (GH-108622)	2023-09-03 17:28:14 +03:00
Victor Stinner	3edcf743e8	gh-106320: Remove private _PyLong_Sign() (#108743) Move the private _PyLong_Sign() and _PyLong_NumBits() functions to the internal C API (pycore_long.h). Modules/_testcapi/long.c now uses the internal C API.	2023-09-01 09:13:07 +02:00
Victor Stinner	ad73674283	gh-107603: Argument Clinic: Only include pycore_gc.h if needed (#108726) Argument Clinic now only includes pycore_gc.h if PyGC_Head is needed, and only includes pycore_runtime.h if _Py_ID() is needed. * Add 'condition' optional argument to Clinic.add_include(). * deprecate_keyword_use() includes pycore_runtime.h when using the _PyID() function. * Fix rendering of includes: comments start at the column 35. * Mark PC/clinic/_wmimodule.cpp.h and "Objects/stringlib/clinic/.h.h" header files as generated in .gitattributes. Effects: 42 header files generated by AC no longer include the internal C API, instead of 4 header files before. For example, Modules/clinic/_abc.c.h no longer includes the internal C API. * Fix _testclinic_depr.c.h: it now always includes pycore_runtime.h to get _Py_ID().	2023-08-31 23:42:34 +02:00
Victor Stinner	dd32611f4f	gh-106320: winconsoleio.c includes pycore_pyerrors.h (#108720) Fix compiler warning: warning C4013: '_PyErr_ChainExceptions1' undefined	2023-08-31 14:13:53 +00:00
Victor Stinner	79823c103b	gh-106320: Remove private _PyErr_ChainExceptions() (#108713) Remove _PyErr_ChainExceptions(), _PyErr_ChainExceptions1() and _PyErr_SetFromPyStatus() functions from the public C API. * Move the private _PyErr_ChainExceptions() and _PyErr_ChainExceptions1() function to the internal C API (pycore_pyerrors.h). * Move the private _PyErr_SetFromPyStatus() to the internal C API (pycore_initconfig.h). * No longer export the _PyErr_ChainExceptions() function. * Move run_in_subinterp_with_config() from _testcapi to _testinternalcapi.	2023-08-31 13:53:19 +02:00
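
Commit `26800cf25a` (gh-95782) describes keeping buffered positions non-negative by returning max(former_return_value, 0). The arithmetic behind that can be illustrated with a minimal sketch. The function and parameter names here are hypothetical, and this is a simplified model of the idea in the commit message, not the CPython source.

```python3
def logical_position(raw_offset: int, unread_buffered: int) -> int:
    """Position a buffered reader would report, clamped at zero.

    raw_offset: offset reported by the underlying raw stream (hypothetical name).
    unread_buffered: bytes read ahead into the buffer but not yet consumed
    (hypothetical name).
    """
    # Subtracting the read-ahead goes negative when the raw stream
    # always reports offset 0.
    former_return_value = raw_offset - unread_buffered
    # The fix described in the commit: never report a position below zero.
    return max(former_return_value, 0)


# Character pseudo-device: the raw offset stays 0 while 8176 bytes sit unread.
device_pos = logical_position(0, 8176)    # 0 rather than -8176
# Regular file: raw offset 8192 after filling the buffer, 16 bytes consumed.
file_pos = logical_position(8192, 8176)   # 16
```

For a regular file the clamp never triggers, so the reported position is unchanged. It only matters for files whose raw offset does not advance.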


726 Commits