cpython

Commit Graph

Author	SHA1	Message	Date
Cody Maloney	cc9b9bebb2	gh-90102: Remove isatty call during regular open (#124922 ) Co-authored-by: Victor Stinner <vstinner@python.org>	2024-10-08 08:50:42 +02:00
Victor Stinner	d8f707420b	gh-111178: Fix function signatures in fileio.c (#125043 ) * Add "fileio_" prefix to getter functions. * Small refactoring.	2024-10-07 15:27:36 +02:00
Victor Stinner	43cd7aa8cd	gh-120754: Fix memory leak in FileIO.__init__() (#124225 ) Free 'self->stat_atopen' before assigning it, since io.FileIO.__init__() can be called multiple times manually (especially by test_io).	2024-09-19 00:11:50 +02:00
Cody Maloney	8b6c7c7877	gh-120754: Refactor I/O modules to stash whole stat result rather than individual members (#123412 ) Multiple places in the I/O stack optimize common cases by using the information from stat. Currently individual members are extracted from the stat and stored into the fileio struct. Refactor the code to store the whole stat struct instead. Parallels the changes to _io. The `stat` Python object doesn't allow changing members, so rather than modifying estimated_size, just clear the value.	2024-09-18 17:47:57 +02:00
Cody Maloney	06a1c3fb24	gh-120754: Update estimated_size in C truncate (#121357) Sometimes a large file is truncated (test_largefile). While estimated_size is only used as an estimate (the read will still get the number of bytes in the file), an estimate much larger than the actual size of the data can result in significant over-allocation and sometimes lead to a MemoryError (running out of memory). This brings the C implementation in line with the Python _pyio implementation.	2024-07-04 12:59:18 +00:00
Cody Maloney	2f5f19e783	gh-120754: Reduce system calls in full-file FileIO.readall() case (#120755) This reduces the system call count of a simple program[0] that reads all the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my Linux system, 5813 -> 4875 on my macOS). This always reduces the number of `fstat()` calls, and reduces seek calls most of the time. Stat was always called twice: once at open (to error early on directories), and a second time to get the size of the file so the whole file can be read in one read. Now the size is cached with the first call. The code keeps an optimization that if the user had previously read a lot of data, the current position is subtracted from the number of bytes to read. That is somewhat expensive, so only do it on larger files; otherwise just try to read the extra bytes and resize the PyBytes as needed. I built a little test program to validate the behavior + assumptions around relative costs and then ran it under `strace` to get a log of the system calls. Full samples below[1]. After the changes, this is everything in one `filename.read_text()`: ```python3 openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343 read(3, "", 1) = 0 close(3) = 0 ``` This does make some tradeoffs: 1. If the file size changes between open() and readall(), this will still get all the data but might have more read calls. 2. I experimented with avoiding the stat + cached result for small files in general, but on my dev workstation at least that tended to reduce performance compared to using the fstat(). [0] ```python3 from pathlib import Path nlines = [] for filename in Path("cpython/Doc").glob("**/*.rst"): nlines.append(len(filename.read_text())) ``` [1] Before small file: ``` openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343 read(3, "", 1) = 0 close(3) = 0 ``` After small file: ``` openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343 read(3, "", 1) = 0 close(3) = 0 ``` Before large file: ``` openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0 ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0 read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104 read(3, "", 1) = 0 close(3) = 0 ``` After large file: ``` openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0 ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104 read(3, "", 1) = 0 close(3) = 0 ``` Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com> Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com> Co-authored-by: Victor Stinner <vstinner@python.org>	2024-07-04 09:17:00 +02:00
Serhiy Storchaka	35f60c3def	gh-117764: Add signatures for __reduce__ and __reduce_ex__ in the _io module (GH-117773) __reduce__() does not have parameters, __reduce_ex__() has a single parameter.	2024-04-12 12:22:17 +03:00
Serhiy Storchaka	652fbf88c4	gh-82626: Emit a warning when bool is used as a file descriptor (GH-111275)	2024-02-05 22:51:11 +02:00
Nikita Sobolev	05e47202a3	gh-114286: Fix `maybe-uninitialized` warning in `Modules/_io/fileio.c` (GH-114287)	2024-01-19 10:25:05 +00:00
AN Long	623b338adf	gh-66060: Use actual class name in _io type's __repr__ (#30824 ) Use the object's actual class name in the following _io type's __repr__: - FileIO - TextIOWrapper - _WindowsConsoleIO	2024-01-09 21:39:36 +01:00
Victor Stinner	7513994c92	gh-110014: Include explicitly <unistd.h> header (#110155 ) * Remove unused <locale.h> includes. * Remove unused <fcntl.h> include in traceback.h. * Remove redundant <assert.h> and <stddef.h> includes. They are already included by "Python.h". * Remove <object.h> include in faulthandler.c. Python.h already includes it. * Add missing <stdbool.h> in pycore_pythread.h if HAVE_PTHREAD_STUBS is defined. * Fix also warnings in pthread_stubs.h: don't redefine macros if they are already defined, like the __NEED_pthread_t macro.	2023-09-30 20:06:45 +00:00
Victor Stinner	79823c103b	gh-106320: Remove private _PyErr_ChainExceptions() (#108713 ) Remove _PyErr_ChainExceptions(), _PyErr_ChainExceptions1() and _PyErr_SetFromPyStatus() functions from the public C API. * Move the private _PyErr_ChainExceptions() and _PyErr_ChainExceptions1() function to the internal C API (pycore_pyerrors.h). * Move the private _PyErr_SetFromPyStatus() to the internal C API (pycore_initconfig.h). * No longer export the _PyErr_ChainExceptions() function. * Move run_in_subinterp_with_config() from _testcapi to _testinternalcapi.	2023-08-31 13:53:19 +02:00
Serhiy Storchaka	2b15536fa9	gh-107913: Fix possible losses of OSError error codes (GH-107930) Functions like PyErr_SetFromErrno() and SetFromWindowsErr() should be called immediately after using the C API which sets errno or the Windows error code.	2023-08-27 00:35:06 +03:00
Victor Stinner	b32d4cad15	gh-108444: Replace _PyLong_AsInt() with PyLong_AsInt() (#108459) Change generated by the command: sed -i -e 's!_PyLong_AsInt!PyLong_AsInt!g' \ $(find -name "*.c" -o -name "*.h")	2023-08-25 01:01:30 +02:00
Victor Stinner	1a3faba9f1	gh-106869: Use new PyMemberDef constant names (#106871) * Remove '#include "structmember.h"'. * If needed, add <stddef.h> to get offsetof() function. * Update Parser/asdl_c.py to regenerate Python/Python-ast.c. * Replace: * T_SHORT => Py_T_SHORT * T_INT => Py_T_INT * T_LONG => Py_T_LONG * T_FLOAT => Py_T_FLOAT * T_DOUBLE => Py_T_DOUBLE * T_STRING => Py_T_STRING * T_OBJECT => _Py_T_OBJECT * T_CHAR => Py_T_CHAR * T_BYTE => Py_T_BYTE * T_UBYTE => Py_T_UBYTE * T_USHORT => Py_T_USHORT * T_UINT => Py_T_UINT * T_ULONG => Py_T_ULONG * T_STRING_INPLACE => Py_T_STRING_INPLACE * T_BOOL => Py_T_BOOL * T_OBJECT_EX => Py_T_OBJECT_EX * T_LONGLONG => Py_T_LONGLONG * T_ULONGLONG => Py_T_ULONGLONG * T_PYSSIZET => Py_T_PYSSIZET * T_NONE => _Py_T_NONE * READONLY => Py_READONLY * PY_AUDIT_READ => Py_AUDIT_READ * READ_RESTRICTED => Py_AUDIT_READ * PY_WRITE_RESTRICTED => _Py_WRITE_RESTRICTED * RESTRICTED => (READ_RESTRICTED | _Py_WRITE_RESTRICTED)	2023-07-25 15:28:30 +02:00
Serhiy Storchaka	be1b968dc1	gh-106521: Remove _PyObject_LookupAttr() function (GH-106642)	2023-07-12 08:57:10 +03:00
Inada Naoki	d5bd32fb48	gh-104922: remove PY_SSIZE_T_CLEAN (#106315 )	2023-07-02 15:07:46 +09:00
Victor Stinner	8ed705c083	gh-105156: Deprecate the old Py_UNICODE type in C API (#105157 ) Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API: use wchar_t instead. Replace Py_UNICODE with wchar_t in multiple C files. Co-authored-by: Inada Naoki <songofacandy@gmail.com>	2023-06-01 08:56:35 +02:00
Erlend E. Aasland	186bf39f5c	gh-101819: Isolate `_io` (#101948 ) Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com> Co-authored-by: Victor Stinner <vstinner@python.org>	2023-05-15 09:26:27 +00:00
Erlend E. Aasland	d0a738c6df	gh-101819: Refactor `_io` further in preparation for module isolation (#104369)	2023-05-11 15:56:30 +05:30
Victor Stinner	c84029179c	gh-101819: Prepare to modernize the _io extension (#104178 ) * Add references to static types to _PyIO_State: * PyBufferedIOBase_Type * PyBytesIOBuffer_Type * PyIncrementalNewlineDecoder_Type * PyRawIOBase_Type * PyTextIOBase_Type * Add the defining class to methods: * _io.BytesIO.getbuffer() * _io.FileIO.close() * Add get_io_state_by_cls() function. * Add state parameter to _textiowrapper_decode() * _io_TextIOWrapper___init__() now sets self->state before calling _textiowrapper_set_decoder(). Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>	2023-05-06 01:53:55 +02:00
Max Bachmann	c6858d1e7f	gh-102255: Improve build support for Windows API partitions (GH-102256) Add `MS_WINDOWS_DESKTOP`, `MS_WINDOWS_APPS`, `MS_WINDOWS_SYSTEM` and `MS_WINDOWS_GAMES` preprocessor definitions to allow switching off functionality missing from particular API partitions ("partitions" are used in Windows to identify overlapping subsets of APIs). CPython only officially supports `MS_WINDOWS_DESKTOP` and `MS_WINDOWS_SYSTEM` (APPS is included by normal desktop builds, but APPS without DESKTOP is not covered). Other configurations are a convenience for people building their own runtimes. `MS_WINDOWS_GAMES` is for the Xbox subset of the Windows API, which is also available on client OS, but is restricted compared to `MS_WINDOWS_DESKTOP`. These restrictions may change over time, as they relate to the build headers rather than the OS support, and so we assume that Xbox builds will use the latest available version of the GDK.	2023-03-09 21:09:12 +00:00
Irit Katriel	2db23d10bf	gh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (in Modules/) (#102196 )	2023-02-24 21:43:03 +00:00
Erlend E. Aasland	c00faf7943	gh-101819: Adapt _io types to heap types, batch 1 (GH-101949) Adapt StringIO, TextIOWrapper, FileIO, Buffered*, and BytesIO types. Automerge-Triggered-By: GH:erlend-aasland	2023-02-20 05:46:20 -08:00
Serhiy Storchaka	a87c46eab3	bpo-15999: Accept arbitrary values for boolean parameters. (#15609 ) builtins and extension module functions and methods that expect boolean values for parameters now accept any Python object rather than just a bool or int type. This is more consistent with how native Python code itself behaves.	2022-12-03 11:52:21 -08:00
Zackery Spytz	d386115039	bpo-38031: Fix a possible assertion failure in _io.FileIO() (#GH-5688)	2022-11-25 12:55:26 +00:00
Inada Naoki	f9c9354a7a	gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537)	2022-05-12 14:48:38 +09:00
Eric Snow	81c72044a1	bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928) We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules. The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings). https://bugs.python.org/issue46541#msg411799 explains the rationale for this change. The core of the change is in: * (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros * Include/internal/pycore_runtime_init.h - added the static initializers for the global strings * Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState * Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config. The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _PyId functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _PyId(), replacing the _Py_Identifier * parameter with PyObject *. The following are not changed (yet): * stop using _Py_IDENTIFIER() in the stdlib modules * (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API * (maybe) intern the strings during runtime init https://bugs.python.org/issue46541	2022-02-08 13:39:07 -07:00
Victor Stinner	97308dfcdc	bpo-45434: Move _Py_BEGIN_SUPPRESS_IPH to pycore_fileutils.h (GH-28922)	2021-10-13 15:03:35 +02:00
Serhiy Storchaka	4c8f09d7ce	bpo-36346: Make using the legacy Unicode C API optional (GH-21437) Add compile time option USE_UNICODE_WCHAR_CACHE. Setting it to 0 makes the interpreter not using the wchar_t cache and the legacy Unicode C API.	2020-07-10 23:26:06 +03:00
Serhiy Storchaka	578c3955e0	bpo-37999: No longer use __int__ in implicit integer conversions. (GH-15636) Only __index__ should be used to make integer conversions lossless.	2020-05-26 18:43:38 +03:00
Victor Stinner	4a21e57fe5	bpo-40268: Remove unused structmember.h includes (GH-19530) If only offsetof() is needed: include stddef.h instead. When structmember.h is used, add a comment explaining that PyMemberDef is used.	2020-04-15 02:35:41 +02:00
Benjamin Peterson	74fa9f723f	closes bpo-27805: Ignore ESPIPE in initializing seek of append-mode files. (GH-17112) This change, which follows the behavior of C stdio's fdopen and Python 2's file object, allows pipes to be opened in append mode.	2019-11-12 14:51:34 -08:00
Serhiy Storchaka	279f44678c	bpo-37206: Unrepresentable default values no longer represented as None. (GH-13933) In ArgumentClinic, value "NULL" should now be used only for unrepresentable default values (like in the optional third parameter of getattr). "None" should be used if None is accepted as argument and passing None has the same effect as not passing the argument at all.	2019-09-14 12:24:05 +03:00
Jeroen Demeyer	59ad110d7a	bpo-37547: add _PyObject_CallMethodOneArg (GH-14685)	2019-07-11 17:59:05 +09:00
Jeroen Demeyer	530f506ac9	bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async (GH-13464) Automatically replace tp_print -> tp_vectorcall_offset tp_compare -> tp_as_async tp_reserved -> tp_as_async	2019-05-30 19:13:39 -07:00
Antoine Pitrou	ada319bb6d	bpo-32388: Remove cross-version binary compatibility requirement in tp_flags (GH-4944) It is now allowed to add new fields at the end of the PyTypeObject struct without having to allocate a dedicated compatibility flag in tp_flags. This will reduce the risk of running out of bits in the 32-bit tp_flags value.	2019-05-29 22:12:38 +02:00
Steve Dower	b82e17e626	bpo-36842: Implement PEP 578 (GH-12613) Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.	2019-05-23 08:45:22 -07:00
Victor Stinner	bcda8f1d42	bpo-35081: Add Include/internal/pycore_object.h (GH-10640) Move _PyObject_GC_TRACK() and _PyObject_GC_UNTRACK() from Include/objimpl.h to Include/internal/pycore_object.h.	2018-11-21 22:27:47 +01:00
Serhiy Storchaka	0353b4eaaf	bpo-33138: Change standard error message for non-pickleable and non-copyable types. (GH-6239)	2018-10-31 02:28:07 +02:00
Stéphane Wirtel	74a8b6ea7e	bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705) On macOS, fix reading from and writing into a file with a size larger than 2 GiB.	2018-10-18 01:05:04 +02:00
Siddhesh Poyarekar	55edd0c185	bpo-33012: Fix invalid function cast warnings with gcc 8 for METH_NOARGS. (GH-6030) METH_NOARGS functions need only a single argument but they are cast into a PyCFunction, which takes two arguments. This triggers an invalid function cast warning in gcc8 due to the argument mismatch. Fix this by adding a dummy unused argument.	2018-04-29 21:59:33 +03:00
Serhiy Storchaka	f320be77ff	bpo-32571: Avoid raising unneeded AttributeError and silencing it in C code (GH-5222) Add two new private APIs: _PyObject_LookupAttr() and _PyObject_LookupAttrId()	2018-01-25 17:49:40 +09:00
Nir Soffer	6a89481680	bpo-32186: Release the GIL during lseek and fstat (#4652 ) In _io_FileIO_readall_impl(), lseek() and _Py_fstat_noraise() were called without releasing the GIL. This can cause all threads to hang for unlimited time when calling FileIO.read() and the NFS server is not accessible.	2017-12-01 02:18:58 +01:00
Victor Stinner	8c663fd60e	Replace KB unit with KiB (#4293 ) kB (kilo byte) unit means 1000 bytes, whereas KiB ("kibibyte") means 1024 bytes. KB was misused: replace kB or KB with KiB when appropriate. Same change for MB and GB which become MiB and GiB. Change the output of Tools/iobench/iobench.py. Round also the size of the documentation from 5.5 MB to 5 MiB.	2017-11-08 14:44:44 -08:00
Serhiy Storchaka	f7eae0adfc	[security] bpo-13617: Reject embedded null characters in wchar* strings. (#2302 ) Based on patch by Victor Stinner. Add private C API function _PyUnicode_AsUnicode() which is similar to PyUnicode_AsUnicode(), but checks for null characters.	2017-06-28 08:30:06 +03:00
Victor Stinner	9997073736	bpo-30228: FileIO seek() and tell() set seekable (#1384 ) FileIO.seek() and FileIO.tell() method now set the internal seekable attribute to avoid one syscall on open() (in buffered or text mode). The seekable property is now also more reliable since its value is set correctly on memory allocation failure.	2017-05-02 15:10:39 +02:00
Serhiy Storchaka	55fe1ae970	bpo-30022: Get rid of using EnvironmentError and IOError (except test… (#1051 )	2017-04-16 10:46:38 +03:00
Serhiy Storchaka	762bf40438	bpo-29852: Argument Clinic Py_ssize_t converter now supports None (#716 ) if pass `accept={int, NoneType}`.	2017-03-30 09:15:31 +03:00
Serhiy Storchaka	a5af6e1af7	bpo-25455: Fixed crashes in repr of recursive buffered file-like objects. (#514 )	2017-03-19 19:25:29 +02:00

179 Commits
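
The read strategy described in commit 2f5f19e783 (gh-120754) can be approximated in pure Python. This is only a minimal sketch of the idea, not the C code in `Modules/_io/fileio.c`: the function name `read_all`, the `large_file_threshold` parameter and its default, and the 8192-byte fallback chunk size are all assumptions made for illustration. It calls `fstat()` once, asks for `size + 1` bytes so that a short read signals end of file, and only queries the current position for larger files.

```python
import os


def read_all(path, large_file_threshold=64 * 1024):
    """Read a whole file using a single fstat() result (illustrative sketch).

    `large_file_threshold` is a hypothetical cutoff, not CPython's value.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # One fstat() call; its size is reused instead of calling stat again.
        size = os.fstat(fd).st_size

        # Ask for one byte more than the expected size, so that a short
        # read tells us we have probably reached end of file.
        to_read = size + 1

        # Only pay for the position lookup on larger files, where
        # over-allocating the buffer would be more costly.
        if size >= large_file_threshold:
            pos = os.lseek(fd, 0, os.SEEK_CUR)
            to_read = max(size - pos, 0) + 1

        chunks = []
        while True:
            chunk = os.read(fd, to_read)
            if not chunk:
                break  # an empty read means end of file
            chunks.append(chunk)
            remaining = to_read - len(chunk)
            # If the file grew after fstat(), keep reading in fixed-size
            # chunks (8192 is an arbitrary choice for this sketch).
            to_read = remaining if remaining > 0 else 8192
        return b"".join(chunks)
    finally:
        os.close(fd)
```

For an unchanged small file this should issue the same sequence as the "after" trace in that commit message: one `fstat`, one `read` returning the whole file, and a final one-byte `read` returning nothing. If the file changes size between `fstat()` and the reads, the loop still returns all the data but needs more `read` calls, which matches the first tradeoff the commit message lists.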