Commit Graph

177 Commits

Author SHA1 Message Date
Victor Stinner 43cd7aa8cd
gh-120754: Fix memory leak in FileIO.__init__() (#124225)
Free 'self->stat_atopen' before assigning it, since
io.FileIO.__init__() can be called multiple times manually
(especially by test_io).
2024-09-19 00:11:50 +02:00
Cody Maloney 8b6c7c7877
gh-120754: Refactor I/O modules to stash whole stat result rather than individual members (#123412)
Multiple places in the I/O stack optimize common cases by using the
information from stat. Currently individual members are extracted from
the stat and stored into the fileio struct. Refactor the code to store
the whole stat struct instead.

Parallels the changes to _io. The `stat` Python object doesn't allow
changing members, so rather than modifying estimated_size, just clear
the value.
2024-09-18 17:47:57 +02:00
Cody Maloney 06a1c3fb24
gh-120754: Update estimated_size in C truncate (#121357)
Sometimes a large file is truncated (test_largefile). While
estimated_size is used as a estimate (the read will stil get the number
of bytes in the file), that it is much larger than the actual size of
data can result in a significant over allocation and sometimes lead to
a MemoryError / running out of memory.

This brings the C implementation to match the Python _pyio
implementation.
2024-07-04 12:59:18 +00:00
Cody Maloney 2f5f19e783
gh-120754: Reduce system calls in full-file FileIO.readall() case (#120755)
This reduces the system call count of a simple program[0] that reads all
the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my
linux system, 5813 -> 4875 on my macOS)

This reduces the number of `fstat()` calls always and seek calls most
the time. Stat was always called twice, once at open (to error early on
directories), and a second time to get the size of the file to be able
to read the whole file in one read. Now the size is cached with the
first call.

The code keeps an optimization that if the user had previously read a
lot of data, the current position is subtracted from the number of bytes
to read. That is somewhat expensive so only do it on larger files,
otherwise just try and read the extra bytes and resize the PyBytes as
needeed.

I built a little test program to validate the behavior + assumptions
around relative costs and then ran it under `strace` to get a log of the
system calls. Full samples below[1].

After the changes, this is everything in one `filename.read_text()`:

```python3
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3`
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0`
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

This does make some tradeoffs
1. If the file size changes between open() and readall(), this will
still get all the data but might have more read calls.
2. I experimented with avoiding the stat + cached result for small files
in general, but on my dev workstation at least that tended to reduce
performance compared to using the fstat().

[0]

```python3
from pathlib import Path

nlines = []
for filename in Path("cpython/Doc").glob("**/*.rst"):
    nlines.append(len(filename.read_text()))
```

[1]
Before small file:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

After small file:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

Before large file:

```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1)                          = 0
close(3)                                = 0
```

After large file:

```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1)                          = 0
close(3)                                = 0
```

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
2024-07-04 09:17:00 +02:00
Serhiy Storchaka 35f60c3def
gh-117764: Add signatures for __reduce__ and __reduce_ex__ in the _io module (GH-117773)
__reduce__() does not have parameters, __reduce_ex__() has a single
parameter.
2024-04-12 12:22:17 +03:00
Serhiy Storchaka 652fbf88c4
gh-82626: Emit a warning when bool is used as a file descriptor (GH-111275) 2024-02-05 22:51:11 +02:00
Nikita Sobolev 05e47202a3
gh-114286: Fix `maybe-uninitialized` warning in `Modules/_io/fileio.c` (GH-114287) 2024-01-19 10:25:05 +00:00
AN Long 623b338adf
gh-66060: Use actual class name in _io type's __repr__ (#30824)
Use the object's actual class name in the following _io type's __repr__:
- FileIO
- TextIOWrapper
- _WindowsConsoleIO
2024-01-09 21:39:36 +01:00
Victor Stinner 7513994c92
gh-110014: Include explicitly <unistd.h> header (#110155)
* Remove unused <locale.h> includes.
* Remove unused <fcntl.h> include in traceback.h.
* Remove redundant <assert.h> and <stddef.h> includes. They  are already
  included by "Python.h".
* Remove <object.h> include in faulthandler.c. Python.h already includes it.
* Add missing <stdbool.h> in pycore_pythread.h if HAVE_PTHREAD_STUBS
  is defined.
* Fix also warnings in pthread_stubs.h: don't redefine macros if they
  are already defined, like the __NEED_pthread_t macro.
2023-09-30 20:06:45 +00:00
Victor Stinner 79823c103b
gh-106320: Remove private _PyErr_ChainExceptions() (#108713)
Remove _PyErr_ChainExceptions(), _PyErr_ChainExceptions1() and
_PyErr_SetFromPyStatus() functions from the public C API.

* Move the private _PyErr_ChainExceptions() and
  _PyErr_ChainExceptions1() function to the internal C API
  (pycore_pyerrors.h).
* Move the private _PyErr_SetFromPyStatus() to the internal C API
  (pycore_initconfig.h).
* No longer export the _PyErr_ChainExceptions() function.
* Move run_in_subinterp_with_config() from _testcapi to
  _testinternalcapi.
2023-08-31 13:53:19 +02:00
Serhiy Storchaka 2b15536fa9
gh-107913: Fix possible losses of OSError error codes (GH-107930)
Functions like PyErr_SetFromErrno() and SetFromWindowsErr() should be
called immediately after using the C API which sets errno or the Windows
error code.
2023-08-27 00:35:06 +03:00
Victor Stinner b32d4cad15
gh-108444: Replace _PyLong_AsInt() with PyLong_AsInt() (#108459)
Change generated by the command:

sed -i -e 's!_PyLong_AsInt!PyLong_AsInt!g' \
    $(find -name "*.c" -o -name "*.h")
2023-08-25 01:01:30 +02:00
Victor Stinner 1a3faba9f1
gh-106869: Use new PyMemberDef constant names (#106871)
* Remove '#include "structmember.h"'.
* If needed, add <stddef.h> to get offsetof() function.
* Update Parser/asdl_c.py to regenerate Python/Python-ast.c.
* Replace:

  * T_SHORT => Py_T_SHORT
  * T_INT => Py_T_INT
  * T_LONG => Py_T_LONG
  * T_FLOAT => Py_T_FLOAT
  * T_DOUBLE => Py_T_DOUBLE
  * T_STRING => Py_T_STRING
  * T_OBJECT => _Py_T_OBJECT
  * T_CHAR => Py_T_CHAR
  * T_BYTE => Py_T_BYTE
  * T_UBYTE => Py_T_UBYTE
  * T_USHORT => Py_T_USHORT
  * T_UINT => Py_T_UINT
  * T_ULONG => Py_T_ULONG
  * T_STRING_INPLACE => Py_T_STRING_INPLACE
  * T_BOOL => Py_T_BOOL
  * T_OBJECT_EX => Py_T_OBJECT_EX
  * T_LONGLONG => Py_T_LONGLONG
  * T_ULONGLONG => Py_T_ULONGLONG
  * T_PYSSIZET => Py_T_PYSSIZET
  * T_NONE => _Py_T_NONE
  * READONLY => Py_READONLY
  * PY_AUDIT_READ => Py_AUDIT_READ
  * READ_RESTRICTED => Py_AUDIT_READ
  * PY_WRITE_RESTRICTED => _Py_WRITE_RESTRICTED
  * RESTRICTED => (READ_RESTRICTED | _Py_WRITE_RESTRICTED)
2023-07-25 15:28:30 +02:00
Serhiy Storchaka be1b968dc1
gh-106521: Remove _PyObject_LookupAttr() function (GH-106642) 2023-07-12 08:57:10 +03:00
Inada Naoki d5bd32fb48
gh-104922: remove PY_SSIZE_T_CLEAN (#106315) 2023-07-02 15:07:46 +09:00
Victor Stinner 8ed705c083
gh-105156: Deprecate the old Py_UNICODE type in C API (#105157)
Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API:
use wchar_t instead.

Replace Py_UNICODE with wchar_t in multiple C files.

Co-authored-by: Inada Naoki <songofacandy@gmail.com>
2023-06-01 08:56:35 +02:00
Erlend E. Aasland 186bf39f5c
gh-101819: Isolate `_io` (#101948)
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
2023-05-15 09:26:27 +00:00
Erlend E. Aasland d0a738c6df
gh-101819: Refactor `_io` futher in preparation for module isolation (#104369) 2023-05-11 15:56:30 +05:30
Victor Stinner c84029179c
gh-101819: Prepare to modernize the _io extension (#104178)
* Add references to static types to _PyIO_State:

  * PyBufferedIOBase_Type
  * PyBytesIOBuffer_Type
  * PyIncrementalNewlineDecoder_Type
  * PyRawIOBase_Type
  * PyTextIOBase_Type

* Add the defining class to methods:

  * _io.BytesIO.getbuffer()
  * _io.FileIO.close()

* Add get_io_state_by_cls() function.
* Add state parameter to _textiowrapper_decode()
* _io_TextIOWrapper___init__() now sets self->state before calling
  _textiowrapper_set_decoder().

Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
2023-05-06 01:53:55 +02:00
Max Bachmann c6858d1e7f
gh-102255: Improve build support for Windows API partitions (GH-102256)
Add `MS_WINDOWS_DESKTOP`, `MS_WINDOWS_APPS`, `MS_WINDOWS_SYSTEM` and `MS_WINDOWS_GAMES` preprocessor definitions to allow switching off functionality missing from particular API partitions ("partitions" are used in Windows to identify overlapping subsets of APIs).
CPython only officially supports `MS_WINDOWS_DESKTOP` and `MS_WINDOWS_SYSTEM` (APPS is included by normal desktop builds, but APPS without DESKTOP is not covered). Other configurations are a convenience for people building their own runtimes.
`MS_WINDOWS_GAMES` is for the Xbox subset of the Windows API, which is also available on client OS, but is restricted compared to `MS_WINDOWS_DESKTOP`. These restrictions may change over time, as they relate to the build headers rather than the OS support, and so we assume that Xbox builds will use the latest available version of the GDK.
2023-03-09 21:09:12 +00:00
Irit Katriel 2db23d10bf
gh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (in Modules/) (#102196) 2023-02-24 21:43:03 +00:00
Erlend E. Aasland c00faf7943
gh-101819: Adapt _io types to heap types, batch 1 (GH-101949)
Adapt StringIO, TextIOWrapper, FileIO, Buffered*, and BytesIO types.

Automerge-Triggered-By: GH:erlend-aasland
2023-02-20 05:46:20 -08:00
Serhiy Storchaka a87c46eab3
bpo-15999: Accept arbitrary values for boolean parameters. (#15609)
builtins and extension module functions and methods that expect boolean values for parameters now accept any Python object rather than just a bool or int type. This is more consistent with how native Python code itself behaves.
2022-12-03 11:52:21 -08:00
Zackery Spytz d386115039
bpo-38031: Fix a possible assertion failure in _io.FileIO() (#GH-5688) 2022-11-25 12:55:26 +00:00
Inada Naoki f9c9354a7a
gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537) 2022-05-12 14:48:38 +09:00
Eric Snow 81c72044a1
bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928)
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code.  It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime.  A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings.  That check is added to the PR CI config.

The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()).  This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

The following are not changed (yet):

* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541
2022-02-08 13:39:07 -07:00
Victor Stinner 97308dfcdc
bpo-45434: Move _Py_BEGIN_SUPPRESS_IPH to pycore_fileutils.h (GH-28922) 2021-10-13 15:03:35 +02:00
Serhiy Storchaka 4c8f09d7ce
bpo-36346: Make using the legacy Unicode C API optional (GH-21437)
Add compile time option USE_UNICODE_WCHAR_CACHE. Setting it to 0
makes the interpreter not using the wchar_t cache and the legacy Unicode C API.
2020-07-10 23:26:06 +03:00
Serhiy Storchaka 578c3955e0
bpo-37999: No longer use __int__ in implicit integer conversions. (GH-15636)
Only __index__ should be used to make integer conversions lossless.
2020-05-26 18:43:38 +03:00
Victor Stinner 4a21e57fe5
bpo-40268: Remove unused structmember.h includes (GH-19530)
If only offsetof() is needed: include stddef.h instead.

When structmember.h is used, add a comment explaining that
PyMemberDef is used.
2020-04-15 02:35:41 +02:00
Benjamin Peterson 74fa9f723f
closes bpo-27805: Ignore ESPIPE in initializing seek of append-mode files. (GH-17112)
This change, which follows the behavior of C stdio's fdopen and Python 2's file object, allows pipes to be opened in append mode.
2019-11-12 14:51:34 -08:00
Serhiy Storchaka 279f44678c
bpo-37206: Unrepresentable default values no longer represented as None. (GH-13933)
In ArgumentClinic, value "NULL" should now be used only for unrepresentable default values
(like in the optional third parameter of getattr). "None" should be used if None is accepted
as argument and passing None has the same effect as not passing the argument at all.
2019-09-14 12:24:05 +03:00
Jeroen Demeyer 59ad110d7a bpo-37547: add _PyObject_CallMethodOneArg (GH-14685) 2019-07-11 17:59:05 +09:00
Jeroen Demeyer 530f506ac9 bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async (GH-13464)
Automatically replace
tp_print -> tp_vectorcall_offset
tp_compare -> tp_as_async
tp_reserved -> tp_as_async
2019-05-30 19:13:39 -07:00
Antoine Pitrou ada319bb6d
bpo-32388: Remove cross-version binary compatibility requirement in tp_flags (GH-4944)
It is now allowed to add new fields at the end of the PyTypeObject struct without having to allocate a dedicated compatibility flag in tp_flags.

This will reduce the risk of running out of bits in the 32-bit tp_flags value.
2019-05-29 22:12:38 +02:00
Steve Dower b82e17e626
bpo-36842: Implement PEP 578 (GH-12613)
Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.
2019-05-23 08:45:22 -07:00
Victor Stinner bcda8f1d42
bpo-35081: Add Include/internal/pycore_object.h (GH-10640)
Move _PyObject_GC_TRACK() and _PyObject_GC_UNTRACK() from
Include/objimpl.h to Include/internal/pycore_object.h.
2018-11-21 22:27:47 +01:00
Serhiy Storchaka 0353b4eaaf
bpo-33138: Change standard error message for non-pickleable and non-copyable types. (GH-6239) 2018-10-31 02:28:07 +02:00
Stéphane Wirtel 74a8b6ea7e bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705)
On macOS, fix reading from and writing into a file with a size larger than 2 GiB.
2018-10-18 01:05:04 +02:00
Siddhesh Poyarekar 55edd0c185 bpo-33012: Fix invalid function cast warnings with gcc 8 for METH_NOARGS. (GH-6030)
METH_NOARGS functions need only a single argument but they are cast
into a PyCFunction, which takes two arguments.  This triggers an
invalid function cast warning in gcc8 due to the argument mismatch.
Fix this by adding a dummy unused argument.
2018-04-29 21:59:33 +03:00
Serhiy Storchaka f320be77ff bpo-32571: Avoid raising unneeded AttributeError and silencing it in C code (GH-5222)
Add two new private APIs: _PyObject_LookupAttr() and _PyObject_LookupAttrId()
2018-01-25 17:49:40 +09:00
Nir Soffer 6a89481680 bpo-32186: Release the GIL during lseek and fstat (#4652)
In _io_FileIO_readall_impl(), lseek() and _Py_fstat_noraise() were called
without releasing the GIL. This can cause all threads to hang for
unlimited time when calling FileIO.read() and the NFS server is not
accessible.
2017-12-01 02:18:58 +01:00
Victor Stinner 8c663fd60e
Replace KB unit with KiB (#4293)
kB (*kilo* byte) unit means 1000 bytes, whereas KiB ("kibibyte")
means 1024 bytes. KB was misused: replace kB or KB with KiB when
appropriate.

Same change for MB and GB which become MiB and GiB.

Change the output of Tools/iobench/iobench.py.

Round also the size of the documentation from 5.5 MB to 5 MiB.
2017-11-08 14:44:44 -08:00
Serhiy Storchaka f7eae0adfc [security] bpo-13617: Reject embedded null characters in wchar* strings. (#2302)
Based on patch by Victor Stinner.

Add private C API function _PyUnicode_AsUnicode() which is similar to
PyUnicode_AsUnicode(), but checks for null characters.
2017-06-28 08:30:06 +03:00
Victor Stinner 9997073736 bpo-30228: FileIO seek() and tell() set seekable (#1384)
FileIO.seek() and FileIO.tell() method now set the internal seekable
attribute to avoid one syscall on open() (in buffered or text mode).

The seekable property is now also more reliable since its value is
set correctly on memory allocation failure.
2017-05-02 15:10:39 +02:00
Serhiy Storchaka 55fe1ae970 bpo-30022: Get rid of using EnvironmentError and IOError (except test… (#1051) 2017-04-16 10:46:38 +03:00
Serhiy Storchaka 762bf40438 bpo-29852: Argument Clinic Py_ssize_t converter now supports None (#716)
if pass `accept={int, NoneType}`.
2017-03-30 09:15:31 +03:00
Serhiy Storchaka a5af6e1af7 bpo-25455: Fixed crashes in repr of recursive buffered file-like objects. (#514) 2017-03-19 19:25:29 +02:00
Serhiy Storchaka 202fda55c2 bpo-24037: Add Argument Clinic converter `bool(accept={int})`. (#485) 2017-03-12 10:10:47 +02:00
Steve Dower 43fec9b419 Merge issue #28164 and issue #29409 2017-02-04 15:14:18 -08:00