Commit Graph

180 Commits

Author SHA1 Message Date
Cody Maloney 8b6c7c7877
gh-120754: Refactor I/O modules to stash whole stat result rather than individual members (#123412)
Multiple places in the I/O stack optimize common cases by using the
information from stat. Currently individual members are extracted from
the stat and stored into the fileio struct. Refactor the code to store
the whole stat struct instead.

Parallels the changes to _io. The `stat` Python object doesn't allow
changing members, so rather than modifying estimated_size, just clear
the value.
2024-09-18 17:47:57 +02:00
Cody Maloney 2f5f19e783
gh-120754: Reduce system calls in full-file FileIO.readall() case (#120755)
This reduces the system call count of a simple program[0] that reads all
the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my
linux system, 5813 -> 4875 on my macOS)

This reduces the number of `fstat()` calls always and seek calls most
the time. Stat was always called twice, once at open (to error early on
directories), and a second time to get the size of the file to be able
to read the whole file in one read. Now the size is cached with the
first call.

The code keeps an optimization that if the user had previously read a
lot of data, the current position is subtracted from the number of bytes
to read. That is somewhat expensive so only do it on larger files,
otherwise just try and read the extra bytes and resize the PyBytes as
needeed.

I built a little test program to validate the behavior + assumptions
around relative costs and then ran it under `strace` to get a log of the
system calls. Full samples below[1].

After the changes, this is everything in one `filename.read_text()`:

```python3
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3`
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0`
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

This does make some tradeoffs
1. If the file size changes between open() and readall(), this will
still get all the data but might have more read calls.
2. I experimented with avoiding the stat + cached result for small files
in general, but on my dev workstation at least that tended to reduce
performance compared to using the fstat().

[0]

```python3
from pathlib import Path

nlines = []
for filename in Path("cpython/Doc").glob("**/*.rst"):
    nlines.append(len(filename.read_text()))
```

[1]
Before small file:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

After small file:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

Before large file:

```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1)                          = 0
close(3)                                = 0
```

After large file:

```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1)                          = 0
close(3)                                = 0
```

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
2024-07-04 09:17:00 +02:00
Victor Stinner 6ae254aaa0
gh-120417: Add #noqa to used imports in the stdlib (#120421)
Tools such as ruff can ignore "imported but unused" warnings if a
line ends with "# noqa: F401". It avoids the temptation to remove
an import which is used effectively.
2024-06-13 16:14:50 +02:00
6t8k 26800cf25a
gh-95782: Fix io.BufferedReader.tell() etc. being able to return offsets < 0 (GH-99709)
lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
2024-02-17 11:16:06 +00:00
Serhiy Storchaka 652fbf88c4
gh-82626: Emit a warning when bool is used as a file descriptor (GH-111275) 2024-02-05 22:51:11 +02:00
Zackery Spytz 73c9326563
gh-80109: Fix io.TextIOWrapper dropping the internal buffer during write() (GH-22535)
io.TextIOWrapper was dropping the internal decoding buffer
during read() and write() calls.
2024-01-08 12:33:34 +02:00
Victor Stinner 58a2e09816
gh-62948: IOBase finalizer logs close() errors (#105104) 2023-05-31 11:41:19 +00:00
Nick Drozd 024ac542d7
bpo-45975: Simplify some while-loops with walrus operator (GH-29347) 2022-11-26 14:33:25 -08:00
Nikita Sobolev 2cfcaf5af6
gh-98999: Raise `ValueError` in `_pyio` on closed buffers (gh-99009) 2022-11-03 12:03:12 +09:00
Victor Stinner 6e33ba114f
gh-94169: Remove deprecated io.OpenWrapper (#94170)
Remove io.OpenWrapper and _pyio.OpenWrapper, deprecated in Python
3.10: just use :func:`open` instead. The open() (io.open()) function
is a built-in function. Since Python 3.10, _pyio.open() is also a
static method.
2022-06-24 08:46:53 +02:00
Dong-hee Na f7fabae75c
gh-93099: Fix _pyio to use locale module properly (gh-93136) 2022-05-24 09:37:01 +09:00
Inada Naoki 0729b31a8b
gh-91952: Make TextIOWrapper.reconfigure() supports "locale" encoding (GH-91982) 2022-05-01 10:44:14 +09:00
Inada Naoki 6fdb62b1fa
gh-91526: io: Remove device encoding support from TextIOWrapper (GH-91529)
`TextIOWrapper.__init__()` called `os.device_encoding(file.fileno())` if fileno is 0-2 and encoding=None.
But it is very rarely works, and never documented behavior.
2022-04-19 11:44:36 +09:00
Inada Naoki 13b17e2a0a
gh-91156: Fix `encoding="locale"` in UTF-8 mode (GH-70056) 2022-04-14 16:00:35 +09:00
Inada Naoki 4216dce04b
bpo-47000: Make `io.text_encoding()` respects UTF-8 mode (GH-32003)
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
2022-04-04 11:46:57 +09:00
slateny cedd2473a9
bpo-25415: Remove confusing sentence from IOBase docstrings (PR-31631) 2022-03-04 12:35:52 -05:00
Thomas Grainger 9b12b1b803
bpo-46522: fix concurrent.futures and io AttributeError messages (GH-30887)
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Andrew Svetlov <andrew.svetlov@gmail.com>
2022-02-23 02:25:00 +02:00
Victor Stinner 19ba2122ac
bpo-37330: open() no longer accept 'U' in file mode (GH-28118)
open(), io.open(), codecs.open() and fileinput.FileInput no longer
accept "U" ("universal newline") in the file mode. This flag was
deprecated since Python 3.3.
2021-09-02 12:58:00 +02:00
Victor Stinner 3bc694d5f3
bpo-43680: Deprecate io.OpenWrapper (GH-25357)
Deprecate io.OpenWrapper and _pyio.OpenWrapper: use io.open and
_pyio.open instead. Until Python 3.9, _pyio.open was not a static
method and builtins.open was set to OpenWrapper to not become a bound
method when set to a class variable. _io.open is a built-in function
whereas _pyio.open is a Python function. In Python 3.10, _pyio.open()
is now a static method, and builtins.open() is now io.open().
2021-04-14 03:24:33 +02:00
Victor Stinner 77d668b122
bpo-43680: _pyio.open() becomes a static method (GH-25354)
The Python _pyio.open() function becomes a static method to behave as
io.open() built-in function: don't become a bound method when stored
as a class variable. It becomes possible since static methods are now
callable in Python 3.10. Moreover, _pyio.OpenWrapper becomes a simple
alias to _pyio.open.

init_set_builtins_open() now sets builtins.open to io.open, rather
than setting it to io.OpenWrapper, since OpenWrapper is now an alias
to open in the io and _pyio modules.
2021-04-12 10:44:53 +02:00
Inada Naoki cfa176685a
Revert "bpo-43510: PEP 597: Accept `encoding="locale"` in binary mode (GH-25103)" (#25108)
This reverts commit ff3c9739bd.
2021-03-31 18:49:41 +09:00
Inada Naoki ff3c9739bd
bpo-43510: PEP 597: Accept `encoding="locale"` in binary mode (GH-25103)
It make `encoding="locale"` usable everywhere `encoding=None` is
allowed.
2021-03-31 14:26:08 +09:00
Inada Naoki 4827483f47
bpo-43510: Implement PEP 597 opt-in EncodingWarning. (GH-19481)
See [PEP 597](https://www.python.org/dev/peps/pep-0597/).

* Add `-X warn_default_encoding` and `PYTHONWARNDEFAULTENCODING`.
* Add EncodingWarning
* Add io.text_encoding()
* open(), TextIOWrapper() emits EncodingWarning when encoding is omitted and warn_default_encoding is enabled.
* _pyio.TextIOWrapper() uses UTF-8 as fallback default encoding used when failed to import locale module. (used during building Python)
* bz2, configparser, gzip, lzma, pathlib, tempfile modules use io.text_encoding().
* What's new entry
2021-03-29 12:28:14 +09:00
Victor Stinner 942f7a2dea
bpo-39674: Revert "bpo-37330: open() no longer accept 'U' in file mode (GH-16959)" (GH-18767)
This reverts commit e471e72977.

The mode will be removed from Python 3.10.
2020-03-04 18:50:22 +01:00
Berker Peksag fd5116c0e7
bpo-35950: Raise UnsupportedOperation in BufferedReader.truncate() (GH-18586)
The truncate() method of io.BufferedReader() should raise
UnsupportedOperation when it is called on a read-only
io.BufferedReader() instance.





https://bugs.python.org/issue35950



Automerge-Triggered-By: @methane
2020-02-21 09:57:26 -08:00
Benjamin Peterson 74fa9f723f
closes bpo-27805: Ignore ESPIPE in initializing seek of append-mode files. (GH-17112)
This change, which follows the behavior of C stdio's fdopen and Python 2's file object, allows pipes to be opened in append mode.
2019-11-12 14:51:34 -08:00
Victor Stinner e471e72977
bpo-37330: open() no longer accept 'U' in file mode (GH-16959)
open(), io.open(), codecs.open() and fileinput.FileInput no longer
accept "U" ("universal newline") in the file mode. This flag was
deprecated since Python 3.3.
2019-10-28 15:40:08 +01:00
Serhiy Storchaka 1f21eaa15e
bpo-15999: Clean up of handling boolean arguments. (GH-15610)
* Use the 'p' format unit instead of manually called PyObject_IsTrue().
* Pass boolean value instead 0/1 integers to functions that needs boolean.
* Convert some arguments to boolean only once.
2019-09-01 12:16:51 +03:00
Raymond Hettinger 0dac68f1e5
bpo-36743: __get__ is sometimes called without the owner argument (#12992) 2019-08-29 01:27:42 -07:00
Serhiy Storchaka b235a1b473
bpo-37960: Silence only necessary errors in repr() of buffered and text streams. (GH-15543) 2019-08-29 09:25:22 +03:00
Min ho Kim c4cacc8c5e Fix typos in comments, docs and test names (#15018)
* Fix typos in comments, docs and test names

* Update test_pyparse.py

account for change in string length

* Apply suggestion: splitable -> splittable

Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu>

* Apply suggestion: splitable -> splittable

Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu>

* Apply suggestion: Dealloccte -> Deallocate

Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu>

* Update posixmodule checksum.

* Reverse idlelib changes.
2019-07-30 18:16:13 -04:00
Victor Stinner 22eb689cf3
bpo-37388: Development mode check encoding and errors (GH-14341)
In development mode and in debug build, encoding and errors arguments
are now checked on string encoding and decoding operations. Examples:
open(), str.encode() and bytes.decode().

By default, for best performances, the errors argument is only
checked at the first encoding/decoding error, and the encoding
argument is sometimes ignored for empty strings.
2019-06-26 00:51:05 +02:00
Victor Stinner 4f6f7c5a61
bpo-18748: Fix _pyio.IOBase destructor (closed case) (GH-13952)
_pyio.IOBase destructor now does nothing if getting the closed
attribute fails to better mimick _io.IOBase finalizer.
2019-06-11 02:49:06 +02:00
Victor Stinner a3568417c4
bpo-37054, _pyio: Fix BytesIO and TextIOWrapper __del__() (GH-13601)
Fix destructor _pyio.BytesIO and _pyio.TextIOWrapper: initialize
their _buffer attribute as soon as possible (in the class body),
because it's used by __del__() which calls close().
2019-05-28 01:44:21 +02:00
Steve Dower b82e17e626
bpo-36842: Implement PEP 578 (GH-12613)
Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.
2019-05-23 08:45:22 -07:00
Victor Stinner bc2aa81662
bpo-18748: _pyio.IOBase emits unraisable exception (GH-13512)
In development (-X dev) mode and in a debug build, IOBase finalizer
of the _pyio module now logs the exception if the close() method
fails. The exception is ignored silently by default in release build.

test_io: test_error_through_destructor() now uses
support.catch_unraisable_exception() rather than capturing stderr.
2019-05-23 03:45:09 +02:00
Marcin Niemira ab86521a9d bpo-36523: Add docstring to io.IOBase.writelines (GH-12683) 2019-04-22 20:13:51 +09:00
Steve Palmer 7b97ab35b2 closes bpo-35848: Move all documentation regarding the readinto out of IOBase. (GH-11893)
Move all documentation regarding the readinto method into either io.RawIOBase or io.BufferedIOBase.

Corresponding changes to documentation in the _pyio.py module.
2019-04-08 21:35:27 -07:00
ngie-eign 848037c147 Use names SEEK_SET, etc instead of magic number (GH-12057)
The previous code hardcoded `SEEK_SET`, etc. While it's very unlikely
that these values will change, it's best to use the definitions to avoid
there being mismatches in behavior with the code in the future.

Signed-off-by: Enji Cooper <yaneurabeya@gmail.com>
2019-03-03 16:28:26 +09:00
Serhiy Storchaka 0353b4eaaf
bpo-33138: Change standard error message for non-pickleable and non-copyable types. (GH-6239) 2018-10-31 02:28:07 +02:00
Alexey Izbyshev a2670565d8 bpo-32236: open() emits RuntimeWarning if buffering=1 for binary mode (GH-4842)
If buffering=1 is specified for open() in binary mode, it is silently
treated as buffering=-1 (i.e., the default buffer size).
Coupled with the fact that line buffering is always supported in Python 2,
such behavior caused several issues (e.g., bpo-10344, bpo-21332).

Warn that line buffering is not supported if open() is called with
binary mode and buffering=1.
2018-10-20 02:22:31 +02:00
Raymond Hettinger 1401018da1
Remove wording that could be deemed to be perjorative (GH-9287) 2018-09-13 21:17:40 -07:00
Zackery Spytz 23db935bcf bpo-25862: Fix assertion failures in io.TextIOWrapper.tell(). (GH-3918) 2018-06-29 13:14:58 +03:00
INADA Naoki 507434fd50
bpo-15216: io: TextIOWrapper.reconfigure() accepts encoding, errors and newline (GH-2343) 2017-12-21 09:59:53 +09:00
Antoine Pitrou 317def9fdb bpo-17852: Revert incorrect fix based on misunderstanding of _Py_PyAtExit() semantics (#4826) 2017-12-13 01:39:26 +01:00
benfogle 9703f092ab bpo-31976: Fix race condition when flushing a file is slow. (#4331) 2017-11-10 22:03:40 +01:00
Neil Schemenauer 0a1ff24acf bpo-17852: Maintain a list of BufferedWriter objects. Flush them on exit. (#3372)
* Maintain a list of BufferedWriter objects.  Flush them on exit.

In Python 3, the buffer and the underlying file object are separate
and so the order in which objects are finalized matters.  This is
unlike Python 2 where the file and buffer were a single object and
finalization was done for both at the same time.  In Python 3, if
the file is finalized and closed before the buffer then the data in
the buffer is lost.

This change adds a doubly linked list of open file buffers.  An atexit
hook ensures they are flushed before proceeding with interpreter
shutdown.  This is addition does not remove the need to properly close
files as there are other reasons why buffered data could get lost during
finalization.

Initial patch by Armin Rigo.

* Use weakref.WeakSet instead of WeakKeyDictionary.

* Simplify buffered double-linked list types.

* In _flush_all_writers(), suppress errors from flush().

* Remove NEWS entry, use blurb.

* Take more care when flushing file buffers from atexit.

The previous implementation was not careful enough to avoid
causing issues in multi-threaded cases.  Check for buf->ok
and buf->finalizing before actually doing the flush.  Also,
increase the refcnt to ensure the object does not disappear.
2017-09-22 10:17:30 -07:00
Antoine Pitrou a6a4dc816d bpo-31370: Remove support for threads-less builds (#3385)
* Remove Setup.config
* Always define WITH_THREAD for compatibility.
2017-09-07 18:56:24 +02:00
Neil Schemenauer db564238db Revert "bpo-17852: Maintain a list of BufferedWriter objects. Flush them on exit. (#1908)" (#3337)
This reverts commit e38d12ed34.
2017-09-04 22:13:17 -07:00
Neil Schemenauer e38d12ed34 bpo-17852: Maintain a list of BufferedWriter objects. Flush them on exit. (#1908)
* Maintain a list of BufferedWriter objects.  Flush them on exit.

In Python 3, the buffer and the underlying file object are separate
and so the order in which objects are finalized matters.  This is
unlike Python 2 where the file and buffer were a single object and
finalization was done for both at the same time.  In Python 3, if
the file is finalized and closed before the buffer then the data in
the buffer is lost.

This change adds a doubly linked list of open file buffers.  An atexit
hook ensures they are flushed before proceeding with interpreter
shutdown.  This is addition does not remove the need to properly close
files as there are other reasons why buffered data could get lost during
finalization.

Initial patch by Armin Rigo.

* Use weakref.WeakSet instead of WeakKeyDictionary.

* Simplify buffered double-linked list types.

* In _flush_all_writers(), suppress errors from flush().

* Remove NEWS entry, use blurb.
2017-09-04 20:18:38 -07:00