The C code in fileobject.readinto(buffer) that parses
the arguments assumes that size_t is interchangeable
with int:
    char *ptr;
    size_t ntodo, ndone, nnow;

    if (f->f_fp == NULL)
        return err_closed();
    /* "w#" stores an int into &ntodo, but ntodo is a size_t */
    if (!PyArg_Parse(args, "w#", &ptr, &ntodo))
        return NULL;
This causes a problem on Alpha / Tru64 / OSF1 v5.1
where size_t is a long and sizeof(long) != sizeof(int).
The patch I'm proposing declares ntodo as an int. An
alternative might be to redefine w# to expect size_t.
[We can't change w# because there are probably third party modules
relying on it. GvR]
The problem is that if fread() returns a short count, we attempt
another fread() the next time through the loop, and apparently glibc
clears or ignores the EOF condition, so the second fread() requires
another ^D before it sees EOF.
According to the man page (and the C std, I hope), fread() can only
return a short count on error or EOF. I'm using that in the band-aid
solution to avoid calling fread() a second time after a short read.
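A hedged sketch of that band-aid, reusing the ptr/ntodo/ndone/nnow names
from the snippet above (the real loop has more details, e.g. releasing
the GIL around fread()):

    while (ntodo > 0) {
        size_t nrequested = ntodo;
        nnow = fread(ptr + ndone, 1, nrequested, f->f_fp);
        if (nnow == 0) {
            if (ferror(f->f_fp)) {
                PyErr_SetFromErrno(PyExc_IOError);
                clearerr(f->f_fp);
                return NULL;
            }
            break;                  /* EOF */
        }
        ndone += nnow;
        ntodo -= nnow;
        if (nnow < nrequested)      /* short count means error or EOF, so
                                       don't call fread() again */
            break;
    }
    return PyInt_FromLong((long)ndone);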
Note that xreadlines() still has this problem: it calls
readlines(sizehint) until it gets a zero-length return. Since
xreadlines() is mostly used for reading real files, I won't worry
about this until we get a bug report.
lseek(fp, 0L, SEEK_CUR) can make a file descriptor unusable.
This workaround is expected to last only a few weeks (until GUSI
is fixed), but without it test_email fails.
Many types were subclassable but had an xxx_dealloc function that
called PyObject_DEL(self) directly instead of deferring to
self->ob_type->tp_free(self). It is permissible to set tp_free in the
type object directly to _PyObject_Del, for non-GC types, or to
_PyObject_GC_Del, for GC types. Still, PyObject_DEL was a tad faster,
so I fear that our pystone rating is going down again. I'm not
sure whether doing something like
    void xxx_dealloc(PyObject *self)
    {
        if (PyXxxCheckExact(self))
            PyObject_DEL(self);             /* fast path for exact type */
        else
            self->ob_type->tp_free(self);   /* defer for subclass instances */
    }
is any faster than always calling the else branch, so I haven't
attempted that -- however those types whose own dealloc is fancier
(int, float, unicode) do use this pattern.
There is no backwards compatibility to worry about, so I just pushed the
'closure' struct member to the back -- it's never used in the current
code base. (I may eliminate it, but that's more work because the getter
and setter signatures would have to change.)
As examples, I added actual docstrings to the getset attributes of a
few types: file.closed, xxsubtype.spamdict.state.
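A hedged sketch of what such a getset entry looks like, along the lines
of file.closed (the getter's name and body here are illustrative, not
the exact code):

    static PyObject *
    get_closed(PyFileObject *f, void *closure)
    {
        return PyInt_FromLong((long)(f->f_fp == NULL));
    }

    static PyGetSetDef file_getsetlist[] = {
        {"closed", (getter)get_closed, NULL,
         "True if the file is closed"},
        {0},
    };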
For compatibility, this required changing the structure type to
PyMemberDef in every place where an array of "struct memberlist"
structures is declared and referenced from a type's tp_members slot;
"struct memberlist" is now only used by old code that still calls
PyMember_Get/Set. The code in PyObject_GenericGetAttr/SetAttr now
calls the new APIs PyMember_GetOne/SetOne, which take a PyMemberDef
argument.
As examples, I added actual docstrings to the attributes of a few
types: file, complex, instance method, super, and xxsubtype.spamlist.
Also converted the symtable to new style getattr.
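A hedged sketch of such a PyMemberDef array with a docstring, modeled on
xxsubtype.spamlist.state (spamlistobject and its layout are illustrative
here):

    #include <stddef.h>
    #include "structmember.h"

    static PyMemberDef spamlist_members[] = {
        {"state", T_INT, offsetof(spamlistobject, state), READONLY,
         "an int variable for demonstration purposes"},
        {0},
    };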
A surprising number of changes were needed to split tp_new into tp_new
and tp_init. It turned out the older PyFile_FromFile() didn't initialize
the memory it allocated in all (error) cases, which caused new sanity
asserts elsewhere to fail left & right (and could have, e.g., caused
file_dealloc to try decrefing random addresses).
just by doing type(f) where f is any file object. This left a hole in
restricted execution mode that rexec.py can't plug by itself (although it
can plug part of it; the rest is plugged in fileobject.c now).
Subtlety on Windows: if we change test_largefile.py to use a file
> 4GB, it still fails. A debug session suggests this is because
fseek(fp, 0, 2) refuses to seek to the end of the file when the file
is > 4GB, because it uses SetFilePointer() in 32-bit mode.
But it only fails when we seek relative to the end of the file,
because in the other seek modes only calls to fgetpos() and fsetpos()
are made, which use Get/SetFilePointer() in 64-bit mode. Solution:
#ifdef MS_WINDOWS, replace the call to fseek(fp, ...) with a call to
_lseeki64(fileno(fp), ...). Make sure to call fflush(fp) first.
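A minimal sketch of that substitution, assuming it sits in a seek helper
whose offset and whence parameters are passed through (the surrounding
function is not shown):

    fflush(fp);                     /* sync stdio's buffer with the OS first */
#ifdef MS_WINDOWS
    if (_lseeki64(fileno(fp), offset, whence) == -1)
        return -1;
#else
    if (fseek(fp, offset, whence) != 0)
        return -1;
#endif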
(XXX Could also replace the entire branch with a call to _lseeki64().
Would that be more efficient? Certainly less generated code.)
(XXX This needs more testing. I can't actually test that it works for
files >4GB on my Win98 machine, because the filesystem here won't let
me create files >=4GB at all. Tim should test this on his Win2K
machine.)
Curious: the MS docs say stati64 etc. are supported even on Win95, but
Win95 doesn't support a filesystem that allows partitions > 2 GB.
test_largefile: This was opening its test file in text mode. I have no
idea how that worked under Win64, but it sure needs binary mode on Win98.
BTW, on Win98 test_largefile runs quickly (under a second).
I believe this works on Linux (tested both on a system with large file
support and one without it), and it may work on Solaris 2.7.
The changes are twofold:
(1) The configure script now boldly tries to set the two symbols that
are recommended (for Solaris and Linux), and then tries a test
script that does some simple seeking without writing.
(2) The _portable_{fseek,ftell} functions are a little more systematic
in how they try the different large file support options: first
try fseeko/ftello, but only if off_t is large; then try
fseek64/ftell64; then try hacking with fgetpos/fsetpos.
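A rough sketch of that fallback order in _portable_fseek (the configure
macros and the Py_off_t typedef are assumed from the surrounding build
setup; the real function handles more cases):

    static int
    _portable_fseek(FILE *fp, Py_off_t offset, int whence)
    {
    #if defined(HAVE_FSEEKO) && SIZEOF_OFF_T >= 8
        return fseeko(fp, offset, whence);
    #elif defined(HAVE_FSEEK64)
        return fseek64(fp, offset, whence);
    #else
        /* Hack with fgetpos/fsetpos; assumes fpos_t is an arithmetic
           64-bit type on this branch. */
        fpos_t pos;
        switch (whence) {
        case SEEK_CUR:
            if (fgetpos(fp, &pos) != 0)
                return -1;
            offset += pos;
            break;
        case SEEK_END:
            /* do a plain seek first to sync the stdio buffer */
            if (fseek(fp, 0, SEEK_END) != 0 || fgetpos(fp, &pos) != 0)
                return -1;
            offset += pos;
            break;
        /* case SEEK_SET: offset is already absolute */
        }
        pos = offset;
        return fsetpos(fp, &pos);
    #endif
    }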
I'm keeping my fingers crossed. The meaning of the
HAVE_LARGEFILE_SUPPORT macro is not at all clear.
I'll see if I can get it to work on Windows as well.
Previously, f.read() and f.readlines() checked for
errors on their file object and possibly raised an
IOError, but f.readline() didn't. This patch makes
f.readline() behave like the others.
Note that I've added a call to clearerr() since the other calls to
ferror() include that too.
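A hedged sketch of the check in question (field names follow
fileobject.c conventions; the surrounding readline code is not shown):

    if (ferror(f->f_fp)) {
        PyErr_SetFromErrno(PyExc_IOError);
        clearerr(f->f_fp);
        return NULL;
    }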
I have no way to test this code. :-)
This should be faster.
This means:
(1) "for line in file:" won't work if the xreadlines module can't be
imported.
(2) The body of "for line in file:" shouldn't use the file directly;
the effects (e.g. of file.readline(), file.seek() or even
file.tell()) would be undefined because of the buffering that goes
on in the xreadlines module.
sees it (test_iter.py is unchanged).
- Added a tp_iternext slot, which calls the iterator's next() method;
this is much faster for built-in iterators over built-in types
such as lists and dicts, speeding up pybench's ForLoop by about
25% compared to Python 2.1. (Now there's a good argument for
iterators. ;-)
- Renamed the built-in sequence iterator SeqIter, affecting the C API
functions for it. (This frees up the PyIter prefix for generic
iterator operations.)
- Added PyIter_Check(obj), which checks that obj's type has a
tp_iternext slot and that the proper feature flag is set.
- Added PyIter_Next(obj) which calls the tp_iternext slot. It has a
somewhat complex return condition due to the need for speed: when it
returns NULL, it may not have set an exception condition, meaning
the iterator is exhausted; when the exception StopIteration is set
(or a derived exception class), it means the same thing; any other
exception means some other error occurred.
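A hedged sketch of how a caller would consume PyIter_Next() under that
return contract (this is an illustration, not code from the patch):

    PyObject *item;

    while ((item = PyIter_Next(it)) != NULL) {
        /* ... use item ... */
        Py_DECREF(item);
    }
    if (PyErr_Occurred()) {
        if (PyErr_ExceptionMatches(PyExc_StopIteration))
            PyErr_Clear();          /* exhaustion signalled via exception */
        else
            return NULL;            /* some other error occurred */
    }
    /* else: NULL without an exception also means exhaustion */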
- In _portable_ftell(), try fgetpos() before ftello() and ftell64().
I ran into a situation on a 64-bit capable Linux where the C
library's ftello() and ftell64() returned negative numbers despite
fpos_t and off_t both being 64-bit types; fgetpos() did the right
thing.
- Define a new typedef, Py_off_t, which is either fpos_t or off_t,
depending on which one is 64 bits. This removes the need for a lot
of #ifdefs later on. (XXX Should this be moved to pyport.h? That
file currently seems oblivious to large file support, so for now
I'll leave it here where it's needed.)
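A hedged sketch of that typedef selection (the exact macros used to pick
the branch are an assumption here):

    #if defined(HAVE_LARGEFILE_SUPPORT) && SIZEOF_OFF_T < 8 && SIZEOF_FPOS_T >= 8
    typedef fpos_t Py_off_t;
    #else
    typedef off_t Py_off_t;
    #endif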
simpler if we use fgetpos and fsetpos, rather than trying to mess with
platform-specific TELL64 alternatives.
Of course, this hasn't been tested on a 64-bit platform, so I may have
to withdraw this -- but I'm hopeful, and Trent Mick supports this
patch!
faster than the other. Should be faster for Mark Favas's 254-character
mail log lines, and *is* 3-4% quicker for my test case with much shorter
lines (but they're typical of *my* text files, and I'm tired of optimizing
for everyone else at my expense <wink> -- in fact, the only one who loses
here is Guido ...).
Tim discovered another "bug" in my get_line() code: while the comments
said that n<0 was invalid, it was in fact still called with n<0 (when
PyFile_GetLine() was called with n<0). In that case it fortunately
executed the same code as for n==0.
Changed the comment to admit this fact, and changed Tim's MS speed
hack code to use 'n <= 0' as the criterion for the speed hack.
code duplication is to let us get away without a realloc whenever possible;
boosted the init buf size (the cutoff at which we *can* get away without
a realloc) from 100 to 200 so that more files can enjoy this boost; and
allowed other threads to run in all cases. The last two cost something,
but not significantly: in my fat test case, less than a 1% slowdown total.
Since my test case has a great many short lines, that's probably the worst
slowdown, too. While the logic barely changed, there were lots of edits.
This also gets rid of the reference to fp->_cnt, so the last platform
assumption being made here is that fgets doesn't overwrite bytes
capriciously (== beyond the terminating null byte it must write).
variant that never needs to "search from the right".
Also fixed an unlikely memory leak in get_line, if the string size overflows INT_MAX.
Also added a new std test, test_bufio, to make sure .readline() works.
realized that this behavior is already present in PyFile_GetLine(),
which is the only place that needs it. A little refactoring of that
function made get_line_raw() redundant.
- The raw_input() functionality is moved to a separate function.
- Drop GNU getline() in favor of getc_unlocked(), which exists on more
platforms (and is even a tad faster on my system).
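A hedged sketch of the getc_unlocked() pattern this refers to (buffer
handling is simplified; the real get_line() grows its buffer and
releases the GIL around the loop, and fp is the stream being read):

    char buf[200];
    char *p = buf, *end = buf + sizeof(buf) - 1;
    int c;

    flockfile(fp);                  /* getc_unlocked() requires the caller to lock */
    while (p < end && (c = getc_unlocked(fp)) != EOF) {
        *p++ = (char)c;
        if (c == '\n')
            break;
    }
    funlockfile(fp);
    *p = '\0';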
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.
This closes SourceForge patch #101659 and bug #115323.