140 Commits

Author SHA1 Message Date
Benjamin Peterson 274a76323c properly handle the single null-byte file (closes #24022) 2016-09-18 23:41:11 -07:00
Serhiy Storchaka 5d7d26c403 Issue #25388: Fixed tokenizer hang when processing undecodable source code
with a null byte.
2015-11-14 15:14:29 +02:00
Benjamin Peterson 223546d55c add missing NULL checks to get_coding_spec (closes #24854) 2015-08-13 21:52:56 -07:00
Serhiy Storchaka 3eb554fc82 Issue #22221: Backported fixes from Python 3 (issue #18960).
* Now the source encoding declaration on the second line isn't effective if
  the first line contains anything except a comment.  This affects compile(),
  eval() and exec() too.

* IDLE now ignores the source encoding declaration on the second line if the
  first line contains anything except a comment.

* 2to3 and the findnocoding.py script now ignore the source encoding
  declaration on the second line if the first line contains anything except
  a comment.
2014-09-05 10:22:05 +03:00
Ned Deily 24b8209a4e Issue #21789: fix broken link (reported by Jan Varho) 2014-06-17 12:24:53 -07:00
Benjamin Peterson 93e51aac54 allow the keyword else immediately after (no space) an integer (closes #21642) 2014-06-07 12:36:39 -07:00
Benjamin Peterson 22d9ee7e17 complain if the codec doesn't return unicode 2013-12-28 10:33:58 -06:00
Serhiy Storchaka 729ad5cf56 Issue #18038: SyntaxError raised when compiling sources with an illegal
encoding now always contains an encoding name.
2013-06-09 16:54:56 +03:00
Stefan Krah 3db4161011 Issue #9020: The Py_IS* macros from pyctype.h should generally only be
used with signed/unsigned char arguments. For integer arguments, EOF
has to be handled separately.
2010-06-24 09:33:05 +00:00
Antoine Pitrou c83ea137d7 Untabify C files. Will watch buildbots. 2010-05-09 14:46:46 +00:00
Benjamin Peterson 88623d76b4 use our own locale independent ctype macros
requires building pyctype.o into pgen
2010-04-03 23:03:35 +00:00
Benjamin Peterson 4ceeeb09d8 ensure that the locale does not affect the tokenization of identifiers 2010-04-03 22:48:51 +00:00
Victor Stinner 6664426d7c Issue #3137: Don't ignore errors at startup, especially a keyboard interrupt
(SIGINT). If an error occurs while importing the site module, the error is
printed and Python exits. Initialize the GIL before importing the site
module.
2010-03-10 22:30:19 +00:00
Victor Stinner d23d3930ff Issue #7820: The parser tokenizer restores all bytes in the right order if the BOM
check fails.

Fix an assertion in pydebug mode.
2010-03-02 23:20:02 +00:00
Benjamin Peterson 42d63847c3 rewrite translate_newlines for clarity 2009-12-06 17:37:48 +00:00
Benjamin Peterson e36199b49d fix several compile() issues by translating newlines in the tokenizer 2009-11-12 23:39:44 +00:00
Benjamin Peterson e3383b8e8f spelling 2009-11-07 01:04:38 +00:00
Benjamin Peterson 9586cf8677 fix some coding style 2009-10-09 21:48:14 +00:00
Benjamin Peterson 08a0bbc846 don't mask encoding errors when decoding a string #6289 2009-06-16 00:29:31 +00:00
Andrew M. Kuchling 110a48cf60 #3367: revert rev. 65539: this change causes test_parser to fail 2008-08-05 02:05:23 +00:00
Andrew M. Kuchling efa61bc15f #3367 from Kristjan Valur Jonsson:
If PyTokenizer_FromString() is called with an empty string, the
tokenizer's line_start member never gets initialized.  Later, it is
compared with the token pointer 'a' in parsetok.c:193, and that comparison
can result in undefined behavior.
2008-08-05 01:38:08 +00:00
Gregory P. Smith dd96db63f6 This reverts r63675 based on the discussion in this thread:
http://mail.python.org/pipermail/python-dev/2008-June/079988.html

Python 2.6 should stick with PyString_* in its codebase.  The PyBytes_* names
in the spirit of 3.0 are available via a #define only.  See the email thread.
2008-06-09 04:58:54 +00:00
Christian Heimes 593daf545b Renamed PyString to PyBytes 2008-05-26 12:51:38 +00:00
Amaury Forgeot d'Arc 5216721a53 Issue2681: the literal 0o8 was wrongly accepted, and evaluated as float(0.0).
This happened only when 8 is the first digit.
Credits go to Lukas Meuser.
2008-04-24 18:07:05 +00:00
Neal Norwitz d183bdd6fb Revert r61969 which added casts to Py_CHARMASK to avoid compiler warnings.
Rather than sprinkle casts throughout the code, change Py_CHARMASK to
always cast its result to an unsigned char.  This should ensure we
do the right thing when accessing an array with the result.
2008-03-28 04:58:51 +00:00
Georg Brandl d5b635f196 Make Py3k warnings consistent w.r.t. punctuation; also respect the
EOL 80 limit and supply more alternatives in warning messages.
2008-03-25 08:29:14 +00:00
Eric Smith 9ff19b5434 Finished backporting PEP 3127, Integer Literal Support and Syntax.
Added 0b and 0o literals to tokenizer.
Modified PyOS_strtoul to support 0b and 0o inputs.
Modified PyLong_FromString to support guessing 0b and 0o inputs.
Renamed test_hexoct.py to test_int_literal.py and added binary tests.
Added upper and lower case 0b, 0O, and 0X tests to test_int_literal.py
2008-03-17 17:32:20 +00:00
Neal Norwitz c44af337ce Add assertion that we do not blow out newl 2008-01-27 17:10:29 +00:00
Christian Heimes 082c9b0267 Fixed bug #1915: Python compiles with --enable-unicode=no again. However several extension methods and modules do not work without unicode support. 2008-01-23 14:20:50 +00:00
Georg Brandl 898f1879e1 Add a "const" to make gcc happy. 2008-01-21 21:14:21 +00:00
Georg Brandl 38d1715b0d Issue #1882: when compiling code from a string, encoding cookies in the
second line of code were not always recognized correctly.
2008-01-21 18:35:49 +00:00
Georg Brandl 14404b68d8 Fix #1679: "0x" was taken as a valid integer literal.
Fixes the tokenizer, tokenize.py and int() to reject this.
Patches by Malte Helmert.
2008-01-19 19:27:05 +00:00
Christian Heimes 288e89acfc Added bytes and b'' as aliases for str and '' 2008-01-18 18:24:07 +00:00
Georg Brandl 76b30d1688 Fix #define ordering. 2008-01-07 18:41:34 +00:00
Georg Brandl dfe5dc8455 Make Python compile with --disable-unicode. 2008-01-07 18:16:36 +00:00
Amaury Forgeot d'Arc 6dae85f409 Warning "<> not supported in 3.x" should be enabled only when the -3 option is set. 2007-11-24 13:20:22 +00:00
Christian Heimes 02c9ab568d Fixed problems in the last commit. Filenames and line numbers weren't reported correctly.
Backquotes still don't report the correct file. The AST nodes only contain the line number but not the file name.
2007-11-23 12:12:02 +00:00
Christian Heimes 729ab15370 Applied patch #1754273 and #1754271 from Thomas Glee
The patches are adding deprecation warnings for back ticks and <>
2007-11-23 09:10:36 +00:00
Guido van Rossum 9fc1b96a19 Change a PyErr_Print() into a PyErr_Clear(),
per discussion in issue 1031213.
2007-10-15 15:54:11 +00:00
Martin v. Löwis a5136196bc Patch #1031213: Decode source line in SyntaxErrors back to its original
source encoding. Will backport to 2.5.
2007-09-04 14:19:28 +00:00
Andrew M. Kuchling 9b3a824097 Comment grammar 2006-10-06 18:51:55 +00:00
Neal Norwitz 71e05f1e0c Don't truncate if size_t is bigger than uint 2006-06-12 02:07:57 +00:00
Neal Norwitz d21a7fffb1 Patch #1357836:
Prevent an invalid memory read from test_coding in case the done flag is set.
In that case, the loop isn't entered.  I wonder if, rather than setting
the done flag in the cases before the loop, they should just exit early.

This code looks like it should be refactored.

Backport candidate (also the early break above if decoding_fgets fails)
2006-06-02 06:23:00 +00:00
Skip Montanaro a0b6338823 C++ compiler cleanup: cast signed to unsigned 2006-04-18 00:53:06 +00:00
Neal Norwitz 08062d6665 As discussed on python-dev, really fix the PyMem_*/PyObject_* memory API
mismatches.  At least I hope this fixes them all.

This reverts part of my change from yesterday that converted everything
in Parser/*.c to use PyObject_* API.  The encoding doesn't really need
to use PyMem_*, however, it uses new_string() which must return PyMem_*
for handling the result of PyOS_Readline() which returns PyMem_* memory.

If there were 2 versions of new_string(), one that returned PyMem_*
for tokens and one that returned PyObject_* for encodings, that could
also fix this problem.  I'm not sure which version would be clearer.
This seems to fix both Guido's and Phillip's problems, so it's good enough
for now.  After this change, it would be good to review Parser/*.c
for consistent use of the 2 memory APIs.
2006-04-11 08:19:15 +00:00
Anthony Baxter 114900298e Fix the code in Parser/ to also compile with C++. This was mostly casts for
malloc/realloc type functions, as well as renaming one variable called 'new'
in tokenizer.c. Still lots more to be done, going to be checking in one
chunk at a time or the patch will be massively huge. Still compiles ok with
gcc.
2006-04-11 05:39:14 +00:00
Neal Norwitz 2c4e4f9839 SF patch #1467512, fix double free with triple quoted string in standard build.
This was the result of inconsistent use of PyMem_* and PyObject_* allocators.
By changing to use PyObject_* allocator almost everywhere, this removes
the inconsistency.
2006-04-10 06:42:25 +00:00
Tim Peters c9d78aa470 Years in the making.
objimpl.h, pymem.h:  Stop mapping PyMem_{Del, DEL} and PyMem_{Free, FREE}
to PyObject_{Free, FREE} in a release build.  They're aliases for the
system free() now.

_subprocess.c/sp_handle_dealloc():  Since the memory was originally
obtained via PyObject_NEW, it must be released via PyObject_FREE (or
_DEL).

pythonrun.c, tokenizer.c, parsermodule.c:  I lost count of the number of
PyObject vs PyMem mismatches in these -- it's like the specific
function called at each site was picked at random, sometimes even with
memory obtained via PyMem getting released via PyObject.  Changed most
to use PyObject uniformly, since the blobs allocated are predictably
small in most cases, and obmalloc is generally faster than system
mallocs then.

If extension modules in real life prove as sloppy as Python's front
end, we'll have to revert the objimpl.h + pymem.h part of this patch.
Note that no problems will show up in a debug build (all calls still go
thru obmalloc then). Problems will show up only in a release build, most
likely segfaults.
2006-03-26 23:27:58 +00:00
Neal Norwitz 2aa9a5dfdd Use macro versions instead of function versions when we already know the type.
This will hopefully get rid of some Coverity warnings, be a hint to
developers, and be marginally faster.

Some asserts were added when the type is currently known, but depends
on values from another function.
2006-03-20 01:53:23 +00:00
Thomas Wouters 7eaf2aaf48 Fix crashing bug in tokenizer, when tokenizing files with non-ASCII bytes
but without a specified encoding: decoding_fgets() (and decoding_feof()) can
return NULL and fiddle with the 'tok' struct, making tok->buf NULL. This is
okay in the other cases of calls to decoding_*(), it seems, but not in this
one.

This should get a test added, somewhere, but the testsuite doesn't seem to
test encoding anywhere (although plenty of tests use it.)

It seems to me that decoding errors in other places in the code (like at the
start of a token, instead of in the middle of one) make the code end up
adding small integers to NULL pointers, but happen to check for error states
before using the calculated new pointers. I haven't been able to trigger any
other crashes, in any case.

I would nominate this file for a complete rewrite for Py3k. The whole
decoding trick is too bolted-on for my tastes.
2006-03-02 20:41:27 +00:00
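
Illustrative note on commit 3eb554fc82 (Issue #22221): the rule described there, that an encoding declaration on the second line takes effect only if the first line contains nothing except a comment, can be sketched in Python roughly as follows. This is not the tokenizer's actual C code. The regular expressions are modelled on PEP 263, the name find_declared_encoding is hypothetical, and treating a blank first line like a comment-only line is an assumption that goes beyond the commit message.

```python
import re

# Patterns modelled on PEP 263; the names are illustrative, not CPython's.
COOKIE_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)')
# Assumption: a blank first line is treated like a comment-only line.
BLANK_OR_COMMENT_RE = re.compile(r'^[ \t\f]*(?:[#\r\n]|$)')


def find_declared_encoding(source):
    """Return the declared source encoding, or None if there is none."""
    lines = source.splitlines(True)[:2]
    if not lines:
        return None
    # A declaration on the first line always counts.
    match = COOKIE_RE.match(lines[0])
    if match:
        return match.group(1)
    # A declaration on the second line only counts when the first line
    # holds nothing except a comment (or, by assumption, is blank).
    if len(lines) == 2 and BLANK_OR_COMMENT_RE.match(lines[0]):
        match = COOKIE_RE.match(lines[1])
        if match:
            return match.group(1)
    return None
```

For example, a shebang line followed by "# coding: utf-8" yields the declared encoding, while "x = 1" followed by the same comment yields None, because the first line contains code.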