55 Commits

Author SHA1 Message Date
Ezio Melotti 7d24b1698a #16152: fix tokenize to ignore whitespace at the end of the code when no newline is found. Patch by Ned Batchelder. 2012-11-03 17:30:51 +02:00
Meador Inge 43f42fc3cb Issue #15054: Fix incorrect tokenization of 'b' and 'br' string literals.
Patch by Serhiy Storchaka.
2012-06-16 21:05:50 -05:00
Benjamin Peterson ca2d2529ce some cleanups 2009-10-15 03:05:39 +00:00
Benjamin Peterson 447dc15658 use floor division and add a test that exercises the tabsize codepath 2009-10-15 01:49:37 +00:00
Benjamin Peterson e537adfd08 pep8ify if blocks 2009-10-15 01:47:28 +00:00
Brett Cannon 50bb7e12ec Remove a tuple unpacking in a parameter list to remove a SyntaxWarning raised
while running under -3.
2008-08-02 03:15:20 +00:00
Benjamin Peterson 8456f64ce2 revert 63965 for performance reasons 2008-06-05 23:02:33 +00:00
Benjamin Peterson 30dc7b8ce2 use the more idiomatic while True 2008-06-05 22:39:34 +00:00
Amaury Forgeot d'Arc da0c025a43 Issue2495: tokenize.untokenize did not insert space between two consecutive string literals:
"" "" => """", which is invalid code.

Will backport
2008-03-27 23:23:54 +00:00
Eric Smith 0aed07ad80 Added PEP 3127 support to tokenize (with tests); added PEP 3127 to NEWS. 2008-03-17 19:43:40 +00:00
Georg Brandl 14404b68d8 Fix #1679: "0x" was taken as a valid integer literal.
Fixes the tokenizer, tokenize.py and int() to reject this.
Patches by Malte Helmert.
2008-01-19 19:27:05 +00:00
Christian Heimes 288e89acfc Added bytes and b'' as aliases for str and '' 2008-01-18 18:24:07 +00:00
Raymond Hettinger 8a7e76bcfa Add name to credits (for untokenize). 2006-12-02 02:00:39 +00:00
Jeremy Hylton 39c532c0b6 Replace dead code with an assert.
Now that COMMENT tokens are reliably followed by NL or NEWLINE,
there is never a need to add extra newlines in untokenize.
2006-08-23 21:26:46 +00:00
Jeremy Hylton 76467ba6d6 Bug fixes large and small for tokenize.
Small: Always generate a NL or NEWLINE token following
       a COMMENT token.  The old code did not generate an NL token if
       the comment was on a line by itself.

Large: The output of untokenize() will now match the
       input exactly if it is passed the full token sequence.  The
       old, crufty output is still generated if a limited input
       sequence is provided, where limited means that it does not
       include position information for tokens.

Remaining bug: There is no CONTINUATION token (\) so there is no way
for untokenize() to handle such code.

Also, expanded the number of doctests in hopes of eventually removing
the old-style tests that compare against a golden file.

Bug fix candidate for Python 2.5.1. (Sigh.)
2006-08-23 21:14:03 +00:00
Georg Brandl 2463f8f831 Make tabnanny recognize IndentationErrors raised by tokenize.
Add a test to test_inspect to make sure indented source
is recognized correctly. (fixes #1224621)
2006-08-14 21:34:08 +00:00
Guido van Rossum c259cc9c4c Insert a safety space after numbers as well as names in untokenize(). 2006-03-30 21:43:35 +00:00
Raymond Hettinger da99d1cbfe SF bug #1224621: tokenize module does not detect inconsistent dedents 2005-06-21 07:43:58 +00:00
Raymond Hettinger 68c0453418 Add untokenize() function to allow full round-trip tokenization.
Should significantly enhance the utility of the module by supporting
the creation of tools that modify the token stream and write back the
modified result.
2005-06-10 11:05:19 +00:00
Anthony Baxter c2a5a63654 PEP-0318, @decorator-style. In Guido's words:
"@ seems the syntax that everybody can hate equally"
Implementation by Mark Russell, from SF #979728.
2004-08-02 06:10:11 +00:00
Guido van Rossum 68468eba63 Get rid of many apply() calls. 2003-02-27 20:14:51 +00:00
Raymond Hettinger 78a7aeeb1a SF 633560: tokenize.__all__ needs "generate_tokens" 2002-11-05 06:06:02 +00:00
Guido van Rossum 9d6897accc Speed up the most egregious "if token in (long tuple)" cases by using
a dict instead.  (Alas, using a Set would be slower instead of
faster.)
2002-08-24 06:54:19 +00:00
Tim Peters 8ac1495a6a Whitespace normalization. 2002-05-23 15:15:30 +00:00
Raymond Hettinger d1fa3db52d Added docstrings excerpted from Python Library Reference.
Closes patch 556161.
2002-05-15 02:56:03 +00:00
Tim Peters 496563a514 Remove some now-obsolete generator future statements.
I left the email pkg alone; I'm not sure how Barry would like to handle
that.
2002-04-01 00:28:59 +00:00
Neal Norwitz e98d16e8a4 Cleanup x so it is not left in module 2002-03-26 16:20:26 +00:00
Tim Peters d507dab91f SF patch #455966: Allow leading 0 in float/imag literals.
Consequences for Jython still unknown (but raised on Jython-Dev).
2001-08-30 20:51:59 +00:00
Guido van Rossum 96204f5e49 Add new tokens // and //=, in support of PEP 238. 2001-08-08 05:04:07 +00:00
Fred Drake 79e75e1916 Use string.ascii_letters instead of string.letters (SF bug #226706). 2001-07-20 19:05:50 +00:00
Guido van Rossum b09f7ed623 Preliminary support for "from __future__ import generators" to enable
the yield statement.  I figure we have to have this in before I can
release 2.2a1 on Wednesday.

Note: test_generators is currently broken, I'm counting on Tim to fix
this.
2001-07-15 21:08:29 +00:00
Tim Peters 4efb6e9643 Turns out Neil didn't intend for *all* of his gen-branch work to get
committed.

tokenize.py:  I like these changes, and have tested them extensively
without even realizing it, so I just updated the docstring and the docs.

tabnanny.py:  Also liked this, but did a little code fiddling.  I should
really rewrite this to *exploit* generators, but that's near the bottom
of my effort/benefit scale so doubt I'll get to it anytime soon (it
would be most useful as a non-trivial example of ideal use of generators;
but test_generators.py has already grown plenty of food-for-thought
examples).

inspect.py:  I'm sure Ping intended for this to continue running even
under 1.5.2, so I reverted this to the last pre-gen-branch version.  The
"bugfix" I checked in in-between was actually repairing a bug *introduced*
by the conversion to generators, so it's OK that the reverted version
doesn't reflect that checkin.
2001-06-29 23:51:08 +00:00
Tim Peters 5ca576ed0a Merging the gen-branch into the main line, at Guido's direction. Yay!
Bugfix candidate in inspect.py:  it was referencing "self" outside of
a method.
2001-06-18 22:08:13 +00:00
Ka-Ping Yee 28c62bbdb2 Provide a StopTokenizing exception for conveniently exiting the loop. 2001-03-23 05:22:49 +00:00
Ka-Ping Yee 4f64c13582 Better __credits__. 2001-03-01 17:11:17 +00:00
Ka-Ping Yee 244c593598 Add __author__ and __credits__ variables. 2001-03-01 13:56:40 +00:00
Skip Montanaro 40fc16059f final round of __all__ lists (I hope) - skipped urllib2 because Moshe may be
giving it a slight facelift
2001-03-01 04:27:19 +00:00
Eric S. Raymond b08b2d3166 String method conversion. 2001-02-09 11:10:16 +00:00
Ka-Ping Yee 1ff08b1243 Add tokenizer support and tests for u'', U"", uR'', Ur"", etc. 2001-01-15 22:04:30 +00:00
Tim Peters b90f89a496 Whitespace normalization. 2001-01-15 03:26:36 +00:00
Tim Peters de49583a0d Possible fix for Skip's bug 116136 (sre recursion limit hit in tokenize.py).
tokenize.py has always used naive regexps for matching string literals,
and that appears to trigger the sre recursion limit on Skip's platform (he
has very long single-line string literals).  Replaced all of tokenize.py's
string regexps with the "unrolled" forms used in IDLE, where they're known to
handle even absurd (multi-megabyte!) string literals without trouble.  See
Friedl's book for explanation (at heart, the naive regexps create a backtracking
choice point for each character in the literal, while the unrolled forms create
none).
2000-10-07 05:09:39 +00:00
Thomas Wouters e1519a1b4d Update for augmented assignment, tested & approved by Guido. 2000-08-24 21:44:52 +00:00
Fred Drake 9b8d801c37 Convert some old-style string exceptions to class exceptions. 2000-08-17 04:45:13 +00:00
Guido van Rossum a90c78b918 Differentiate between NEWLINE token (an official newline) and NL token
(a newline that the grammar ignores).
1998-04-03 16:05:38 +00:00
Guido van Rossum fefc922cef New, fixed version with proper r"..." and R"..." support from Ka-Ping. 1997-10-27 21:17:24 +00:00
Guido van Rossum 3b631775b2 Redone (by Ka-Ping) using the new re module, and adding recognition
for r"..." raw strings.  (And R"..." string support added by Guido.)
1997-10-27 20:44:15 +00:00
Guido van Rossum 2b1566be9d Correct typo in last line (test program invocation). 1997-06-03 22:05:15 +00:00
Guido van Rossum de65527e4b Ping's latest. Fixes triple quoted strings ending in odd
#backslashes, and other stuff I don't know.
1997-04-09 17:15:54 +00:00
Guido van Rossum 1aec32363f Ka-Ping's much improved version of March 26, 1997:
#     Ignore now accepts \f as whitespace.  Operator now includes '**'.
#     Ignore and Special now accept \n or \r\n at the end of a line.
#     Imagnumber is new.  Expfloat is corrected to reject '0e4'.
1997-04-08 14:24:39 +00:00
Guido van Rossum b5dc5e3d7e Added support for imaginary constants (e.g. 0j, 1j, 1.0j). 1997-03-10 23:17:01 +00:00
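
Several entries above describe observable tokenizer behavior: the NEWLINE/NL distinction (a90c78b918), the rule that a COMMENT token is always followed by NL or NEWLINE, and the exact round trip through untokenize() when full position information is supplied (76467ba6d6, 68c0453418). The sketch below illustrates those three points. It assumes the Python 3 standard-library `tokenize` module rather than the historical file this log tracks, and the names `SOURCE` and `token_stream` are hypothetical, introduced only for this example.

```python
import io
import tokenize

# Sample input (hypothetical): a trailing comment, a comment on its own
# line, a blank line, and two adjacent string literals.
SOURCE = "x = 1  # trailing comment\n# comment on its own line\n\ny = 'a' 'b'\n"


def token_stream(source):
    """Return the full token list, including position information."""
    return list(tokenize.generate_tokens(io.StringIO(source).readline))


tokens = token_stream(SOURCE)

# Human-readable token type names, e.g. 'NAME', 'COMMENT', 'NL', 'NEWLINE'.
names = [tokenize.tok_name[tok.type] for tok in tokens]

# Full tokens (with start/end positions) let untokenize() restore spacing.
round_trip = tokenize.untokenize(tokens)

for name, tok in zip(names, tokens):
    print(f"{name:10} {tok.string!r}")
print(round_trip == SOURCE)
```

NEWLINE ends a logical line of code, while NL marks a line break the grammar ignores, such as the one after a comment-only or blank line. Passing only (type, string) pairs to untokenize() instead of full tokens triggers the older, position-free output mode mentioned in 76467ba6d6, which is not expected to reproduce the original spacing.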