Commit Graph

97 Commits

Author SHA1 Message Date
Tim Peters de19694cfb
gh-119105: Differ.compare is too slow [for degenerate cases] (#119492)
``_fancy_replace()`` is no longer recursive. and a single call does a worst-case linear number of ratio() computations instead of quadratic. This renders toothless a universe of pathological cases. Some inputs may produce different output, but that's rare, and I didn't find a case where the final diff appeared to be of materially worse quality. To the contrary, by refusing to even consider synching on lines "far apart", there was more easy-to-digest locality in the output.
2024-05-24 22:08:21 -05:00
Tim Peters 07df93de73
gh-119105: difflib.py Differ.compare is too slow [for degenerate cases] (#119376)
Track all pairs achieving the best ratio in Differ(). This repairs the "very deep recursion and cubic time" bad cases in a way that preserves previous output.
2024-05-22 18:25:08 -05:00
pulkin 0abf997e75
gh-119105: difflib: improve recursion for degenerate cases (#119131)
Code from https://github.com/pulkin, in PR
https://github.com/python/cpython/pull/119131

Greatly speeds `Differ` when there are many identically scoring pairs, by splitting the recursion near the inputs' midpoints instead of degenerating (as now) into just peeling off the first two lines.

Co-authored-by: Tim Peters <tim.peters@gmail.com>
2024-05-19 16:46:37 -05:00
Simon de Vlieger c6b84a727c
Correct method name typo (#91970) 2022-04-27 15:28:56 -06:00
Christian Clauss 745c9d9dfc
Fix typos in the Lib directory (GH-28775)
Fix typos in the Lib directory as identified by codespell.

Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
2021-10-06 16:13:48 -07:00
Nikita Sobolev 06e1773c8d
bpo-45216: Remove extraneous method docs from `difflib` (GH-28445) 2021-09-21 23:31:12 +02:00
lrjball 3209cbd99b
bpo-40394 - difflib.SequenceMatched.find_longest_match default args (GH-19742)
* bpo-40394 - difflib.SequenceMatched.find_longest_match default args

Added default args to find_longest_match, as well as related tests.
2020-04-29 22:42:45 -05:00
Ethan Smith e3ec44d692
bpo-39481: PEP 585 for difflib, filecmp, fileinput (#19422) 2020-04-09 21:47:31 -07:00
Serhiy Storchaka 138ccbb022
bpo-38738: Fix formatting of True and False. (GH-17083)
* "Return true/false" is replaced with "Return ``True``/``False``"
  if the function actually returns a bool.
* Fixed formatting of some True and False literals (now in monospace).
* Replaced "True/False" with "true/false" if it can be not only bool.
* Replaced some 1/0 with True/False if it corresponds the code.
* "Returns <bool>" is replaced with "Return <bool>".
2019-11-12 16:57:03 +02:00
Anthony Sottile e1c638da6a Fix difflib `?` hint in diff output when dealing with tabs (#15201) 2019-08-21 13:59:25 -05:00
Serhiy Storchaka 830ddc74c4
Revert "bpo-35603: Escape table header of make_table output that can cause potential XSS. (GH-11341)" (GH-11356)
This reverts commit 78de01198b.
2019-01-02 14:49:25 +02:00
Xtreak 78de01198b bpo-35603: Escape table header of make_table output that can cause potential XSS. (GH-11341) 2018-12-29 10:53:14 +02:00
Raymond Hettinger 01b731fc2b
bpo-33224: PEP 479 fix for difflib.mdiff() (GH-6381) 2018-04-05 11:19:57 -07:00
Jamie Davis 0e6c8ee235 bpo-32981: Fix catastrophic backtracking vulns (#5955)
* Prevent low-grade poplib REDOS (CVE-2018-1060)

The regex to test a mail server's timestamp is susceptible to
catastrophic backtracking on long evil responses from the server.

Happily, the maximum length of malicious inputs is 2K thanks
to a limit introduced in the fix for CVE-2013-1752.

A 2KB evil response from the mail server would result in small slowdowns
(milliseconds vs. microseconds) accumulated over many apop calls.
This is a potential DOS vector via accumulated slowdowns.

Replace it with a similar non-vulnerable regex.

The new regex is RFC compliant.
The old regex was non-compliant in edge cases.

* Prevent difflib REDOS (CVE-2018-1061)

The default regex for IS_LINE_JUNK is susceptible to
catastrophic backtracking.
This is a potential DOS vector.

Replace it with an equivalent non-vulnerable regex.

Also introduce unit and REDOS tests for difflib.

Co-authored-by: Tim Peters <tim.peters@gmail.com>
Co-authored-by: Christian Heimes <christian@python.org>
2018-03-03 21:33:32 -08:00
Serhiy Storchaka 5affd23e6f bpo-29762: More use "raise from None". (#569)
This hides unwanted implementation details from tracebacks.
2017-04-05 09:37:24 +03:00
R David Murray 44b548dda8 #27364: fix "incorrect" uses of escape character in the stdlib.
And most of the tools.

Patch by Emanual Barry, reviewed by me, Serhiy Storchaka, and
Martin Panter.
2016-09-08 13:59:53 -04:00
Martin Panter 2eb819f7a8 Issue #25523: Merge "a" to "an" fixes from 3.4 into 3.5 2015-11-02 04:04:57 +00:00
Martin Panter 7462b64911 Issue #25523: Correct "a" article to "an" article
This changes the main documentation, doc strings, source code comments, and a
couple error messages in the test suite. In some cases the word was removed
or edited some other way to fix the grammar.
2015-11-02 03:37:02 +00:00
Yury Selivanov 683333955a Issue 24237: Raise PendingDeprecationWarning per PEP 479
Raise PendingDeprecationWarning when generator raises StopIteration
and no __future__ import is used.  Fix offenders in the stdlib
and tests.

See also issue 22906.
Thanks to Nick Coghlan and Berker Peksag for reviews.
2015-05-22 11:16:47 -04:00
Yury Selivanov 8170e8c0d1 PEP 479: Change StopIteration handling inside generators.
Closes issue #22906.
2015-05-09 11:44:30 -04:00
Greg Ward 4d9d2563f5 #17445: difflib: add diff_bytes(), to compare bytes rather than str
Some applications (e.g. traditional Unix diff, version control
systems) neither know nor care about the encodings of the files they
are comparing. They are textual, but to the diff utility they are just
bytes. This worked fine under Python 2, because all of the hardcoded
strings in difflib.py are ASCII, so could safely be combined with
old-style u'' strings. But it stopped working in 3.x.

The solution is to use surrogate escapes for a lossless
bytes->str->bytes roundtrip. That means {unified,context}_diff() can
continue to just handle strings without worrying about bytes. Callers
who have to deal with bytes will need to change to using diff_bytes().

Use case: Mercurial's test runner uses difflib to compare current hg
output with known good output. But Mercurial's output is just bytes,
since it can contain:
  * file contents (arbitrary unknown encoding)
  * filenames (arbitrary unknown encoding)
  * usernames and commit messages (usually UTF-8, but not guaranteed
    because old versions of Mercurial did not enforce it)
  * user messages (locale encoding)

Since the output of any given hg command can include text in multiple
encodings, it is hopeless to try to treat it as decodable Unicode
text. It's just bytes, all the way down.

This is an elaboration of a patch by Terry Reedy.
2015-04-20 20:21:21 -04:00
Berker Peksag 102029dfd6 Issue #2052: Add charset parameter to HtmlDiff.make_file(). 2015-03-15 01:18:47 +02:00
Raymond Hettinger bbeac6ebd8 Use two-argument form of next() and use a return-statement instead of an explicit raise StopIteration 2014-08-03 22:49:07 -07:00
Raymond Hettinger ae39fbdd84 Make the import private to keep the global namespace clean. 2014-08-03 22:40:59 -07:00
Raymond Hettinger f25a38e568 Use reversed() instead of creating a new temporary list. 2014-08-03 22:36:32 -07:00
Raymond Hettinger 986efa074e merge 2014-06-21 11:59:46 -07:00
Raymond Hettinger fabefc3c5b Issue 21635: Fix caching in difflib.SequenceMatcher.get_matching_blocks(). 2014-06-21 11:57:36 -07:00
Victor Stinner 03ce1c013d (Merge 3.4) Issue #20976: pyflakes: Remove unused imports 2014-03-20 09:22:39 +01:00
Victor Stinner 7fa767e517 Issue #20976: pyflakes: Remove unused imports 2014-03-20 09:16:38 +01:00
Andrew Kuchling c51da2b8a0 #14332: provide a better explanation of junk in difflib docs
Initial patch by Alba Magallanes.
2014-03-19 16:43:06 -04:00
Serhiy Storchaka 8f8ec92de8 Issue #19936: Added executable bits or shebang lines to Python scripts which
requires them.  Disable executable bits and shebang lines in test and
benchmark files in order to prevent using a random system python, and in
source files of modules which don't provide command line interface.  Fixed
shebang lines in the unittestgui and checkpip scripts.
2014-01-16 17:33:23 +02:00
Serhiy Storchaka b992a0e102 Issue #19936: Added executable bits or shebang lines to Python scripts which
requires them.  Disable executable bits and shebang lines in test and
benchmark files in order to prevent using a random system python, and in
source files of modules which don't provide command line interface.  Fixed
shebang line to use python3 executable in the unittestgui script.
2014-01-16 17:15:49 +02:00
Ezio Melotti 9a3777e525 #18705: merge with 3.3. 2013-08-17 15:53:55 +03:00
Ezio Melotti 30b9d5d3af #18705: fix a number of typos. Patch by Févry Thibault. 2013-08-17 15:50:46 +03:00
Terry Jan Reedy f027a204b0 Issue #13248: removed deprecated and undocumented difflib.isbjunk, isbpopular. 2013-03-19 19:44:04 -04:00
Philip Jenvey 4993cc0a5b utilize yield from 2012-10-01 12:53:43 -07:00
Florent Xicluna 7f1c15b854 Fix comment in difflib. 2011-12-10 13:02:17 +01:00
Ezio Melotti d8b509b192 #13012: use splitlines(keepends=True/False) instead of splitlines(0/1). 2011-09-28 17:37:55 +03:00
Raymond Hettinger 9180deb59c Issue 11747: Fix output format for context diffs. 2011-04-12 15:25:30 -07:00
Raymond Hettinger 49353d0e8f Issue #11747: Fix range formatting in context and unified diffs. 2011-04-11 12:40:58 -07:00
Raymond Hettinger 47e120e70c Cleanup and modernize code prior to working on Issue 11747. 2011-04-10 17:14:56 -07:00
Ezio Melotti 3b3499ba69 #11565: Merge with 3.1. 2011-03-16 11:35:38 +02:00
Ezio Melotti 13925008dc #11565: Fix several typos. Patch by Piotr Kasprzyk. 2011-03-16 11:05:33 +02:00
Terry Reedy 17a59252e8 Issue 10534, difflib: tweak doc; test new SequenceMatcher instance attributes; avoid unneeded lists of SM.b2j keys and items in .__chain_b. Do not backport. 2010-12-15 20:18:10 +00:00
Terry Reedy bcd8988a12 Issue 10534 deprecate isbjunk and isbpopular methods.
Will add gone in 3.3 test later.
2010-12-03 22:29:40 +00:00
Terry Reedy 74a7c67db1 2010-12-03 18:57:42 +00:00
Terry Reedy 99f9637de8 Issue 2986: Add autojunk paramater to SequenceMatcher to turn off heuristic. Patch by Terry Reedy, Eli Bendersky, and Simon Cross 2010-11-25 06:12:34 +00:00
R. David Murray b2416e54b1 Merged revisions 80004 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r80004 | r.david.murray | 2010-04-12 12:35:19 -0400 (Mon, 12 Apr 2010) | 13 lines

  Issue #7585: use tab between components in unified and context diff headers.

  Instead of spaces between the filename and date (or whatever the string
  is that follows the filename, if any) use tabs.  This is what the unix
  'diff' command does, for example, and difflib was intended to follow
  the 'standard' way of doing diffs.  This improves compatibility with
  patch tools.  The docs and examples are also changed to recommended that
  the date format used be the ISO 8601 format, which is what modern diff
  tools emit by default.

  Patch by Anatoly Techtonik.
........
2010-04-12 16:58:02 +00:00
Benjamin Peterson 90f5ba538b convert shebang lines: python -> python3 2010-03-11 22:53:45 +00:00
Senthil Kumaran d884f8a9c4 Merged revisions 76469 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

................
  r76469 | senthil.kumaran | 2009-11-24 00:32:52 +0530 (Tue, 24 Nov 2009) | 10 lines

  Merged revisions 76464 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r76464 | senthil.kumaran | 2009-11-24 00:11:31 +0530 (Tue, 24 Nov 2009) | 4 lines

    Fix for issue1488943 - difflib.Differ() doesn't always add hints for tab
    characters.
  ........
................
2009-11-23 19:06:11 +00:00