``_fancy_replace()`` is no longer recursive, and a single call does a worst-case linear number of ratio() computations instead of quadratic. This renders toothless a universe of pathological cases. Some inputs may produce different output, but that's rare, and I didn't find a case where the final diff appeared to be of materially worse quality. To the contrary, by refusing to even consider synching on lines "far apart", there was more easy-to-digest locality in the output.
Track all pairs achieving the best ratio in Differ(). This repairs the "very deep recursion and cubic time" bad cases in a way that preserves previous output.
Code from https://github.com/pulkin, in PR
https://github.com/python/cpython/pull/119131
Greatly speeds `Differ` when there are many identically scoring pairs, by splitting the recursion near the inputs' midpoints instead of degenerating (as now) into just peeling off the first two lines.
Co-authored-by: Tim Peters <tim.peters@gmail.com>
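The midpoint idea described above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the helper name best_pair_near_midpoint is hypothetical, it compares every pair (which the real code avoids), and the tie-breaking distance is an assumption.

```python
from difflib import SequenceMatcher


def best_pair_near_midpoint(a, b):
    """Hypothetical helper: return the (i, j) pair with the best ratio.

    All pairs that tie for the best ratio are tracked, and the tie is
    broken by picking the pair closest to the midpoints of both inputs,
    so a recursive split would cut the problem roughly in half.
    """
    best_ratio = 0.0
    best_pairs = []
    for i, line_a in enumerate(a):
        for j, line_b in enumerate(b):
            ratio = SequenceMatcher(None, line_a, line_b).ratio()
            if ratio > best_ratio:
                # Strictly better: start a fresh list of candidates.
                best_ratio, best_pairs = ratio, [(i, j)]
            elif ratio == best_ratio and best_pairs:
                # Tie with the current best: remember this pair too.
                best_pairs.append((i, j))
    if not best_pairs:
        return None  # no pair is similar at all
    mid_a = (len(a) - 1) / 2
    mid_b = (len(b) - 1) / 2
    return min(best_pairs,
               key=lambda p: abs(p[0] - mid_a) + abs(p[1] - mid_b))


# Five identical lines on each side: every pair scores 1.0, and the
# pair in the middle is chosen rather than the first two lines.
split = best_pair_near_midpoint(["ab\n"] * 5, ["ab\n"] * 5)
```

Splitting at the middle pair leaves two subproblems of about half the size, whereas always taking the first tied pair only peels one line off each input.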
* "Return true/false" is replaced with "Return ``True``/``False``"
if the function actually returns a bool.
* Fixed formatting of some True and False literals (now in monospace).
* Replaced "True/False" with "true/false" where the value is not necessarily a bool.
* Replaced some 1/0 with True/False where that matches the code.
* "Returns <bool>" is replaced with "Return <bool>".
* Prevent low-grade poplib REDOS (CVE-2018-1060)
The regex to test a mail server's timestamp is susceptible to
catastrophic backtracking on long evil responses from the server.
Happily, the maximum length of malicious inputs is 2K thanks
to a limit introduced in the fix for CVE-2013-1752.
A 2KB evil response from the mail server would result in small slowdowns
(milliseconds vs. microseconds) accumulated over many apop calls.
This is a potential DOS vector via accumulated slowdowns.
Replace it with a similar non-vulnerable regex.
The new regex is RFC compliant.
The old regex was non-compliant in edge cases.
* Prevent difflib REDOS (CVE-2018-1061)
The default regex for IS_LINE_JUNK is susceptible to
catastrophic backtracking.
This is a potential DOS vector.
Replace it with an equivalent non-vulnerable regex.
Also introduce unit and REDOS tests for difflib.
Co-authored-by: Tim Peters <tim.peters@gmail.com>
Co-authored-by: Christian Heimes <christian@python.org>
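As a rough check of the difflib part of this change, the snippet below calls the public difflib.IS_LINE_JUNK predicate on a few lines and on one long input made of whitespace followed by "##". The sample inputs and the 100000 length are assumptions chosen for illustration; on an interpreter that still has the old regex the last call could be very slow.

```python
import difflib

# Blank lines and lines holding only a '#' count as junk.
samples = ["\n", "  #  \n", "hello\n", "# comment\n"]
junk_flags = {line: difflib.IS_LINE_JUNK(line) for line in samples}

# Long run of whitespace that does not end in a junk pattern. With a
# non-backtracking regex this returns quickly.
evil_input = ("\t" * 100_000) + "##"
evil_is_junk = difflib.IS_LINE_JUNK(evil_input)
```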
This changes the main documentation, doc strings, source code comments, and a
couple error messages in the test suite. In some cases the word was removed
or edited some other way to fix the grammar.
Raise PendingDeprecationWarning when generator raises StopIteration
and no __future__ import is used. Fix offenders in the stdlib
and tests.
See also issue 22906.
Thanks to Nick Coghlan and Berker Peksag for reviews.
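The kind of fix applied to offenders can be sketched like this. The generator first_n is a hypothetical example, not code from the stdlib: a bare next() call inside a generator may leak StopIteration, so the exception is caught and turned into a plain return.

```python
def first_n(iterable, n):
    """Yield at most n items from iterable (hypothetical example)."""
    it = iter(iterable)
    for _ in range(n):
        try:
            item = next(it)
        except StopIteration:
            # End the generator explicitly instead of letting
            # StopIteration escape from the generator body.
            return
        yield item


short = list(first_n([1, 2], 5))
exact = list(first_n("abcdef", 3))
```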
Some applications (e.g. traditional Unix diff, version control
systems) neither know nor care about the encodings of the files they
are comparing. They are textual, but to the diff utility they are just
bytes. This worked fine under Python 2, because all of the hardcoded
strings in difflib.py are ASCII, so could safely be combined with
old-style u'' strings. But it stopped working in 3.x.
The solution is to use surrogate escapes for a lossless
bytes->str->bytes roundtrip. That means {unified,context}_diff() can
continue to just handle strings without worrying about bytes. Callers
who have to deal with bytes will need to change to using diff_bytes().
Use case: Mercurial's test runner uses difflib to compare current hg
output with known good output. But Mercurial's output is just bytes,
since it can contain:
* file contents (arbitrary unknown encoding)
* filenames (arbitrary unknown encoding)
* usernames and commit messages (usually UTF-8, but not guaranteed
because old versions of Mercurial did not enforce it)
* user messages (locale encoding)
Since the output of any given hg command can include text in multiple
encodings, it is hopeless to try to treat it as decodable Unicode
text. It's just bytes, all the way down.
This is an elaboration of a patch by Terry Reedy.
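A minimal usage sketch of diff_bytes() as described above, assuming two byte inputs whose first line is in different encodings (the sample data and file names are made up for illustration):

```python
import difflib

# The same word in Latin-1 and in UTF-8; the Latin-1 line is not
# valid UTF-8, so it cannot be decoded to str without guessing.
old = [b"caf\xe9\n", b"plain\n"]
new = [b"caf\xc3\xa9\n", b"plain\n"]

# diff_bytes() wraps a str-based diff function and returns bytes.
diff = list(difflib.diff_bytes(difflib.unified_diff, old, new,
                               fromfile=b"old.txt", tofile=b"new.txt"))
```

Every element of diff is a bytes object, and the changed lines come back byte-for-byte identical to the inputs.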
requires them. Disable executable bits and shebang lines in test and
benchmark files in order to prevent using a random system python, and in
source files of modules which don't provide command line interface. Fixed
shebang lines in the unittestgui and checkpip scripts.
requires them. Disable executable bits and shebang lines in test and
benchmark files in order to prevent using a random system python, and in
source files of modules which don't provide command line interface. Fixed
shebang line to use python3 executable in the unittestgui script.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r80004 | r.david.murray | 2010-04-12 12:35:19 -0400 (Mon, 12 Apr 2010) | 13 lines
Issue #7585: use tab between components in unified and context diff headers.
Instead of spaces between the filename and date (or whatever the string
is that follows the filename, if any) use tabs. This is what the unix
'diff' command does, for example, and difflib was intended to follow
the 'standard' way of doing diffs. This improves compatibility with
patch tools. The docs and examples are also changed to recommend that
the date format used be the ISO 8601 format, which is what modern diff
tools emit by default.
Patch by Anatoly Techtonik.
........
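For illustration, the header format described in r80004 can be observed with unified_diff(). The file names and dates below are made-up sample values.

```python
import difflib

diff = list(difflib.unified_diff(
    ["a\n"], ["b\n"],
    fromfile="before.py", tofile="after.py",
    fromfiledate="2010-04-12T12:35:19",
    tofiledate="2010-04-12T12:40:00",
))

# The file name and the date are separated by a tab, not spaces.
from_header, to_header = diff[0], diff[1]
```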