Commit Graph

30 Commits

Author SHA1 Message Date
Miss Islington (bot) 28c179094b bpo-33224: PEP 479 fix for difflib.mdiff() (GH-6381) (GH-6390) 2018-04-05 11:45:33 -07:00
Miss Islington (bot) 0902a2d6b2 bpo-32981: Fix catastrophic backtracking vulns (GH-5955)
* Prevent low-grade poplib REDOS (CVE-2018-1060)

The regex to test a mail server's timestamp is susceptible to
catastrophic backtracking on long evil responses from the server.

Happily, the maximum length of malicious inputs is 2K thanks
to a limit introduced in the fix for CVE-2013-1752.

A 2KB evil response from the mail server would result in small slowdowns
(milliseconds vs. microseconds) accumulated over many apop calls.
This is a potential DOS vector via accumulated slowdowns.

Replace it with a similar non-vulnerable regex.

The new regex is RFC compliant.
The old regex was non-compliant in edge cases.

* Prevent difflib REDOS (CVE-2018-1061)

The default regex for IS_LINE_JUNK is susceptible to
catastrophic backtracking.
This is a potential DOS vector.

Replace it with an equivalent non-vulnerable regex.

Also introduce unit and REDOS tests for difflib.

Co-authored-by: Tim Peters <tim.peters@gmail.com>
Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: Jamie Davis <davisjam@vt.edu>
(cherry picked from commit 0e6c8ee235)
2018-03-03 21:55:07 -08:00
Raymond Hettinger 15f44ab043 Issue #27895: Spelling fixes (Contributed by Ville Skyttä). 2016-08-30 10:47:49 -07:00
Greg Ward 4d9d2563f5 #17445: difflib: add diff_bytes(), to compare bytes rather than str
Some applications (e.g. traditional Unix diff, version control
systems) neither know nor care about the encodings of the files they
are comparing. They are textual, but to the diff utility they are just
bytes. This worked fine under Python 2, because all of the hardcoded
strings in difflib.py are ASCII, so could safely be combined with
old-style u'' strings. But it stopped working in 3.x.

The solution is to use surrogate escapes for a lossless
bytes->str->bytes roundtrip. That means {unified,context}_diff() can
continue to just handle strings without worrying about bytes. Callers
who have to deal with bytes will need to change to using diff_bytes().

Use case: Mercurial's test runner uses difflib to compare current hg
output with known good output. But Mercurial's output is just bytes,
since it can contain:
  * file contents (arbitrary unknown encoding)
  * filenames (arbitrary unknown encoding)
  * usernames and commit messages (usually UTF-8, but not guaranteed
    because old versions of Mercurial did not enforce it)
  * user messages (locale encoding)

Since the output of any given hg command can include text in multiple
encodings, it is hopeless to try to treat it as decodable Unicode
text. It's just bytes, all the way down.

This is an elaboration of a patch by Terry Reedy.
2015-04-20 20:21:21 -04:00
Berker Peksag 102029dfd6 Issue #2052: Add charset parameter to HtmlDiff.make_file(). 2015-03-15 01:18:47 +02:00
Raymond Hettinger fabefc3c5b Issue 21635: Fix caching in difflib.SequenceMatcher.get_matching_blocks(). 2014-06-21 11:57:36 -07:00
Raymond Hettinger 9180deb59c Issue 11747: Fix output format for context diffs. 2011-04-12 15:25:30 -07:00
Raymond Hettinger 49353d0e8f Issue #11747: Fix range formatting in context and unified diffs. 2011-04-11 12:40:58 -07:00
Terry Reedy 17a59252e8 Issue 10534, difflib: tweak doc; test new SequenceMatcher instance attributes; avoid unneeded lists of SM.b2j keys and items in .__chain_b. Do not backport. 2010-12-15 20:18:10 +00:00
Terry Reedy 99f9637de8 Issue 2986: Add autojunk paramater to SequenceMatcher to turn off heuristic. Patch by Terry Reedy, Eli Bendersky, and Simon Cross 2010-11-25 06:12:34 +00:00
R. David Murray b2416e54b1 Merged revisions 80004 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r80004 | r.david.murray | 2010-04-12 12:35:19 -0400 (Mon, 12 Apr 2010) | 13 lines

  Issue #7585: use tab between components in unified and context diff headers.

  Instead of spaces between the filename and date (or whatever the string
  is that follows the filename, if any) use tabs.  This is what the unix
  'diff' command does, for example, and difflib was intended to follow
  the 'standard' way of doing diffs.  This improves compatibility with
  patch tools.  The docs and examples are also changed to recommended that
  the date format used be the ISO 8601 format, which is what modern diff
  tools emit by default.

  Patch by Anatoly Techtonik.
........
2010-04-12 16:58:02 +00:00
Senthil Kumaran 758025cb1f Merged revisions 76464 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r76464 | senthil.kumaran | 2009-11-24 00:11:31 +0530 (Tue, 24 Nov 2009) | 4 lines

  Fix for issue1488943 - difflib.Differ() doesn't always add hints for tab
  characters.
........
2009-11-23 19:02:52 +00:00
Philip Jenvey a27c5bd2bd Merged revisions 72979 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r72979 | philip.jenvey | 2009-05-27 22:58:44 -0700 (Wed, 27 May 2009) | 2 lines

  explicitly close files
........
2009-05-28 06:09:08 +00:00
Benjamin Peterson ee8712cda4 #2621 rename test.test_support to test.support 2008-05-20 21:35:26 +00:00
Georg Brandl a18af4e7a2 PEP 3114: rename .next() to .__next__() and add next() builtin. 2007-04-21 15:47:16 +00:00
Thomas Wouters 49fd7fa443 Merge p3yk branch with the trunk up to revision 45595. This breaks a fair
number of tests, all because of the codecs/_multibytecodecs issue described
here (it's not a Py3K issue, just something Py3K discovers):
http://mail.python.org/pipermail/python-dev/2006-April/064051.html

Hye-Shik Chang promised to look for a fix, so no need to fix it here. The
tests that are expected to break are:

test_codecencodings_cn
test_codecencodings_hk
test_codecencodings_jp
test_codecencodings_kr
test_codecencodings_tw
test_codecs
test_multibytecodec

This merge fixes an actual test failure (test_weakref) in this branch,
though, so I believe merging is the right thing to do anyway.
2006-04-21 10:40:58 +00:00
Gustavo Niemeyer 548148810b Patch #1413711: Certain patterns of differences were making difflib
touch the recursion limit. The applied patch inlines the recursive
__helper method in a non-recursive way.
2006-01-31 18:34:13 +00:00
Tim Peters 48bd7f3a71 Whitespace normalization. test_difflib passes again. 2004-08-29 22:38:38 +00:00
Tim Peters afb5f94217 Reverting whitespace normalization. test_difflib fails with it -- the
test depends on invisible trailing whitespace in .py files.  The author will
have to repair that.
2004-08-29 19:33:36 +00:00
Tim Peters 45e77c55ff Whitespace normalization. 2004-08-29 18:47:31 +00:00
Martin v. Löwis e064b41f5a Patch #914575: difflib side by side diff support, diff.py s/b/s HTML option. 2004-08-29 16:34:40 +00:00
Brett Cannon d2c5b4b549 SequenceMatcher(None, [], []).get_grouped_opcodes() now returns a generator
that behaves as if both lists has an empty string in each of them.

Closes bug #979794 (and duplicate bug #980117).
2004-07-10 23:54:07 +00:00
Tim Peters 58eb11cf62 Whitespace normalization. 2004-01-18 20:29:55 +00:00
Raymond Hettinger 43d790c087 Exercise Jim Fulton's new doctest extension for running doctests in a
unittest environment.  Since his extension finds docstrings in private
functions, it exposed a bug in the difflib doctests.
2003-07-16 04:34:56 +00:00
Neal Norwitz e7dfe21bed Fix SF bug #763023, difflib.py: ratio() zero division not caught
Backport candidate
2003-07-01 14:59:46 +00:00
Barry Warsaw 04f357cffe Get rid of relative imports in all unittests. Now anything that
imports e.g. test_support must do so using an absolute package name
such as "import test.test_support" or "from test import test_support".

This also updates the README in Lib/test, and gets rid of the
duplicate data dirctory in Lib/test/data (replaced by
Lib/email/test/data).

Now Tim and Jack can have at it. :)
2002-07-23 19:04:11 +00:00
Tim Peters a0a6222509 Teach regrtest how to pass on doctest failure msgs. This is done via a
horridly inefficient hack in regrtest's Compare class, but it's about as
clean as can be:  regrtest has to set up the Compare instance before
importing a test module, and by the time the module *is* imported it's too
late to change that decision.  The good news is that the more tests we
convert to unittest and doctest, the less the inefficiency here matters.
Even now there are few tests with large expected-output files (the new
cost here is a Python-level call per .write() when there's an expected-
output file).
2001-09-09 06:12:01 +00:00
Tim Peters f5f6c436c6 Remove test_doctest's expected-output file.
Change test_doctest and test_difflib to pass regrtest's notion of
verbosity on to doctest.
Add explanation for a dozen "new" things to test/README.
2001-05-23 07:46:36 +00:00
Tim Peters dec4a6143c Remove test_difflib's output file and change test_difflib to stop
generating it.  Since this is purely a doctest, the output file never
served a good purpose.
2001-05-23 01:45:19 +00:00
Tim Peters 9ae2148ada Moved SequenceMatcher from ndiff into new std library module difflib.py.
Guido told me to do this <wink>.
Greatly expanded docstrings, and fleshed out with examples.
New std test.
Added new get_close_matches() function for ESR.
Needs docs, but LaTeXification of the module docstring is all it needs.
\CVS: ----------------------------------------------------------------------
2001-02-10 08:00:53 +00:00