#14332: provide a better explanation of junk in difflib docs

Initial patch by Alba Magallanes.
Andrew Kuchling 2014-03-19 16:43:06 -04:00
parent 2e3743cd30
commit c51da2b8a0
2 changed files with 24 additions and 16 deletions


@@ -27,7 +27,9 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 little fancier than, an algorithm published in the late 1980's by Ratcliff and
 Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
 find the longest contiguous matching subsequence that contains no "junk"
-elements (the Ratcliff and Obershelp algorithm doesn't address junk). The same
+elements; these "junk" elements are ones that are uninteresting in some
+sense, such as blank lines or whitespace. (Handling junk is an
+extension to the Ratcliff and Obershelp algorithm.) The same
 idea is then applied recursively to the pieces of the sequences to the left and
 to the right of the matching subsequence. This does not yield minimal edit
 sequences, but does tend to yield matches that "look right" to people.
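Note on the hunk above: the effect of junk on the longest-match search can be sketched as follows. This is a minimal illustration using the public SequenceMatcher API; the variable names (`a`, `b`, `no_junk`, `with_junk`) are chosen for this example and are not part of the patch.

```python
from difflib import SequenceMatcher

a = " abcd"
b = "abcd abcd"

# No junk predicate: the leading blank of `a` takes part in the match,
# so the longest block is " abcd" against the tail of `b`.
no_junk = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))

# Blanks declared junk: a match may not start on the blank, so the
# longest block becomes "abcd" against the head of `b`.
with_junk = SequenceMatcher(lambda ch: ch == " ", a, b).find_longest_match(
    0, len(a), 0, len(b)
)

print(no_junk)    # Match(a=0, b=4, size=5)
print(with_junk)  # Match(a=1, b=0, size=4)
```

The junk predicate changes which block is chosen as the anchor for the recursion; it does not remove the blank from either input.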
@@ -210,7 +212,7 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
 delta (a :term:`generator` generating the delta lines).
-Optional keyword parameters *linejunk* and *charjunk* are for filter functions
+Optional keyword parameters *linejunk* and *charjunk* are filtering functions
 (or ``None``):
 *linejunk*: A function that accepts a single string argument, and returns
@@ -224,7 +226,7 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 *charjunk*: A function that accepts a character (a string of length 1), and
 returns if the character is junk, or false if not. The default is module-level
 function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
-blank or tab; note: bad idea to include newline in this!).
+blank or tab; it's a bad idea to include newline in this!).
 :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
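Note on the two ndiff hunks above: a short sketch of calling ndiff() with its default junk filters written out explicitly. The sample inputs and the names `delta`, `recovered_a` and `recovered_b` are assumptions made for this illustration only.

```python
from difflib import IS_CHARACTER_JUNK, ndiff, restore

a = "one\ntwo\nthree\n".splitlines(keepends=True)
b = "ore\ntree\nemu\n".splitlines(keepends=True)

# linejunk=None and charjunk=IS_CHARACTER_JUNK are the documented defaults;
# they are spelled out here only to make the parameters visible.
delta = list(ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK))
print("".join(delta), end="")

# The default character filter treats a blank or a tab as junk, not a newline.
print(IS_CHARACTER_JUNK(" "), IS_CHARACTER_JUNK("\t"), IS_CHARACTER_JUNK("\n"))
# True True False

# The delta still carries both inputs in full; restore() recovers either one.
recovered_a = list(restore(delta, 1))
recovered_b = list(restore(delta, 2))
```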
@@ -624,6 +626,12 @@ The :class:`Differ` class has this constructor:
 length 1), and returns true if the character is junk. The default is ``None``,
 meaning that no character is considered junk.
+These junk-filtering functions speed up matching to find
+differences and do not cause any differing lines or characters to
+be ignored. Read the description of the
+:meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
+parameter for an explanation.
 :class:`Differ` objects are used (deltas generated) via a single method:
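Note on the paragraph added in the hunk above: the claim that junk filters do not cause differences to be ignored can be checked with a small sketch. The inputs and the name `delta` are assumptions for this example; IS_LINE_JUNK is the module-level filter that treats blank lines as junk.

```python
from difflib import Differ, IS_LINE_JUNK

a = ["a\n", "b\n"]
b = ["a\n", "\n", "b\n"]  # same as `a`, plus one blank line

# The blank line counts as junk, so it is not used as a synchronization
# point, but it is still reported as an inserted line in the delta.
delta = list(Differ(linejunk=IS_LINE_JUNK).compare(a, b))
print(delta)  # ['  a\n', '+ \n', '  b\n']
```

The blank line shows up as an insertion even though the filter classifies it as junk, which is the behavior the new paragraph describes.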


@@ -853,10 +853,9 @@ class Differ:
 and return true iff the string is junk. The module-level function
 `IS_LINE_JUNK` may be used to filter out lines without visible
 characters, except for at most one splat ('#'). It is recommended
-to leave linejunk None; as of Python 2.3, the underlying
-SequenceMatcher class has grown an adaptive notion of "noise" lines
-that's better than any static definition the author has ever been
-able to craft.
+to leave linejunk None; the underlying SequenceMatcher class has
+an adaptive notion of "noise" lines that's better than any static
+definition the author has ever been able to craft.
 - `charjunk`: A function that should accept a string of length 1. The
 module-level function `IS_CHARACTER_JUNK` may be used to filter out
@@ -1299,17 +1298,18 @@ def ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK):
 Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
 Optional keyword parameters `linejunk` and `charjunk` are for filter
-functions (or None):
+functions, or can be None:
-- linejunk: A function that should accept a single string argument, and
+- linejunk: A function that should accept a single string argument and
 return true iff the string is junk. The default is None, and is
-recommended; as of Python 2.3, an adaptive notion of "noise" lines is
-used that does a good job on its own.
+recommended; the underlying SequenceMatcher class has an adaptive
+notion of "noise" lines.
-- charjunk: A function that should accept a string of length 1. The
-default is module-level function IS_CHARACTER_JUNK, which filters out
-whitespace characters (a blank or tab; note: bad idea to include newline
-in this!).
+- charjunk: A function that accepts a character (string of length
+1), and returns true iff the character is junk. The default is
+the module-level function IS_CHARACTER_JUNK, which filters out
+whitespace characters (a blank or tab; note: it's a bad idea to
+include newline in this!).
 Tools/scripts/ndiff.py is a command-line front-end to this function.
@@ -1680,7 +1680,7 @@ class HtmlDiff(object):
 tabsize -- tab stop spacing, defaults to 8.
 wrapcolumn -- column number where lines are broken and wrapped,
 defaults to None where lines are not wrapped.
-linejunk,charjunk -- keyword arguments passed into ndiff() (used to by
+linejunk,charjunk -- keyword arguments passed into ndiff() (used by
 HtmlDiff() to generate the side by side HTML differences). See
 ndiff() documentation for argument default values and descriptions.
 """