#14332: provide a better explanation of junk in difflib docs
Initial patch by Alba Magallanes.
parent 2e3743cd30
commit c51da2b8a0
@@ -27,7 +27,9 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 little fancier than, an algorithm published in the late 1980's by Ratcliff and
 Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
 find the longest contiguous matching subsequence that contains no "junk"
-elements (the Ratcliff and Obershelp algorithm doesn't address junk). The same
+elements; these "junk" elements are ones that are uninteresting in some
+sense, such as blank lines or whitespace. (Handling junk is an
+extension to the Ratcliff and Obershelp algorithm.) The same
 idea is then applied recursively to the pieces of the sequences to the left and
 to the right of the matching subsequence. This does not yield minimal edit
 sequences, but does tend to yield matches that "look right" to people.
@@ -210,7 +212,7 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
 delta (a :term:`generator` generating the delta lines).
 
-Optional keyword parameters *linejunk* and *charjunk* are for filter functions
+Optional keyword parameters *linejunk* and *charjunk* are filtering functions
 (or ``None``):
 
 *linejunk*: A function that accepts a single string argument, and returns
@@ -224,7 +226,7 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 *charjunk*: A function that accepts a character (a string of length 1), and
 returns if the character is junk, or false if not. The default is module-level
 function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
-blank or tab; note: bad idea to include newline in this!).
+blank or tab; it's a bad idea to include newline in this!).
 
 :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
 
@@ -624,6 +626,12 @@ The :class:`Differ` class has this constructor:
 length 1), and returns true if the character is junk. The default is ``None``,
 meaning that no character is considered junk.
 
+These junk-filtering functions speed up matching to find
+differences and do not cause any differing lines or characters to
+be ignored. Read the description of the
+:meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
+parameter for an explanation.
+
 :class:`Differ` objects are used (deltas generated) via a single method:
 
 
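
Review note: the paragraph added in the hunk above can be checked with a short sketch. The two `find_longest_match` calls use the same inputs as the example in the `difflib` documentation. The helper name `is_space` and the short strings in the last step are illustrative assumptions, not part of this commit.

```python
from difflib import SequenceMatcher


def is_space(ch):
    """Illustrative junk predicate: treat a single blank as junk."""
    return ch == " "


# Without a junk function, the longest match may start at the blank.
plain = SequenceMatcher(None, " abcd", "abcd abcd")
no_junk_match = plain.find_longest_match(0, 5, 0, 9)

# With blanks marked as junk, a match can no longer start at the blank,
# so the matcher settles on the leftmost "abcd" in the second sequence.
filtered = SequenceMatcher(is_space, " abcd", "abcd abcd")
junk_match = filtered.find_longest_match(0, 5, 0, 9)

# Junk only steers how blocks are aligned. The blank that exists in
# "a b" but not in "ab" is still reported as a difference.
tags = [op[0] for op in SequenceMatcher(is_space, "a b", "ab").get_opcodes()]

print(no_junk_match)
print(junk_match)
print(tags)
```

The junk predicate changes which block is chosen as the anchor, but the opcodes still account for every element of both sequences.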
@@ -853,10 +853,9 @@ class Differ:
 and return true iff the string is junk. The module-level function
 `IS_LINE_JUNK` may be used to filter out lines without visible
 characters, except for at most one splat ('#'). It is recommended
-to leave linejunk None; as of Python 2.3, the underlying
-SequenceMatcher class has grown an adaptive notion of "noise" lines
-that's better than any static definition the author has ever been
-able to craft.
+to leave linejunk None; the underlying SequenceMatcher class has
+an adaptive notion of "noise" lines that's better than any static
+definition the author has ever been able to craft.
 
 - `charjunk`: A function that should accept a string of length 1. The
 module-level function `IS_CHARACTER_JUNK` may be used to filter out
@@ -1299,17 +1298,18 @@ def ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK):
 Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
 
 Optional keyword parameters `linejunk` and `charjunk` are for filter
-functions (or None):
+functions, or can be None:
 
-- linejunk: A function that should accept a single string argument, and
+- linejunk: A function that should accept a single string argument and
 return true iff the string is junk. The default is None, and is
-recommended; as of Python 2.3, an adaptive notion of "noise" lines is
-used that does a good job on its own.
+recommended; the underlying SequenceMatcher class has an adaptive
+notion of "noise" lines.
 
-- charjunk: A function that should accept a string of length 1. The
-default is module-level function IS_CHARACTER_JUNK, which filters out
-whitespace characters (a blank or tab; note: bad idea to include newline
-in this!).
+- charjunk: A function that accepts a character (string of length
+1), and returns true iff the character is junk. The default is
+the module-level function IS_CHARACTER_JUNK, which filters out
+whitespace characters (a blank or tab; note: it's a bad idea to
+include newline in this!).
 
 Tools/scripts/ndiff.py is a command-line front-end to this function.
 
@@ -1680,7 +1680,7 @@ class HtmlDiff(object):
 tabsize -- tab stop spacing, defaults to 8.
 wrapcolumn -- column number where lines are broken and wrapped,
 defaults to None where lines are not wrapped.
-linejunk,charjunk -- keyword arguments passed into ndiff() (used to by
+linejunk,charjunk -- keyword arguments passed into ndiff() (used by
 HtmlDiff() to generate the side by side HTML differences). See
 ndiff() documentation for argument default values and descriptions.
 """
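
Review note: a small sketch of how the `ndiff()` parameters described in the docstrings above behave in practice. The input lists are made up for illustration. `IS_LINE_JUNK`, `IS_CHARACTER_JUNK`, `ndiff`, and `restore` are the existing module-level names in `difflib`. Passing `IS_LINE_JUNK` is done here only to exercise the parameter; the docstring still recommends leaving *linejunk* as `None`.

```python
import difflib

# Hypothetical inputs: the second list has an extra blank line.
a = ["one\n", "two\n"]
b = ["one\n", "\n", "two\n"]

# The blank line counts as junk under IS_LINE_JUNK, yet it still
# shows up in the delta as an inserted line.
delta = list(
    difflib.ndiff(
        a,
        b,
        linejunk=difflib.IS_LINE_JUNK,
        charjunk=difflib.IS_CHARACTER_JUNK,
    )
)

# Both original sequences can be recovered from the delta, so nothing
# was dropped by the junk filters.
restored_a = list(difflib.restore(delta, 1))
restored_b = list(difflib.restore(delta, 2))

print("".join(delta), end="")
```

Because the delta is materialized as a list, it can be passed to `restore()` twice. A generator would be exhausted after the first call.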