cpython/Doc/whatsnew
Greg Ward 4d9d2563f5 #17445: difflib: add diff_bytes(), to compare bytes rather than str
Some applications (e.g. traditional Unix diff, version control
systems) neither know nor care about the encodings of the files they
are comparing. They are textual, but to the diff utility they are just
bytes. This worked fine under Python 2, because all of the hardcoded
strings in difflib.py are ASCII, so they could safely be combined with
old-style u'' strings. But it stopped working in 3.x, where bytes and
str can no longer be mixed.

The solution is to use surrogate escapes for a lossless
bytes->str->bytes roundtrip. That means unified_diff() and
context_diff() can continue to handle only strings, without worrying
about bytes. Callers who have to deal with bytes will need to switch
to diff_bytes().
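
A minimal sketch of the roundtrip and of calling diff_bytes(). The
sample data and the b"old"/b"new" labels are made up for illustration,
and the choice of the "ascii" codec for the roundtrip is an assumption
about how the escape is applied, not a statement of the exact
implementation:

```python
import difflib

# 0xE9 is Latin-1 "e acute"; on its own it is not valid ASCII or UTF-8.
raw = b"caf\xe9\n"

# surrogateescape turns each undecodable byte into a lone surrogate,
# and encoding with the same handler restores the original byte.
text = raw.decode("ascii", "surrogateescape")
roundtrip_ok = text.encode("ascii", "surrogateescape") == raw

# Two versions of the same (hypothetical) file in different encodings.
old = [b"caf\xe9\n", b"tea\n"]          # Latin-1
new = [b"caf\xc3\xa9\n", b"tea\n"]      # UTF-8

# diff_bytes() wraps a str-based diff function; every argument,
# including the file labels, must be bytes, and so is every output line.
diff = list(difflib.diff_bytes(difflib.unified_diff, old, new,
                               fromfile=b"old", tofile=b"new"))

for line in diff:
    print(line)
```

The str-based functions stay unchanged; only callers that hold bytes
need to go through the wrapper.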

Use case: Mercurial's test runner uses difflib to compare current hg
output with known good output. But Mercurial's output is just bytes,
since it can contain:
  * file contents (arbitrary unknown encoding)
  * filenames (arbitrary unknown encoding)
  * usernames and commit messages (usually UTF-8, but not guaranteed
    because old versions of Mercurial did not enforce it)
  * user messages (locale encoding)

Since the output of any given hg command can include text in multiple
encodings, it is hopeless to try to treat it as decodable Unicode
text. It's just bytes, all the way down.
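
To illustrate the point, here is a small sketch with invented output
lines (not real hg output) that mix UTF-8 and Latin-1. The combined
output cannot be decoded as UTF-8, yet diff_bytes() compares it without
needing to know any encoding:

```python
import difflib

# Hypothetical command output: first line UTF-8, second line Latin-1.
expected = [b"r\xc3\xa9sum\xc3\xa9.txt\n", b"na\xefve\n"]
actual = [b"r\xc3\xa9sum\xc3\xa9.txt\n", b"naive\n"]

# No single codec fits the whole output: strict UTF-8 decoding fails
# on the Latin-1 byte 0xEF.
try:
    b"".join(expected).decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

# Comparing the raw bytes still works.
result = list(difflib.diff_bytes(difflib.context_diff, expected, actual,
                                 fromfile=b"expected", tofile=b"actual"))

for line in result:
    print(line)
```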

This is an elaboration of a patch by Terry Reedy.
2015-04-20 20:21:21 -04:00
2.0.rst Fixing broken links in doc, part 4: some more breaks and redirects 2014-10-29 10:57:37 +01:00
2.1.rst Merge with 3.4 2014-10-29 08:37:29 +01:00
2.2.rst Use https:// URLs when referring to python.org hosts. 2014-10-29 08:36:35 +01:00
2.3.rst Fixing broken links in doc, part 4: some more breaks and redirects 2014-10-29 10:57:37 +01:00
2.4.rst Use https:// URLs when referring to python.org hosts. 2014-10-29 08:36:35 +01:00
2.5.rst Use https:// URLs when referring to python.org hosts. 2014-10-29 08:36:35 +01:00
2.6.rst Fixing broken links in doc, part 4: some more breaks and redirects 2014-10-29 10:57:37 +01:00
2.7.rst Fixing broken links in doc, part 4: some more breaks and redirects 2014-10-29 10:57:37 +01:00
3.0.rst Doc: fix default role usage (except in unittest mock docs) 2014-10-30 22:26:26 +01:00
3.1.rst Fixing broken links in doc, part 4: some more breaks and redirects 2014-10-29 10:57:37 +01:00
3.2.rst Fixing broken links in doc, part 4: some more breaks and redirects 2014-10-29 10:57:37 +01:00
3.3.rst Issue #23181: More "codepoint" -> "code point". 2015-01-18 11:28:37 +02:00
3.4.rst PEP 476: enable HTTPS certificate verification by default (#22417) 2014-11-03 14:29:33 -05:00
3.5.rst #17445: difflib: add diff_bytes(), to compare bytes rather than str 2015-04-20 20:21:21 -04:00
changelog.rst Fixing broken links in doc, part 4: some more breaks and redirects 2014-10-29 10:57:37 +01:00
index.rst Doc: add What's New in Python 3.5 to whatsnew index 2014-03-18 09:01:21 +01:00