Cut tricky `goto` that isn't needed, in _PyBytes_DecodeEscape. (GH-15825)

This is the sort of `goto` that requires the reader to stare hard at
the code to unpick what it's doing.

On doing so, the answer is... not very much!

* It jumps from the bottom of the loop to almost the top; the effect
  is to bypass the loop condition `s < end` and also the
  `if`-condition `*s != '\\'`, acting as if both are true.

* We've just decremented `s`, after incrementing it in the `switch`
  condition.  So it has the same value as when `s == end` failed.
  Before that was another increment... and before that we had
  `s < end`.  So `s < end` true, then increment, then `s == end`
  false... that means `s < end` is still true.

* Also this means `s` points to the same character as it did for the
  `switch` condition.  And there was a `case '\\'`, which we didn't
  hit -- so `*s != '\\'` is also true.

* That means this has no effect on the behavior!  The most it might do
  is an optimization -- we get to skip those two checks, because (as
  just proven above) we know they're true.

* But gosh, this is the *invalid escape sequence* path.  This does not
  seem like the kind of code path that calls for extreme optimization
  tricks.

So, take the `goto` and the label out.

Perhaps the compiler will notice the exact same facts we showed above,
and generate identical code.  Or perhaps it won't!  That'll be OK.

But then, crucially, if some future edit to this loop causes the
reasoning above to *stop* holding true... the compiler will adjust
this jump accordingly.  One of us fallible humans might not.
This commit is contained in:
Greg Price 2019-09-10 01:51:04 -07:00 committed by Benjamin Peterson
parent 2fc1160a80
commit 0711642eec
1 changed files with 0 additions and 3 deletions

View File

@ -1142,7 +1142,6 @@ PyObject *_PyBytes_DecodeEscape(const char *s,
end = s + len;
while (s < end) {
if (*s != '\\') {
non_esc:
if (!(recode_encoding && (*s & 0x80))) {
*p++ = *s++;
}
@ -1229,8 +1228,6 @@ PyObject *_PyBytes_DecodeEscape(const char *s,
}
*p++ = '\\';
s--;
goto non_esc; /* an arbitrary number of unescaped
UTF-8 bytes may follow. */
}
}