This makes `tokenizer.c:valid_utf8` match `stringlib/codecs.h:decode_utf8`.
It also fixes an off-by-one error, introduced in 3.10, in the line number reported when the tokenizer encounters invalid UTF-8.
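For illustration only, here is a rough sketch of how one might observe the reported line number; the file name, byte sequence, and exact error text are assumptions, and the actual fix lives in the C tokenizer rather than in anything shown here:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical reproduction: the invalid byte sits on line 2 of the file,
# so the SyntaxError should point at line 2; with the off-by-one
# introduced in 3.10 the reported line could be wrong.
source = b"x = 1\ny = '\x80'\n"  # 0x80 is not valid UTF-8

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bad_utf8.py")
    with open(path, "wb") as f:
        f.write(source)
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, text=True)
    # Expect something like: "SyntaxError: Non-UTF-8 code ... on line 2 ..."
    print(proc.stderr)
```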
Editors disagree about whether `test_source_encoding.py` is valid koi8-r, which makes it
hard to edit the file without the editor corrupting it in some way (see gh-96272).
Only one test actually relied on the koi8-r encoding, and it duplicated a test from
the deprecated `imp` module's `test_imp`. We therefore replace `test_pep263` with
`test_import_encoded_module`, stolen from `test_imp`, and change
`test_source_encoding.py`'s own encoding to utf-8 to make the file easier to edit
going forward.
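A minimal sketch of the kind of test involved, assuming an on-the-fly module rather than the checked-in encoded modules the real test uses; the module name and string constant here are purely illustrative:

```python
import importlib
import os
import sys
import tempfile
import unittest


class ImportEncodedModuleTest(unittest.TestCase):
    def test_import_encoded_module(self):
        # Write a module whose source is declared and encoded as koi8-r,
        # import it, and check that the string constant decoded correctly.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "koi8_r_module.py")
            with open(path, "w", encoding="koi8-r") as f:
                f.write('# -*- coding: koi8-r -*-\nvalue = "слово"\n')
            sys.path.insert(0, tmp)
            importlib.invalidate_caches()
            try:
                mod = importlib.import_module("koi8_r_module")
                self.assertEqual(mod.value, "слово")
            finally:
                sys.path.remove(tmp)
                sys.modules.pop("koi8_r_module", None)
```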
Clearing the temporary directory does not work on some platforms and leaves files
behind.
The test has been updated to follow the pattern in test_cmd_line.py [1], using the
special TESTFN rather than a test directory.
[1] https://github.com/python/cpython/blob/main/Lib/test/test_cmd_line.py#L559
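A sketch of that pattern, assuming the modern `test.support.os_helper` helpers; the test class and method names below are illustrative:

```python
import unittest
from test.support import os_helper


class SourceEncodingFileTest(unittest.TestCase):
    def write_source(self, data: bytes) -> str:
        # Write to the single well-known TESTFN file instead of a temporary
        # directory, and register cleanup so nothing is left behind even on
        # platforms where directory removal is unreliable.
        self.addCleanup(os_helper.unlink, os_helper.TESTFN)
        with open(os_helper.TESTFN, "wb") as f:
            f.write(data)
        return os_helper.TESTFN
```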
When loading a source file from disk, CPython uses a separate UTF-8 validator,
distinct from the one in `unicode_decode_utf8`. This change exercises that code path
with the same set of invalid inputs we already use to test the "other" UTF-8
decoder.
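A sketch of how that code path can be exercised; the invalid byte sequences below are illustrative stand-ins for the shared set of inputs, and `assert_python_failure` is the stock helper from `test.support.script_helper`:

```python
import unittest
from test.support import os_helper
from test.support.script_helper import assert_python_failure

# Illustrative invalid UTF-8 sequences; the real test reuses the same
# inputs as the tests for the other UTF-8 decoder.
INVALID_UTF8 = [
    b"\x80",              # stray continuation byte
    b"\xc0\x80",          # overlong encoding of NUL
    b"\xf5\x80\x80\x80",  # code point above U+10FFFF
]


class BadUtf8SourceFileTest(unittest.TestCase):
    def test_invalid_utf8_source_file(self):
        self.addCleanup(os_helper.unlink, os_helper.TESTFN)
        for bad in INVALID_UTF8:
            with self.subTest(bad=bad):
                with open(os_helper.TESTFN, "wb") as f:
                    f.write(b"print('ok')\n" + bad + b"\n")
                # Running the file forces the on-disk loading path, which has
                # its own UTF-8 validator; it should fail with a SyntaxError
                # about non-UTF-8 source code.
                rc, out, err = assert_python_failure(os_helper.TESTFN)
                self.assertIn(b"SyntaxError", err)
```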