bpo-36443, bpo-36202: Since Python 3.7.0, calling Py_DecodeLocale()
before Py_Initialize() produces mojibake if the LC_CTYPE locale is
coerced and/or if the UTF-8 Mode is enabled by the user
configuration. This change fix the issue by disabling LC_CTYPE
coercion and UTF-8 Mode by default. They must now be enabled
explicitly (opt-in) using the new _Py_PreInitialize() API with
_PyPreConfig.
When embedding Python, set coerce_c_locale and utf8_mode attributes
of _PyPreConfig to -1 to enable automatically these parameters
depending on the LC_CTYPE locale, environment variables and command
line arguments
Alternative: Setting Py_UTF8Mode to 1 always explicitly enables the
UTF-8 Mode.
Changes:
* _PyPreConfig_INIT now sets coerce_c_locale and utf8_mode to 0 by
default.
* _Py_InitializeFromArgs() and _Py_InitializeFromWideArgs() can now
be called with config=NULL.
Now that the parser generator is written in Python (Parser/pgen) we can make use of it to regenerate the Lib/keyword file that contains the language keywords instead of parsing the autogenerated grammar files. This also allows checking in the CI that the autogenerated files are up to date.
Set type_attr to NULL after the assignment to stgdict->proto (like
what is done with stgdict after the Py_SETREF() call) so that it is
not decrefed twice on error.
Before, an `AttributeError` was raised due to trying to access an attribute that exists on specs but having received `None` instead for a non-existent module.
https://bugs.python.org/issue36298
bpo-36256: Fix bug in parsermodule when parsing if statements
In the parser module, when validating nodes before starting the parsing with to create a ST in "parser_newstobject" there is a problem that appears when two arcs in the same DFA state has transitions with labels with the same type. For example, the DFA for if_stmt has a state with
two labels with the same type: "elif" and "else" (type NAME). The algorithm tries one by one the arcs until the label that starts the arc transition has a label with the same type of the current child label we are trying to accept. In this case, the arc for "elif" comes before the arc for "else"and passes this test (because the current child label is "else" and has the same type as "elif"). This lead to expecting a namedexpr_test (305) instead of a colon (11). The solution is to compare also the string representation (in case there is one) of the labels to see if the transition that we have is the correct one.
No longer limit repr(structseq) to 512 bytes. Use _PyUnicodeWriter
for better performance and to write directly Unicode rather than
encoding repr() value to UTF-8 and then decoding from UTF-8.
At Python initialization, the current directory is no longer
prepended to sys.path if it has been removed.
Rename _PyPathConfig_ComputeArgv0() to
_PyPathConfig_ComputeSysPath0() to avoid confusion between argv[0]
and sys.path[0].
Python initialization now fails if decoding pybuilddir.txt
configuration file fails at startup.
_PyPathConfig_Calculate() now reports memory allocation failure and
decoding error on decoding pybuilddir.txt content from
UTF-8/surrogateescape.
* bpo-35493: Use Process.sentinel instead of sleeping for polling worker status in multiprocessing.Pool
* Use self-pipe pattern to avoid polling for changes
* Refactor some variable names and add comments
* Restore timeout and poll
* Use reader object only on wait()
* Recompute worker sentinels every time
* Remove timeout and use change notifier
* Refactor some methods to be overloaded by the ThreadPool, document the cache class and fix typos
ProcessPoolExecutor workers will hold the return value of their last task in memory until the next task is received. Since the return value has already been propagated to the parent process's Future (or has been discarded by this point), the object can be safely released.
Fix CFLAGS in customize_compiler() of distutils.sysconfig: when the
CFLAGS environment variable is defined, don't override CFLAGS variable with
the OPT variable anymore.
Initial patch written by David Malcolm.
Co-Authored-By: David Malcolm <dmalcolm@redhat.com>
The value is a string for string and byte literals, None otherwise.
It is 'u' for u"..." literals, 'b' for b"..." literals, '' for "..." literals.
The 'r' (raw) prefix is ignored.
Does not apply to f-strings.
This appears sufficient to make mypy capable of using the stdlib ast module instead of typed_ast (assuming a mypy patch I'm working on).
WIP: I need to make the tests pass. @ilevkivskyi @serhiy-storchaka
https://bugs.python.org/issue36280
Fix an unlikely memory leak on conversion from string to float in the
function _Py_dg_strtod() used by float(str), complex(str),
pickle.load(), marshal.load(), etc.
Fix an unlikely memory leak in _Py_dg_strtod() on "undfl:" label:
rewrite memory management in this function to always release all
memory before exiting the function. Initialize variables to NULL, and
set them to NULL after calling Bfree() at the "cont:" label.
Note: Bfree(NULL) is well defined: it does nothing.
test_posix.PosixUidGidTests:
* Add tests for invalid uid/gid type (str)
* Add UID_OVERFLOW and GID_OVERFLOW constants to replace (1 << 32)
Initial patch written by David Malcolm.
Co-Authored-By: David Malcolm <dmalcolm@redhat.com>
* Refactor cookie path check as per RFC 6265
* Add tests for prefix match of path
* Add news entry
* Fix set_ok_path and refactor tests
* Use slice for last letter
Don't send cookies of domain A without Domain attribute to domain B when domain A is a suffix match of domain B while using a cookiejar with `http.cookiejar.DefaultCookiePolicy` policy. Patch by Karthikeyan Singaravelan.
This adds a `feature_version` flag to `ast.parse()` (documented) and `compile()` (hidden) that allow tweaking the parser to support older versions of the grammar. In particular if `feature_version` is 5 or 6, the hacks for the `async` and `await` keyword from PEP 492 are reinstated. (For 7 or higher, these are unconditionally treated as keywords, but they are still special tokens rather than `NAME` tokens that the parser driver recognizes.)
https://bugs.python.org/issue35975
Methods are always bound, and `__self__` can no longer be `NULL`
(`method_new()` and `PyMethod_New()` both explicitly check for this).
Moreover, once a bound method is bound, it *stays* bound and won't be re-bound
to something else, so the section in the datamodel that talks about accessing
an methods in a different descriptor-binding context doesn't apply any more in
Python 3.
Fix two unlikely reference leaks in _hashopenssl. The leaks only occur in
out-of-memory cases. Thanks to Charalampos Stratakis.
Signed-off-by: Christian Heimes <christian@python.org>
https://bugs.python.org/issue36179
For C++ extensions, distutils tries to replace the C compiler with the
C++ compiler, but it assumes that C compiler is the first element after
any environment variables set. On AIX, linking goes through ld_so_aix,
so it is the first element and the compiler is the next element. Thus
the replacement is faulty:
ld_so_aix gcc ... -> g++ gcc ...
Also, it assumed that self.compiler_cxx had only 1 element or that
there were the same number of elements as the linker has and in the
same order. This might not be the case, so instead concatenate
everything together.
Use UTF-8 as the system encoding on VxWorks.
The main reason are:
1. The locale is frequently misconfigured.
2. Missing some functions to deal with locale in VxWorks C library.
It is changed from 16KiB to 64KiB. The previous default value
is used since 1990.
coreutils chose 128 KiB as minimum buffer size for block device I/O.
But shutil.copyfileobj() can be used for non block devices.
So I choose more conservative value.
As my quick benchmark, performance difference between 64KiB and
128 KiB is up to ~5%. On the other hand, performance difference
between 32 KiB and 64 KiB can be more than 10% when file is fully
buffered.
This is why 64 KiB is rational value.
Pgen is the oldest piece of technology in the CPython repository, building it requires various #if[n]def PGEN hacks in other parts of the code and it also depends more and more on CPython internals. This commit removes the old pgen C code and replaces it for a new version implemented in pure Python. This is a modified and adapted version of lib2to3/pgen2 that can generate grammar files compatibles with the current parser.
This commit also eliminates all the #ifdef and code branches related to pgen, simplifying the code and making it more maintainable. The regen-grammar step now uses $(PYTHON_FOR_REGEN) that can be any version of the interpreter, so the new pgen code maintains compatibility with older versions of the interpreter (this also allows regenerating the grammar with the current CI solution that uses Python3.5). The new pgen Python module also makes use of the Grammar/Tokens file that holds the token specification, so is always kept in sync and avoids having to maintain duplicate token definitions.
Add TEST_EXTENSIONS constant to setup.py to allow to not build test
extensions like _testcapi.
Changes:
* Add add_ldflags_cppflags() subfunction
* Rename add_compiler_directories() to configure_compiler().
* Remove unused COMPILED_WITH_PYDEBUG constant.
* Use self.add() rather than accessing directly self.extensions.
* Remove module_enabled() function: check differently if curses
extension is built or not.
The whole coreconfig.h header is now excluded from Py_LIMITED_API.
Move functions definitions into a new internal pycore_coreconfig.h
header.
* Move Include/coreconfig.h to Include/cpython/coreconfig.h
* coreconfig.h header is now excluded from Py_LIMITED_API
* Move functions to pycore_coreconfig.h
Use locale.getpreferredencoding() rather than locale.getlocale() to
get the locale encoding. With some locales, locale.getlocale()
returns the wrong encoding.
For example, on Fedora 29, locale.getlocale() returns ISO-8859-1
encoding for the "en_IN" locale, whereas
locale.getpreferredencoding() reports the correct encoding: UTF-8.