bpo-28638: Optimize namedtuple() creation time by minimizing use of exec() (#3454)
* Working draft without _source
* Re-use itemgetter() instances
* Speed-up calls to __new__() with a pre-bound tuple.__new__()
* Add note regarding string interning
* Remove unnecessary create function wrappers
* Minor sync-ups with PR-2736. Mostly formatting and f-strings
* Bring-in qualname/__module fix-ups from PR-2736
* Formally remove the verbose flag and _source attribute
* Restore a test of potentially problematic field names
* Restore kwonly_args test but without the verbose option
* Adopt Inada's idea to reuse the docstrings for the itemgetters
* Neaten-up a bit
* Add news blurb
* Serhiy pointed-out the need for interning
* Jelle noticed as missing f on an f-string
* Add whatsnew entry for feature removal
* Accede to request for dict literals instead keyword arguments
* Leave the method.__module__ attribute pointing the actual location of the code
* Improve variable names and add a micro-optimization for an non-public helper function
* Simplify by in-lining reuse_itemgetter()
* Arrange steps in more logical order
* Save docstring in local cache instead of interning
parent 3cedf46cdb
commit 8b57d73639
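
The commit message describes keeping exec() only for a small `__new__` function and calling a pre-bound `tuple.__new__()`. The following is a minimal sketch of that idea, not the committed code: the helper name `make_new` is hypothetical, and the sketch assumes at least one field name and already-validated identifiers.

```python
def make_new(typename, field_names):
    """Build a __new__ for a tuple subclass with one small exec() call.

    Hypothetical helper for illustration; assumes field_names is a
    non-empty sequence of valid, non-keyword identifiers.
    """
    arg_list = ', '.join(field_names)
    # The trailing comma keeps the one-field case a tuple.
    source = (f'def __new__(_cls, {arg_list}): '
              f'return _tuple_new(_cls, ({arg_list},))')
    # tuple.__new__ is looked up once here instead of on every call.
    namespace = {'_tuple_new': tuple.__new__,
                 '__name__': f'namedtuple_{typename}'}
    exec(source, namespace)
    new = namespace['__new__']
    new.__doc__ = f'Create new instance of {typename}({arg_list})'
    new.__qualname__ = f'{typename}.__new__'
    return new


# Attach the generated function to a class built with type().
Point = type('Point', (tuple,),
             {'__slots__': (), '__new__': make_new('Point', ('x', 'y'))})
p = Point(1, y=2)
```

Generating only `__new__` with exec() preserves a real signature with named parameters, while every other method can be an ordinary closure.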
@@ -763,7 +763,7 @@ Named tuples assign meaning to each position in a tuple and allow for more reada
 self-documenting code. They can be used wherever regular tuples are used, and
 they add the ability to access fields by name instead of position index.

-.. function:: namedtuple(typename, field_names, *, verbose=False, rename=False, module=None)
+.. function:: namedtuple(typename, field_names, *, rename=False, module=None)

     Returns a new tuple subclass named *typename*. The new subclass is used to
     create tuple-like objects that have fields accessible by attribute lookup as
@@ -786,10 +786,6 @@ they add the ability to access fields by name instead of position index.
     converted to ``['abc', '_1', 'ghi', '_3']``, eliminating the keyword
     ``def`` and the duplicate fieldname ``abc``.

-    If *verbose* is true, the class definition is printed after it is
-    built. This option is outdated; instead, it is simpler to print the
-    :attr:`_source` attribute.
-
     If *module* is defined, the ``__module__`` attribute of the named tuple is
     set to that value.

@@ -806,6 +802,9 @@ they add the ability to access fields by name instead of position index.
     .. versionchanged:: 3.6
        Added the *module* parameter.

+    .. versionchanged:: 3.7
+       Remove the *verbose* parameter and the :attr:`_source` attribute.
+
 .. doctest::
     :options: +NORMALIZE_WHITESPACE

@@ -878,15 +877,6 @@ field names, the method and attribute names start with an underscore.
         >>> for partnum, record in inventory.items():
         ...     inventory[partnum] = record._replace(price=newprices[partnum], timestamp=time.now())

-.. attribute:: somenamedtuple._source
-
-    A string with the pure Python source code used to create the named
-    tuple class. The source makes the named tuple self-documenting.
-    It can be printed, executed using :func:`exec`, or saved to a file
-    and imported.
-
-    .. versionadded:: 3.3
-
 .. attribute:: somenamedtuple._fields

     Tuple of strings listing the field names. Useful for introspection
@@ -435,6 +435,12 @@ API and Feature Removals
   Python 3.1, and has now been removed. Use the :func:`~os.path.splitdrive`
   function instead.

+* :func:`collections.namedtuple` no longer supports the *verbose* parameter
+  or ``_source`` attribute which showed the generated source code for the
+  named tuple class. This was part of an optimization designed to speed-up
+  class creation. (Contributed by Jelle Zijlstra with further improvements
+  by INADA Naoki, Serhiy Storchaka, and Raymond Hettinger in :issue:`28638`.)
+
 * Functions :func:`bool`, :func:`float`, :func:`list` and :func:`tuple` no
   longer take keyword arguments. The first argument of :func:`int` can now
   be passed only as positional argument.
@@ -301,59 +301,9 @@ except ImportError:
 ### namedtuple
 ################################################################################

-_class_template = """\
-from builtins import property as _property, tuple as _tuple
-from operator import itemgetter as _itemgetter
-from collections import OrderedDict
+_nt_itemgetters = {}

-class {typename}(tuple):
-    '{typename}({arg_list})'
-
-    __slots__ = ()
-
-    _fields = {field_names!r}
-
-    def __new__(_cls, {arg_list}):
-        'Create new instance of {typename}({arg_list})'
-        return _tuple.__new__(_cls, ({arg_list}))
-
-    @classmethod
-    def _make(cls, iterable, new=tuple.__new__, len=len):
-        'Make a new {typename} object from a sequence or iterable'
-        result = new(cls, iterable)
-        if len(result) != {num_fields:d}:
-            raise TypeError('Expected {num_fields:d} arguments, got %d' % len(result))
-        return result
-
-    def _replace(_self, **kwds):
-        'Return a new {typename} object replacing specified fields with new values'
-        result = _self._make(map(kwds.pop, {field_names!r}, _self))
-        if kwds:
-            raise ValueError('Got unexpected field names: %r' % list(kwds))
-        return result
-
-    def __repr__(self):
-        'Return a nicely formatted representation string'
-        return self.__class__.__name__ + '({repr_fmt})' % self
-
-    def _asdict(self):
-        'Return a new OrderedDict which maps field names to their values.'
-        return OrderedDict(zip(self._fields, self))
-
-    def __getnewargs__(self):
-        'Return self as a plain tuple. Used by copy and pickle.'
-        return tuple(self)
-
-{field_defs}
-"""
-
-_repr_template = '{name}=%r'
-
-_field_template = '''\
-    {name} = _property(_itemgetter({index:d}), doc='Alias for field number {index:d}')
-'''
-
-def namedtuple(typename, field_names, *, verbose=False, rename=False, module=None):
+def namedtuple(typename, field_names, *, rename=False, module=None):
     """Returns a new subclass of tuple with named fields.

     >>> Point = namedtuple('Point', ['x', 'y'])
@@ -390,46 +340,104 @@ def namedtuple(typename, field_names, *, verbose=False, rename=False, module=Non
                 or _iskeyword(name)
                 or name.startswith('_')
                 or name in seen):
-                field_names[index] = '_%d' % index
+                field_names[index] = f'_{index}'
             seen.add(name)
     for name in [typename] + field_names:
         if type(name) is not str:
             raise TypeError('Type names and field names must be strings')
         if not name.isidentifier():
             raise ValueError('Type names and field names must be valid '
-                             'identifiers: %r' % name)
+                             f'identifiers: {name!r}')
         if _iskeyword(name):
             raise ValueError('Type names and field names cannot be a '
-                             'keyword: %r' % name)
+                             f'keyword: {name!r}')
     seen = set()
     for name in field_names:
         if name.startswith('_') and not rename:
             raise ValueError('Field names cannot start with an underscore: '
-                             '%r' % name)
+                             f'{name!r}')
         if name in seen:
-            raise ValueError('Encountered duplicate field name: %r' % name)
+            raise ValueError(f'Encountered duplicate field name: {name!r}')
         seen.add(name)

-    # Fill-in the class template
-    class_definition = _class_template.format(
-        typename = typename,
-        field_names = tuple(field_names),
-        num_fields = len(field_names),
-        arg_list = repr(tuple(field_names)).replace("'", "")[1:-1],
-        repr_fmt = ', '.join(_repr_template.format(name=name)
-                             for name in field_names),
-        field_defs = '\n'.join(_field_template.format(index=index, name=name)
-                               for index, name in enumerate(field_names))
-    )
+    # Variables used in the methods and docstrings
+    field_names = tuple(map(_sys.intern, field_names))
+    num_fields = len(field_names)
+    arg_list = repr(field_names).replace("'", "")[1:-1]
+    repr_fmt = '(' + ', '.join(f'{name}=%r' for name in field_names) + ')'
+    tuple_new = tuple.__new__
+    _len = len

-    # Execute the template string in a temporary namespace and support
-    # tracing utilities by setting a value for frame.f_globals['__name__']
-    namespace = dict(__name__='namedtuple_%s' % typename)
-    exec(class_definition, namespace)
-    result = namespace[typename]
-    result._source = class_definition
-    if verbose:
-        print(result._source)
+    # Create all the named tuple methods to be added to the class namespace
+
+    s = f'def __new__(_cls, {arg_list}): return _tuple_new(_cls, ({arg_list}))'
+    namespace = {'_tuple_new': tuple_new, '__name__': f'namedtuple_{typename}'}
+    # Note: exec() has the side-effect of interning the typename and field names
+    exec(s, namespace)
+    __new__ = namespace['__new__']
+    __new__.__doc__ = f'Create new instance of {typename}({arg_list})'
+
+    @classmethod
+    def _make(cls, iterable):
+        result = tuple_new(cls, iterable)
+        if _len(result) != num_fields:
+            raise TypeError(f'Expected {num_fields} arguments, got {len(result)}')
+        return result
+
+    _make.__func__.__doc__ = (f'Make a new {typename} object from a sequence '
+                              'or iterable')
+
+    def _replace(_self, **kwds):
+        result = _self._make(map(kwds.pop, field_names, _self))
+        if kwds:
+            raise ValueError(f'Got unexpected field names: {list(kwds)!r}')
+        return result
+
+    _replace.__doc__ = (f'Return a new {typename} object replacing specified '
+                        'fields with new values')
+
+    def __repr__(self):
+        'Return a nicely formatted representation string'
+        return self.__class__.__name__ + repr_fmt % self
+
+    def _asdict(self):
+        'Return a new OrderedDict which maps field names to their values.'
+        return OrderedDict(zip(self._fields, self))
+
+    def __getnewargs__(self):
+        'Return self as a plain tuple. Used by copy and pickle.'
+        return tuple(self)
+
+    # Modify function metadata to help with introspection and debugging
+
+    for method in (__new__, _make.__func__, _replace,
+                   __repr__, _asdict, __getnewargs__):
+        method.__qualname__ = f'{typename}.{method.__name__}'
+
+    # Build-up the class namespace dictionary
+    # and use type() to build the result class
+    class_namespace = {
+        '__doc__': f'{typename}({arg_list})',
+        '__slots__': (),
+        '_fields': field_names,
+        '__new__': __new__,
+        '_make': _make,
+        '_replace': _replace,
+        '__repr__': __repr__,
+        '_asdict': _asdict,
+        '__getnewargs__': __getnewargs__,
+    }
+    cache = _nt_itemgetters
+    for index, name in enumerate(field_names):
+        try:
+            itemgetter_object, doc = cache[index]
+        except KeyError:
+            itemgetter_object = _itemgetter(index)
+            doc = f'Alias for field number {index}'
+            cache[index] = itemgetter_object, doc
+        class_namespace[name] = property(itemgetter_object, doc=doc)
+
+    result = type(typename, (tuple,), class_namespace)

     # For pickling to work, the __module__ variable needs to be set to the frame
     # where the named tuple is created. Bypass this step in environments where
@@ -194,7 +194,6 @@ class TestNamedTuple(unittest.TestCase):
         self.assertEqual(Point.__module__, __name__)
         self.assertEqual(Point.__getitem__, tuple.__getitem__)
         self.assertEqual(Point._fields, ('x', 'y'))
-        self.assertIn('class Point(tuple)', Point._source)

         self.assertRaises(ValueError, namedtuple, 'abc%', 'efg ghi') # type has non-alpha char
         self.assertRaises(ValueError, namedtuple, 'class', 'efg ghi') # type has keyword
@@ -366,11 +365,37 @@ class TestNamedTuple(unittest.TestCase):
         newt = t._replace(itemgetter=10, property=20, self=30, cls=40, tuple=50)
         self.assertEqual(newt, (10,20,30,40,50))

-        # Broader test of all interesting names in a template
-        with support.captured_stdout() as template:
-            T = namedtuple('T', 'x', verbose=True)
-        words = set(re.findall('[A-Za-z]+', template.getvalue()))
-        words -= set(keyword.kwlist)
+        # Broader test of all interesting names taken from the code, old
+        # template, and an example
+        words = {'Alias', 'At', 'AttributeError', 'Build', 'Bypass', 'Create',
+            'Encountered', 'Expected', 'Field', 'For', 'Got', 'Helper',
+            'IronPython', 'Jython', 'KeyError', 'Make', 'Modify', 'Note',
+            'OrderedDict', 'Point', 'Return', 'Returns', 'Type', 'TypeError',
+            'Used', 'Validate', 'ValueError', 'Variables', 'a', 'accessible', 'add',
+            'added', 'all', 'also', 'an', 'arg_list', 'args', 'arguments',
+            'automatically', 'be', 'build', 'builtins', 'but', 'by', 'cannot',
+            'class_namespace', 'classmethod', 'cls', 'collections', 'convert',
+            'copy', 'created', 'creation', 'd', 'debugging', 'defined', 'dict',
+            'dictionary', 'doc', 'docstring', 'docstrings', 'duplicate', 'effect',
+            'either', 'enumerate', 'environments', 'error', 'example', 'exec', 'f',
+            'f_globals', 'field', 'field_names', 'fields', 'formatted', 'frame',
+            'function', 'functions', 'generate', 'get', 'getter', 'got', 'greater',
+            'has', 'help', 'identifiers', 'index', 'indexable', 'instance',
+            'instantiate', 'interning', 'introspection', 'isidentifier',
+            'isinstance', 'itemgetter', 'iterable', 'join', 'keyword', 'keywords',
+            'kwds', 'len', 'like', 'list', 'map', 'maps', 'message', 'metadata',
+            'method', 'methods', 'module', 'module_name', 'must', 'name', 'named',
+            'namedtuple', 'namedtuple_', 'names', 'namespace', 'needs', 'new',
+            'nicely', 'num_fields', 'number', 'object', 'of', 'operator', 'option',
+            'p', 'particular', 'pickle', 'pickling', 'plain', 'pop', 'positional',
+            'property', 'r', 'regular', 'rename', 'replace', 'replacing', 'repr',
+            'repr_fmt', 'representation', 'result', 'reuse_itemgetter', 's', 'seen',
+            'self', 'sequence', 'set', 'side', 'specified', 'split', 'start',
+            'startswith', 'step', 'str', 'string', 'strings', 'subclass', 'sys',
+            'targets', 'than', 'the', 'their', 'this', 'to', 'tuple', 'tuple_new',
+            'type', 'typename', 'underscore', 'unexpected', 'unpack', 'up', 'use',
+            'used', 'user', 'valid', 'values', 'variable', 'verbose', 'where',
+            'which', 'work', 'x', 'y', 'z', 'zip'}
         T = namedtuple('T', words)
         # test __new__
         values = tuple(range(len(words)))
@@ -396,30 +421,15 @@ class TestNamedTuple(unittest.TestCase):
         self.assertEqual(t.__getnewargs__(), values)

     def test_repr(self):
-        with support.captured_stdout() as template:
-            A = namedtuple('A', 'x', verbose=True)
+        A = namedtuple('A', 'x')
         self.assertEqual(repr(A(1)), 'A(x=1)')
         # repr should show the name of the subclass
         class B(A):
             pass
         self.assertEqual(repr(B(1)), 'B(x=1)')

-    def test_source(self):
-        # verify that _source can be run through exec()
-        tmp = namedtuple('NTColor', 'red green blue')
-        globals().pop('NTColor', None) # remove artifacts from other tests
-        exec(tmp._source, globals())
-        self.assertIn('NTColor', globals())
-        c = NTColor(10, 20, 30)
-        self.assertEqual((c.red, c.green, c.blue), (10, 20, 30))
-        self.assertEqual(NTColor._fields, ('red', 'green', 'blue'))
-        globals().pop('NTColor', None) # clean-up after this test
-
     def test_keyword_only_arguments(self):
         # See issue 25628
-        with support.captured_stdout() as template:
-            NT = namedtuple('NT', ['x', 'y'], verbose=True)
-        self.assertIn('class NT', NT._source)
         with self.assertRaises(TypeError):
             NT = namedtuple('NT', ['x', 'y'], True)

@@ -0,0 +1,9 @@
+Changed the implementation strategy for collections.namedtuple() to
+substantially reduce the use of exec() in favor of precomputed methods. As a
+result, the *verbose* parameter and *_source* attribute are no longer
+supported. The benefits include 1) having a smaller memory footprint for
+applications using multiple named tuples, 2) faster creation of the named
+tuple class (approx 4x to 6x depending on how it is measured), and 3) minor
+speed-ups for instance creation using __new__, _make, and _replace. (The
+primary patch contributor is Jelle Zijlstra with further improvements by
+INADA Naoki, Serhiy Storchaka, and Raymond Hettinger.)
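
The memory benefit described in the blurb comes partly from sharing one itemgetter per field position across all named tuple classes. Below is a minimal sketch of that strategy, assuming simplified, hypothetical helpers (`field_property`, `simple_record`, and `_getter_cache` are illustration names, not part of the collections module) and skipping all field-name validation.

```python
from operator import itemgetter

# Hypothetical module-level cache: index -> (itemgetter, docstring).
_getter_cache = {}


def field_property(index):
    """Return a property for position `index`, reusing cached pieces."""
    try:
        getter, doc = _getter_cache[index]
    except KeyError:
        getter = itemgetter(index)
        doc = f'Alias for field number {index}'
        _getter_cache[index] = getter, doc
    return property(getter, doc=doc)


def simple_record(typename, field_names):
    """Build a tuple subclass with named fields using type(), not exec()."""
    namespace = {'__slots__': (), '_fields': tuple(field_names)}
    for index, name in enumerate(field_names):
        namespace[name] = field_property(index)
    return type(typename, (tuple,), namespace)


Point = simple_record('Point', ['x', 'y'])
Pixel = simple_record('Pixel', ['row', 'col'])
p = Point((3, 4))  # no custom __new__ here, so pass one iterable
```

In this sketch `Point.x` and `Pixel.row` wrap the same itemgetter object, so each additional class costs only new property objects rather than new getter functions and docstrings.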