bpo-28638: Optimize namedtuple() creation time by minimizing use of exec() (#3454)

* Working draft without _source

* Re-use itemgetter() instances

* Speed-up calls to __new__() with a pre-bound tuple.__new__()

* Add note regarding string interning

* Remove unnecessary create function wrappers

* Minor sync-ups with PR-2736.  Mostly formatting and f-strings

* Bring-in qualname/__module__ fix-ups from PR-2736

* Formally remove the verbose flag and _source attribute

* Restore a test of potentially problematic field names

* Restore kwonly_args test but without the verbose option

* Adopt Inada's idea to reuse the docstrings for the itemgetters

* Neaten-up a bit

* Add news blurb

* Serhiy pointed-out the need for interning

* Jelle noticed a missing f on an f-string

* Add whatsnew entry for feature removal

* Accede to request for dict literals instead of keyword arguments

* Leave the method.__module__ attribute pointing to the actual location of the code

* Improve variable names and add a micro-optimization for a non-public helper function

* Simplify by in-lining reuse_itemgetter()

* Arrange steps in more logical order

* Save docstring in local cache instead of interning
Raymond Hettinger 2017-09-10 10:23:36 -07:00 committed by GitHub
parent 3cedf46cdb
commit 8b57d73639
5 changed files with 135 additions and 112 deletions
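
The approach described in the commit message (building the class with `type()` and sharing `itemgetter()` instances across classes, rather than formatting and executing a full class template) can be sketched roughly as follows. This is a simplified illustration, not the code from the commit: `make_record_type` and `_getter_cache` are hypothetical names, and `__new__` here takes `*args`, whereas the commit still uses a small `exec()` call to give `__new__` a signature with named parameters.

```python
from operator import itemgetter

# Hypothetical module-level cache, playing the role of _nt_itemgetters in the diff:
# maps a field index to a shared (itemgetter, docstring) pair.
_getter_cache = {}


def make_record_type(typename, field_names):
    """Build a tuple subclass with named fields using type() instead of exec()."""
    field_names = tuple(field_names)
    num_fields = len(field_names)
    tuple_new = tuple.__new__  # pre-bound lookup, as in the commit

    def __new__(cls, *args):
        # Simplification: positional arguments only (no keyword support).
        if len(args) != num_fields:
            raise TypeError(f'Expected {num_fields} arguments, got {len(args)}')
        return tuple_new(cls, args)

    def __repr__(self):
        pairs = ', '.join(f'{name}={value!r}'
                          for name, value in zip(field_names, self))
        return f'{self.__class__.__name__}({pairs})'

    class_namespace = {
        '__doc__': f'{typename}({", ".join(field_names)})',
        '__slots__': (),
        '_fields': field_names,
        '__new__': __new__,
        '__repr__': __repr__,
    }
    for index, name in enumerate(field_names):
        try:
            getter, doc = _getter_cache[index]
        except KeyError:
            getter = itemgetter(index)
            doc = f'Alias for field number {index}'
            _getter_cache[index] = getter, doc
        class_namespace[name] = property(getter, doc=doc)

    # The three-argument form of type() creates the class directly.
    return type(typename, (tuple,), class_namespace)
```

With this sketch, two classes created by `make_record_type` share the same getter object for a given position, so `Point.x.fget is Pair.a.fget` holds when `x` and `a` are both field 0.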

@@ -763,7 +763,7 @@ Named tuples assign meaning to each position in a tuple and allow for more reada
 self-documenting code. They can be used wherever regular tuples are used, and
 they add the ability to access fields by name instead of position index.

-.. function:: namedtuple(typename, field_names, *, verbose=False, rename=False, module=None)
+.. function:: namedtuple(typename, field_names, *, rename=False, module=None)

     Returns a new tuple subclass named *typename*. The new subclass is used to
     create tuple-like objects that have fields accessible by attribute lookup as
@@ -786,10 +786,6 @@ they add the ability to access fields by name instead of position index.
     converted to ``['abc', '_1', 'ghi', '_3']``, eliminating the keyword
     ``def`` and the duplicate fieldname ``abc``.

-    If *verbose* is true, the class definition is printed after it is
-    built. This option is outdated; instead, it is simpler to print the
-    :attr:`_source` attribute.
-
     If *module* is defined, the ``__module__`` attribute of the named tuple is
     set to that value.
@@ -806,6 +802,9 @@ they add the ability to access fields by name instead of position index.
     .. versionchanged:: 3.6
        Added the *module* parameter.

+    .. versionchanged:: 3.7
+       Remove the *verbose* parameter and the :attr:`_source` attribute.
+
 .. doctest::
     :options: +NORMALIZE_WHITESPACE
@@ -878,15 +877,6 @@ field names, the method and attribute names start with an underscore.
         >>> for partnum, record in inventory.items():
         ...     inventory[partnum] = record._replace(price=newprices[partnum], timestamp=time.now())

-.. attribute:: somenamedtuple._source
-
-    A string with the pure Python source code used to create the named
-    tuple class. The source makes the named tuple self-documenting.
-    It can be printed, executed using :func:`exec`, or saved to a file
-    and imported.
-
-    .. versionadded:: 3.3
-
 .. attribute:: somenamedtuple._fields

     Tuple of strings listing the field names. Useful for introspection

@@ -435,6 +435,12 @@ API and Feature Removals
   Python 3.1, and has now been removed. Use the :func:`~os.path.splitdrive`
   function instead.

+* :func:`collections.namedtuple` no longer supports the *verbose* parameter
+  or ``_source`` attribute which showed the generated source code for the
+  named tuple class. This was part of an optimization designed to speed-up
+  class creation. (Contributed by Jelle Zijlstra with further improvements
+  by INADA Naoki, Serhiy Storchaka, and Raymond Hettinger in :issue:`28638`.)
+
 * Functions :func:`bool`, :func:`float`, :func:`list` and :func:`tuple` no
   longer take keyword arguments. The first argument of :func:`int` can now
   be passed only as positional argument.

@@ -301,59 +301,9 @@ except ImportError:
 ### namedtuple
 ################################################################################

-_class_template = """\
-from builtins import property as _property, tuple as _tuple
-from operator import itemgetter as _itemgetter
-from collections import OrderedDict
+_nt_itemgetters = {}

-class {typename}(tuple):
-    '{typename}({arg_list})'
-
-    __slots__ = ()
-
-    _fields = {field_names!r}
-
-    def __new__(_cls, {arg_list}):
-        'Create new instance of {typename}({arg_list})'
-        return _tuple.__new__(_cls, ({arg_list}))
-
-    @classmethod
-    def _make(cls, iterable, new=tuple.__new__, len=len):
-        'Make a new {typename} object from a sequence or iterable'
-        result = new(cls, iterable)
-        if len(result) != {num_fields:d}:
-            raise TypeError('Expected {num_fields:d} arguments, got %d' % len(result))
-        return result
-
-    def _replace(_self, **kwds):
-        'Return a new {typename} object replacing specified fields with new values'
-        result = _self._make(map(kwds.pop, {field_names!r}, _self))
-        if kwds:
-            raise ValueError('Got unexpected field names: %r' % list(kwds))
-        return result
-
-    def __repr__(self):
-        'Return a nicely formatted representation string'
-        return self.__class__.__name__ + '({repr_fmt})' % self
-
-    def _asdict(self):
-        'Return a new OrderedDict which maps field names to their values.'
-        return OrderedDict(zip(self._fields, self))
-
-    def __getnewargs__(self):
-        'Return self as a plain tuple. Used by copy and pickle.'
-        return tuple(self)
-
-{field_defs}
-"""
-
-_repr_template = '{name}=%r'
-
-_field_template = '''\
-    {name} = _property(_itemgetter({index:d}), doc='Alias for field number {index:d}')
-'''
-
-def namedtuple(typename, field_names, *, verbose=False, rename=False, module=None):
+def namedtuple(typename, field_names, *, rename=False, module=None):
     """Returns a new subclass of tuple with named fields.

     >>> Point = namedtuple('Point', ['x', 'y'])
@@ -390,46 +340,104 @@ def namedtuple(typename, field_names, *, verbose=False, rename=False, module=Non
                 or _iskeyword(name)
                 or name.startswith('_')
                 or name in seen):
-                field_names[index] = '_%d' % index
+                field_names[index] = f'_{index}'
             seen.add(name)
     for name in [typename] + field_names:
         if type(name) is not str:
             raise TypeError('Type names and field names must be strings')
         if not name.isidentifier():
             raise ValueError('Type names and field names must be valid '
-                             'identifiers: %r' % name)
+                             f'identifiers: {name!r}')
         if _iskeyword(name):
             raise ValueError('Type names and field names cannot be a '
-                             'keyword: %r' % name)
+                             f'keyword: {name!r}')
     seen = set()
     for name in field_names:
         if name.startswith('_') and not rename:
             raise ValueError('Field names cannot start with an underscore: '
-                             '%r' % name)
+                             f'{name!r}')
         if name in seen:
-            raise ValueError('Encountered duplicate field name: %r' % name)
+            raise ValueError(f'Encountered duplicate field name: {name!r}')
         seen.add(name)

-    # Fill-in the class template
-    class_definition = _class_template.format(
-        typename = typename,
-        field_names = tuple(field_names),
-        num_fields = len(field_names),
-        arg_list = repr(tuple(field_names)).replace("'", "")[1:-1],
-        repr_fmt = ', '.join(_repr_template.format(name=name)
-                             for name in field_names),
-        field_defs = '\n'.join(_field_template.format(index=index, name=name)
-                               for index, name in enumerate(field_names))
-    )
+    # Variables used in the methods and docstrings
+    field_names = tuple(map(_sys.intern, field_names))
+    num_fields = len(field_names)
+    arg_list = repr(field_names).replace("'", "")[1:-1]
+    repr_fmt = '(' + ', '.join(f'{name}=%r' for name in field_names) + ')'
+    tuple_new = tuple.__new__
+    _len = len

-    # Execute the template string in a temporary namespace and support
-    # tracing utilities by setting a value for frame.f_globals['__name__']
-    namespace = dict(__name__='namedtuple_%s' % typename)
-    exec(class_definition, namespace)
-    result = namespace[typename]
-    result._source = class_definition
-    if verbose:
-        print(result._source)
+    # Create all the named tuple methods to be added to the class namespace
+
+    s = f'def __new__(_cls, {arg_list}): return _tuple_new(_cls, ({arg_list}))'
+    namespace = {'_tuple_new': tuple_new, '__name__': f'namedtuple_{typename}'}
+    # Note: exec() has the side-effect of interning the typename and field names
+    exec(s, namespace)
+    __new__ = namespace['__new__']
+    __new__.__doc__ = f'Create new instance of {typename}({arg_list})'
+
+    @classmethod
+    def _make(cls, iterable):
+        result = tuple_new(cls, iterable)
+        if _len(result) != num_fields:
+            raise TypeError(f'Expected {num_fields} arguments, got {len(result)}')
+        return result
+
+    _make.__func__.__doc__ = (f'Make a new {typename} object from a sequence '
+                              'or iterable')
+
+    def _replace(_self, **kwds):
+        result = _self._make(map(kwds.pop, field_names, _self))
+        if kwds:
+            raise ValueError(f'Got unexpected field names: {list(kwds)!r}')
+        return result
+
+    _replace.__doc__ = (f'Return a new {typename} object replacing specified '
+                        'fields with new values')
+
+    def __repr__(self):
+        'Return a nicely formatted representation string'
+        return self.__class__.__name__ + repr_fmt % self
+
+    def _asdict(self):
+        'Return a new OrderedDict which maps field names to their values.'
+        return OrderedDict(zip(self._fields, self))
+
+    def __getnewargs__(self):
+        'Return self as a plain tuple. Used by copy and pickle.'
+        return tuple(self)
+
+    # Modify function metadata to help with introspection and debugging
+    for method in (__new__, _make.__func__, _replace,
+                   __repr__, _asdict, __getnewargs__):
+        method.__qualname__ = f'{typename}.{method.__name__}'
+
+    # Build-up the class namespace dictionary
+    # and use type() to build the result class
+    class_namespace = {
+        '__doc__': f'{typename}({arg_list})',
+        '__slots__': (),
+        '_fields': field_names,
+        '__new__': __new__,
+        '_make': _make,
+        '_replace': _replace,
+        '__repr__': __repr__,
+        '_asdict': _asdict,
+        '__getnewargs__': __getnewargs__,
+    }
+    cache = _nt_itemgetters
+    for index, name in enumerate(field_names):
+        try:
+            itemgetter_object, doc = cache[index]
+        except KeyError:
+            itemgetter_object = _itemgetter(index)
+            doc = f'Alias for field number {index}'
+            cache[index] = itemgetter_object, doc
+        class_namespace[name] = property(itemgetter_object, doc=doc)
+
+    result = type(typename, (tuple,), class_namespace)

     # For pickling to work, the __module__ variable needs to be set to the frame
     # where the named tuple is created. Bypass this step in environments where

@@ -194,7 +194,6 @@ class TestNamedTuple(unittest.TestCase):
         self.assertEqual(Point.__module__, __name__)
         self.assertEqual(Point.__getitem__, tuple.__getitem__)
         self.assertEqual(Point._fields, ('x', 'y'))
-        self.assertIn('class Point(tuple)', Point._source)

         self.assertRaises(ValueError, namedtuple, 'abc%', 'efg ghi')  # type has non-alpha char
         self.assertRaises(ValueError, namedtuple, 'class', 'efg ghi')  # type has keyword
@@ -366,11 +365,37 @@ class TestNamedTuple(unittest.TestCase):
         newt = t._replace(itemgetter=10, property=20, self=30, cls=40, tuple=50)
         self.assertEqual(newt, (10,20,30,40,50))

-        # Broader test of all interesting names in a template
-        with support.captured_stdout() as template:
-            T = namedtuple('T', 'x', verbose=True)
-        words = set(re.findall('[A-Za-z]+', template.getvalue()))
-        words -= set(keyword.kwlist)
+        # Broader test of all interesting names taken from the code, old
+        # template, and an example
+        words = {'Alias', 'At', 'AttributeError', 'Build', 'Bypass', 'Create',
+                 'Encountered', 'Expected', 'Field', 'For', 'Got', 'Helper',
+                 'IronPython', 'Jython', 'KeyError', 'Make', 'Modify', 'Note',
+                 'OrderedDict', 'Point', 'Return', 'Returns', 'Type', 'TypeError',
+                 'Used', 'Validate', 'ValueError', 'Variables', 'a', 'accessible', 'add',
+                 'added', 'all', 'also', 'an', 'arg_list', 'args', 'arguments',
+                 'automatically', 'be', 'build', 'builtins', 'but', 'by', 'cannot',
+                 'class_namespace', 'classmethod', 'cls', 'collections', 'convert',
+                 'copy', 'created', 'creation', 'd', 'debugging', 'defined', 'dict',
+                 'dictionary', 'doc', 'docstring', 'docstrings', 'duplicate', 'effect',
+                 'either', 'enumerate', 'environments', 'error', 'example', 'exec', 'f',
+                 'f_globals', 'field', 'field_names', 'fields', 'formatted', 'frame',
+                 'function', 'functions', 'generate', 'get', 'getter', 'got', 'greater',
+                 'has', 'help', 'identifiers', 'index', 'indexable', 'instance',
+                 'instantiate', 'interning', 'introspection', 'isidentifier',
+                 'isinstance', 'itemgetter', 'iterable', 'join', 'keyword', 'keywords',
+                 'kwds', 'len', 'like', 'list', 'map', 'maps', 'message', 'metadata',
+                 'method', 'methods', 'module', 'module_name', 'must', 'name', 'named',
+                 'namedtuple', 'namedtuple_', 'names', 'namespace', 'needs', 'new',
+                 'nicely', 'num_fields', 'number', 'object', 'of', 'operator', 'option',
+                 'p', 'particular', 'pickle', 'pickling', 'plain', 'pop', 'positional',
+                 'property', 'r', 'regular', 'rename', 'replace', 'replacing', 'repr',
+                 'repr_fmt', 'representation', 'result', 'reuse_itemgetter', 's', 'seen',
+                 'self', 'sequence', 'set', 'side', 'specified', 'split', 'start',
+                 'startswith', 'step', 'str', 'string', 'strings', 'subclass', 'sys',
+                 'targets', 'than', 'the', 'their', 'this', 'to', 'tuple', 'tuple_new',
+                 'type', 'typename', 'underscore', 'unexpected', 'unpack', 'up', 'use',
+                 'used', 'user', 'valid', 'values', 'variable', 'verbose', 'where',
+                 'which', 'work', 'x', 'y', 'z', 'zip'}
         T = namedtuple('T', words)
         # test __new__
         values = tuple(range(len(words)))
@@ -396,30 +421,15 @@ class TestNamedTuple(unittest.TestCase):
         self.assertEqual(t.__getnewargs__(), values)

     def test_repr(self):
-        with support.captured_stdout() as template:
-            A = namedtuple('A', 'x', verbose=True)
+        A = namedtuple('A', 'x')
         self.assertEqual(repr(A(1)), 'A(x=1)')
         # repr should show the name of the subclass
         class B(A):
             pass
         self.assertEqual(repr(B(1)), 'B(x=1)')

-    def test_source(self):
-        # verify that _source can be run through exec()
-        tmp = namedtuple('NTColor', 'red green blue')
-        globals().pop('NTColor', None)  # remove artifacts from other tests
-        exec(tmp._source, globals())
-        self.assertIn('NTColor', globals())
-        c = NTColor(10, 20, 30)
-        self.assertEqual((c.red, c.green, c.blue), (10, 20, 30))
-        self.assertEqual(NTColor._fields, ('red', 'green', 'blue'))
-        globals().pop('NTColor', None)  # clean-up after this test
-
     def test_keyword_only_arguments(self):
         # See issue 25628
-        with support.captured_stdout() as template:
-            NT = namedtuple('NT', ['x', 'y'], verbose=True)
-        self.assertIn('class NT', NT._source)
         with self.assertRaises(TypeError):
             NT = namedtuple('NT', ['x', 'y'], True)

@@ -0,0 +1,9 @@
+Changed the implementation strategy for collections.namedtuple() to
+substantially reduce the use of exec() in favor of precomputed methods. As a
+result, the *verbose* parameter and *_source* attribute are no longer
+supported. The benefits include 1) having a smaller memory footprint for
+applications using multiple named tuples, 2) faster creation of the named
+tuple class (approx 4x to 6x depending on how it is measured), and 3) minor
+speed-ups for instance creation using __new__, _make, and _replace. (The
+primary patch contributor is Jelle Zijlstra with further improvements by
+INADA Naoki, Serhiy Storchaka, and Raymond Hettinger.)
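
The news entry's two user-visible effects (faster class creation and the removal of *verbose* and `_source`) can be checked on a given interpreter with a small script along these lines. It is only a sketch: `creation_time` and `supports_verbose` are hypothetical helper names, and the timing it reports depends on the machine and Python build, so it is not expected to reproduce the 4x to 6x figure exactly.

```python
from collections import namedtuple
from timeit import timeit


def creation_time(n_fields=5, repeats=200):
    """Return the average number of seconds taken to create one named tuple class."""
    fields = [f'f{i}' for i in range(n_fields)]
    total = timeit(lambda: namedtuple('Record', fields), number=repeats)
    return total / repeats


def supports_verbose():
    """Report whether this interpreter still accepts the removed *verbose* flag."""
    try:
        # On interpreters that predate the removal, this also prints the class source.
        namedtuple('Probe', 'x', verbose=True)
    except TypeError:
        return False
    return True


if __name__ == '__main__':
    print(f'average class creation time: {creation_time():.6f} s')
    print(f'verbose accepted: {supports_verbose()}')
    print(f'_source present: {hasattr(namedtuple("Probe", "x"), "_source")}')
```

Running the same script under an interpreter from before and after the change gives a rough before/after comparison of class creation time.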