From fd9ebd4a361805607baea3e038652f207575ced8 Mon Sep 17 00:00:00 2001
From: Antoine Pitrou
Date: Fri, 25 Nov 2011 16:33:53 +0100
Subject: [PATCH] Clarify concatenation behaviour of immutable strings, and
 remove explicit mention of the CPython optimization hack.

---
 Doc/faq/programming.rst  | 26 ++++++++++++++++++++++++++
 Doc/library/stdtypes.rst | 19 +++++++++++--------
 2 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/Doc/faq/programming.rst b/Doc/faq/programming.rst
index d1a3dafce86..f157a943a43 100644
--- a/Doc/faq/programming.rst
+++ b/Doc/faq/programming.rst
@@ -989,6 +989,32 @@ What does 'UnicodeDecodeError' or 'UnicodeEncodeError' error mean?
 
 See the :ref:`unicode-howto`.
 
+What is the most efficient way to concatenate many strings together?
+--------------------------------------------------------------------
+
+:class:`str` and :class:`bytes` objects are immutable, therefore concatenating
+many strings together is inefficient as each concatenation creates a new
+object. In the general case, the total runtime cost is quadratic in the
+total string length.
+
+To accumulate many :class:`str` objects, the recommended idiom is to place
+them into a list and call :meth:`str.join` at the end::
+
+   chunks = []
+   for s in my_strings:
+       chunks.append(s)
+   result = ''.join(chunks)
+
+(another reasonably efficient idiom is to use :class:`io.StringIO`)
+
+To accumulate many :class:`bytes` objects, the recommended idiom is to extend
+a :class:`bytearray` object using in-place concatenation (the ``+=`` operator)::
+
+   result = bytearray()
+   for b in my_bytes_objects:
+       result += b
+
+
 Sequences (Tuples/Lists)
 ========================
 
diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
index af1e44a941c..5b54b09ed46 100644
--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -964,15 +964,18 @@ Notes:
    If *k* is ``None``, it is treated like ``1``.
 
 (6)
-   .. impl-detail::
+   Concatenating immutable strings always results in a new object. This means
+   that building up a string by repeated concatenation will have a quadratic
+   runtime cost in the total string length. To get a linear runtime cost,
+   you must switch to one of the alternatives below:
 
-      If *s* and *t* are both strings, some Python implementations such as
-      CPython can usually perform an in-place optimization for assignments of
-      the form ``s = s + t`` or ``s += t``. When applicable, this optimization
-      makes quadratic run-time much less likely. This optimization is both
-      version and implementation dependent. For performance sensitive code, it
-      is preferable to use the :meth:`str.join` method which assures consistent
-      linear concatenation performance across versions and implementations.
+   * if concatenating :class:`str` objects, you can build a list and use
+     :meth:`str.join` at the end;
+
+   * if concatenating :class:`bytes` objects, you can similarly use
+     :meth:`bytes.join`, or you can do in-place concatenation with a
+     :class:`bytearray` object. :class:`bytearray` objects are mutable and
+     have an efficient overallocation mechanism.
 
 
 .. _string-methods:
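The two :class:`str` idioms recommended in the new FAQ entry (a list plus :meth:`str.join`, or an :class:`io.StringIO` buffer) can be exercised side by side with a short, self-contained sketch. The function names ``concat_with_join`` and ``concat_with_stringio`` and the sample input are hypothetical, chosen only for illustration; both functions should return the same string.

```python
import io


def concat_with_join(parts):
    # FAQ idiom: collect the pieces in a list (appending is cheap), then
    # build the final string once with str.join.
    chunks = []
    for s in parts:
        chunks.append(s)
    return ''.join(chunks)


def concat_with_stringio(parts):
    # Alternative idiom: write each piece into an in-memory text buffer
    # and read the accumulated text back with getvalue().
    buf = io.StringIO()
    for s in parts:
        buf.write(s)
    return buf.getvalue()


# Hypothetical sample input standing in for the FAQ's my_strings.
my_strings = ['spam', ' ', 'and', ' ', 'eggs']
print(concat_with_join(my_strings))      # spam and eggs
print(concat_with_stringio(my_strings))  # spam and eggs
```

Neither version creates a new :class:`str` per step, which is what keeps the total work roughly proportional to the final length.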
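For :class:`bytes`, the patch recommends either in-place ``+=`` on a :class:`bytearray` or :meth:`bytes.join`. A minimal sketch of both follows. The function names and sample data are hypothetical, and converting the :class:`bytearray` to an immutable :class:`bytes` object at the end is an assumption of this sketch; the FAQ snippet simply keeps the :class:`bytearray`.

```python
def concat_with_bytearray(parts):
    # FAQ idiom: grow one mutable bytearray in place with +=.
    result = bytearray()
    for b in parts:
        result += b  # extends the same object; no new object per step
    # Assumption for this sketch: callers want an immutable bytes result,
    # so convert once at the end (a single extra copy).
    return bytes(result)


def concat_with_bytes_join(parts):
    # stdtypes alternative: gather the pieces and join them once.
    return b''.join(parts)


# Hypothetical sample input standing in for the FAQ's my_bytes_objects.
my_bytes_objects = [b'GET ', b'/index.html', b' HTTP/1.1\r\n']
print(concat_with_bytearray(my_bytes_objects) == concat_with_bytes_join(my_bytes_objects))  # True
```

If the caller can work with a mutable buffer, returning the :class:`bytearray` directly, as the FAQ does, avoids the final copy.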
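The quadratic bound stated in both hunks follows from a simple counting argument. As an illustrative assumption, take *n* chunks of equal length *k* (so the total length is *L = nk*) and suppose every concatenation copies both operands into a fresh object, as the patch says immutable strings require. Step *i* then copies the *ik* characters of the new result:

```latex
% Assumption: n chunks of length k, total length L = nk,
% and each concatenation copies both operands into a new object.
\text{cost}_{\text{repeated}}
  = \sum_{i=1}^{n} i\,k
  = k\,\frac{n(n+1)}{2}
  = \frac{L\,(n+1)}{2}
  \approx \frac{L^{2}}{2k},
\qquad
\text{cost}_{\text{join}} \approx L
```

For a fixed chunk size, doubling *L* therefore roughly quadruples the copying work, while a single join copies each character into the result once.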