mirror of https://github.com/python/cpython
69 lines
2.5 KiB
Plaintext
69 lines
2.5 KiB
Plaintext
|
stringbench is a set of performance tests comparing byte string
|
||
|
operations with unicode operations. The two string implementations
|
||
|
are loosely based on each other and sometimes the algorithm for one is
|
||
|
faster than the other.
|
||
|
|
||
|
These test set was started at the Need For Speed sprint in Reykjavik
|
||
|
to identify which string methods could be sped up quickly and to
|
||
|
identify obvious places for improvement.
|
||
|
|
||
|
Here is an example of a benchmark
|
||
|
|
||
|
|
||
|
@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
|
||
|
def startswith_single(STR):
|
||
|
s1 = STR("Andrew")
|
||
|
s2 = STR("A")
|
||
|
s1_startswith = s1.startswith
|
||
|
for x in _RANGE_1000:
|
||
|
s1_startswith(s2)
|
||
|
|
||
|
The bench decorator takes three parameters. The first is a short
|
||
|
description of how the code works. In most cases this is Python code
|
||
|
snippet. It is not the code which is actually run because the real
|
||
|
code is hand-optimized to focus on the method being tested.
|
||
|
|
||
|
The second parameter is a group title. All benchmarks with the same
|
||
|
group title are listed together. This lets you compare different
|
||
|
implementations of the same algorithm, such as "t in s"
|
||
|
vs. "s.find(t)".
|
||
|
|
||
|
The last is a count. Each benchmark loops over the algorithm either
|
||
|
100 or 1000 times, depending on the algorithm performance. The output
|
||
|
time is the time per benchmark call so the reader needs a way to know
|
||
|
how to scale the performance.
|
||
|
|
||
|
These parameters become function attributes.
|
||
|
|
||
|
|
||
|
Here is an example of the output
|
||
|
|
||
|
|
||
|
========== count newlines
|
||
|
38.54 41.60 92.7 ...text.with.2000.newlines.count("\n") (*100)
|
||
|
========== early match, single character
|
||
|
1.14 1.18 96.8 ("A"*1000).find("A") (*1000)
|
||
|
0.44 0.41 105.6 "A" in "A"*1000 (*1000)
|
||
|
1.15 1.17 98.1 ("A"*1000).index("A") (*1000)
|
||
|
|
||
|
The first column is the run time in milliseconds for byte strings.
|
||
|
The second is the run time for unicode strings. The third is a
|
||
|
percentage; byte time / unicode time. It's the percentage by which
|
||
|
unicode is faster than byte strings.
|
||
|
|
||
|
The last column contains the code snippet and the repeat count for the
|
||
|
internal benchmark loop.
|
||
|
|
||
|
The times are computed with 'timeit.py' which repeats the test more
|
||
|
and more times until the total time takes over 0.2 seconds, returning
|
||
|
the best time for a single iteration.
|
||
|
|
||
|
The final line of the output is the cumulative time for byte and
|
||
|
unicode strings, and the overall performance of unicode relative to
|
||
|
bytes. For example
|
||
|
|
||
|
4079.83 5432.25 75.1 TOTAL
|
||
|
|
||
|
However, this has no meaning as it evenly weights every test.
|
||
|
|