________________________________________________________________________

PYBENCH - A Python Benchmark Suite
________________________________________________________________________

Extendable suite of low-level benchmarks for measuring
the performance of the Python implementation
(interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, like the other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.


Micro-Manual
------------

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.

If the differences are well below 10% for each test, then you have a
system that is good for doing benchmark tests. If you get random
differences of more than 10% or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web-browsers, email clients, RSS
readers, music players, backup programs, etc.

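The check on minimum versus average time can also be done mechanically.
The following is only a minimal sketch and not part of pybench: the
function names, the data layout (test name mapped to a pair of minimum
and average time in milliseconds) and the reuse of the 10% figure as
the threshold for the min/avg spread are all assumptions made for
illustration.

```python
def relative_spread(minimum_ms, average_ms):
    """Return how far the average lies above the minimum, as a fraction."""
    return (average_ms - minimum_ms) / minimum_ms


def noisy_tests(timings, threshold=0.10):
    """Return the sorted names of tests whose min/avg spread exceeds threshold.

    timings maps a test name to a (minimum_ms, average_ms) pair. This
    layout is hypothetical; it is not the format pybench stores on disk.
    """
    return sorted(
        name
        for name, (minimum_ms, average_ms) in timings.items()
        if relative_spread(minimum_ms, average_ms) > threshold
    )


# Three rows taken from the sample output shown in this README.
sample = {
    "CompareFloats": (109, 110),
    "SmallTuples": (108, 128),
    "UnicodeProperties": (115, 153),
}
print(noisy_tests(sample))  # ['SmallTuples', 'UnicodeProperties']
```

Tests flagged this way are candidates for a re-run on a quieter
system before their numbers are trusted.
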
If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

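As a rough illustration of name-based filtering, here is a small
sketch. It is not pybench's implementation: the helper name
select_tests is made up, and case-insensitive substring matching is an
assumption, chosen so that 'string' would also select names such as
CompareStrings. The actual matching rule used by the -t option may
differ.

```python
def select_tests(names, pattern):
    """Return the names that contain pattern, ignoring case.

    An empty pattern selects everything. Case-insensitive substring
    matching is an assumption made for this sketch.
    """
    if not pattern:
        return list(names)
    needle = pattern.lower()
    return [name for name in names if needle in name.lower()]


names = ["CompareStrings", "ConcatStrings", "DictCreation", "StringSlicing"]
print(select_tests(names, "string"))
# ['CompareStrings', 'ConcatStrings', 'StringSlicing']
```
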
This is the current output of pybench.py --help:

"""
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

Synopsis:
 pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage

Version:
 2.1

The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

Available timers:

   time.time
   time.clock
   systimes.processtime

Examples:

python3.0 pybench.py -f p30.pybench
python3.1 pybench.py -f p31.pybench
python pybench.py -s p31.pybench -c p30.pybench
"""

License
-------

See LICENSE file.


Sample output
-------------

"""
-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.0
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
...
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
       Processor:      x86_64

    Python:
       Implementation: CPython
       Executable:     /usr/local/bin/python
       Version:        3.0
       Compiler:       GCC 3.3.4 (pre 3.3.5 20040809)
       Bits:           64bit
       Build:          Oct 1 2005 15:24:35 (#1)
       Unicode:        UCS2


Test                      minimum  average operation   overhead
-------------------------------------------------------------------------------
BuiltinFunctionCalls:       126ms    145ms    0.28us    0.274ms
BuiltinMethodLookup:        124ms    130ms    0.12us    0.316ms
CompareFloats:              109ms    110ms    0.09us    0.361ms
CompareFloatsIntegers:      100ms    104ms    0.12us    0.271ms
CompareIntegers:            137ms    138ms    0.08us    0.542ms
CompareInternedStrings:     124ms    127ms    0.08us    1.367ms
CompareLongs:               100ms    104ms    0.10us    0.316ms
CompareStrings:             111ms    115ms    0.12us    0.929ms
CompareUnicode:             108ms    128ms    0.17us    0.693ms
ConcatStrings:              142ms    155ms    0.31us    0.562ms
ConcatUnicode:              119ms    127ms    0.42us    0.384ms
CreateInstances:            123ms    128ms    1.14us    0.367ms
CreateNewInstances:         121ms    126ms    1.49us    0.335ms
CreateStringsWithConcat:    130ms    135ms    0.14us    0.916ms
CreateUnicodeWithConcat:    130ms    135ms    0.34us    0.361ms
DictCreation:               108ms    109ms    0.27us    0.361ms
DictWithFloatKeys:          149ms    153ms    0.17us    0.678ms
DictWithIntegerKeys:        124ms    126ms    0.11us    0.915ms
DictWithStringKeys:         114ms    117ms    0.10us    0.905ms
ForLoops:                   110ms    111ms    4.46us    0.063ms
IfThenElse:                 118ms    119ms    0.09us    0.685ms
ListSlicing:                116ms    120ms    8.59us    0.103ms
NestedForLoops:             125ms    137ms    0.09us    0.019ms
NormalClassAttribute:       124ms    136ms    0.11us    0.457ms
NormalInstanceAttribute:    110ms    117ms    0.10us    0.454ms
PythonFunctionCalls:        107ms    113ms    0.34us    0.271ms
PythonMethodCalls:          140ms    149ms    0.66us    0.141ms
Recursion:                  156ms    166ms    3.32us    0.452ms
SecondImport:               112ms    118ms    1.18us    0.180ms
SecondPackageImport:        118ms    127ms    1.27us    0.180ms
SecondSubmoduleImport:      140ms    151ms    1.51us    0.180ms
SimpleComplexArithmetic:    128ms    139ms    0.16us    0.361ms
SimpleDictManipulation:     134ms    136ms    0.11us    0.452ms
SimpleFloatArithmetic:      110ms    113ms    0.09us    0.571ms
SimpleIntFloatArithmetic:   106ms    111ms    0.08us    0.548ms
SimpleIntegerArithmetic:    106ms    109ms    0.08us    0.544ms
SimpleListManipulation:     103ms    113ms    0.10us    0.587ms
SimpleLongArithmetic:       112ms    118ms    0.18us    0.271ms
SmallLists:                 105ms    116ms    0.17us    0.366ms
SmallTuples:                108ms    128ms    0.24us    0.406ms
SpecialClassAttribute:      119ms    136ms    0.11us    0.453ms
SpecialInstanceAttribute:   143ms    155ms    0.13us    0.454ms
StringMappings:             115ms    121ms    0.48us    0.405ms
StringPredicates:           120ms    129ms    0.18us    2.064ms
StringSlicing:              111ms    127ms    0.23us    0.781ms
TryExcept:                  125ms    126ms    0.06us    0.681ms
TryRaiseExcept:             133ms    137ms    2.14us    0.361ms
TupleSlicing:               117ms    120ms    0.46us    0.066ms
UnicodeMappings:            156ms    160ms    4.44us    0.429ms
UnicodePredicates:          117ms    121ms    0.22us    2.487ms
UnicodeProperties:          115ms    153ms    0.38us    2.070ms
UnicodeSlicing:             126ms    129ms    0.26us    0.689ms
-------------------------------------------------------------------------------
Totals:                    6283ms   6673ms
"""

________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations test operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.


Here's an example:
|
|
------------------
|
|
|
|
from pybench import Test
|
|
|
|
class IntegerCounting(Test):
|
|
|
|
# Version number of the test as float (x.yy); this is important
|
|
# for comparisons of benchmark runs - tests with unequal version
|
|
# number will not get compared.
|
|
version = 1.0
|
|
|
|
# The number of abstract operations done in each round of the
|
|
# test. An operation is the basic unit of what you want to
|
|
# measure. The benchmark will output the amount of run-time per
|
|
# operation. Note that in order to raise the measured timings
|
|
# significantly above noise level, it is often required to repeat
|
|
# sets of operations more than once per test round. The measured
|
|
# overhead per test round should be less than 1 second.
|
|
operations = 20
|
|
|
|
# Number of rounds to execute per test run. This should be
|
|
# adjusted to a figure that results in a test run-time of between
|
|
# 1-2 seconds (at warp 1).
|
|
rounds = 100000
|
|
|
|
def test(self):
|
|
|
|
""" Run the test.
|
|
|
|
The test needs to run self.rounds executing
|
|
self.operations number of operations each.
|
|
|
|
"""
|
|
# Init the test
|
|
a = 1
|
|
|
|
# Run test rounds
|
|
#
|
|
for i in range(self.rounds):
|
|
|
|
# Repeat the operations per round to raise the run-time
|
|
# per operation significantly above the noise level of the
|
|
# for-loop overhead.
|
|
|
|
# Execute 20 operations (a += 1):
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
a += 1
|
|
|
|
def calibrate(self):
|
|
|
|
""" Calibrate the test.
|
|
|
|
This method should execute everything that is needed to
|
|
setup and run the test - except for the actual operations
|
|
that you intend to measure. pybench uses this method to
|
|
measure the test implementation overhead.
|
|
|
|
"""
|
|
# Init the test
|
|
a = 1
|
|
|
|
# Run test rounds (without actually doing any operation)
|
|
for i in range(self.rounds):
|
|
|
|
# Skip the actual execution of the operations, since we
|
|
# only want to measure the test's administration overhead.
|
|
pass
|
|
|
|
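How the two methods work together can be sketched as follows: time
.test(), time .calibrate(), and attribute the difference to the
operations themselves. This is a self-contained illustration that does
not import pybench. The MiniTest base class, the best_time and
time_per_operation helpers, the use of time.perf_counter and the
choice of five repetitions are all assumptions; pybench's real
calibration strategy may differ.

```python
import time


class MiniTest:
    """Hypothetical stand-in for pybench.Test, for illustration only."""

    operations = 1
    rounds = 1

    def test(self):
        raise NotImplementedError

    def calibrate(self):
        raise NotImplementedError


class IntegerCounting(MiniTest):
    operations = 5
    rounds = 20000

    def test(self):
        a = 1
        for i in range(self.rounds):
            # Five operations per round, matching .operations above.
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):
        a = 1
        for i in range(self.rounds):
            pass


def best_time(func, repeats=5):
    """Return the minimum wall-clock time of func() over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def time_per_operation(test):
    """Estimate seconds per operation after subtracting loop overhead."""
    overhead = best_time(test.calibrate)
    total = best_time(test.test)
    # Clamp at zero: on a noisy system the difference can come out negative.
    net = max(total - overhead, 0.0)
    return net / (test.rounds * test.operations)


per_op = time_per_operation(IntegerCounting())
print("%.3f us per operation" % (per_op * 1e6))
```

Taking the minimum over several runs follows the estimator named in
the version history; the rest of the arithmetic is a simplification.
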
Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.

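The scanning step described above can be pictured with a short sketch.
It is not pybench's code: the Test class below is a stand-in for
pybench.Test, find_tests is a made-up helper, and the throwaway module
built with types.ModuleType merely plays the role of pybench.Setup.

```python
import types


class Test:
    """Hypothetical stand-in for pybench.Test."""


def find_tests(module, base=Test):
    """Return {name: class} for every strict subclass of base in module."""
    found = {}
    for name, obj in vars(module).items():
        if isinstance(obj, type) and issubclass(obj, base) and obj is not base:
            found[name] = obj
    return found


class IntegerCounting(Test):
    pass


class NotATest:
    pass


# Build a throwaway module and "import" a few symbols into it.
setup = types.ModuleType("fake_setup")
setup.Test = Test
setup.IntegerCounting = IntegerCounting
setup.NotATest = NotATest
setup.some_constant = 42

print(sorted(find_tests(setup)))  # ['IntegerCounting']
```

The base class itself is skipped so that only concrete tests end up in
the suite; whether pybench applies the same rule is an assumption.
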
Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will list as "n/a" to reflect the change.

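The effect of a version bump on a comparison can be sketched like
this. The function name compare_timing, the data layout (test name
mapped to a pair of version and minimum time in milliseconds) and the
percentage format are assumptions for illustration only; they do not
reflect pybench's stored benchmark format or its report layout.

```python
def compare_timing(name, this_run, other_run):
    """Return a percentage-difference string, or "n/a" if not comparable.

    Each run maps a test name to a (version, minimum_ms) pair. This
    layout is hypothetical.
    """
    this_version, this_min = this_run[name]
    other_version, other_min = other_run[name]
    if this_version != other_version:
        # The test changed between runs, so the timings are not comparable.
        return "n/a"
    change = (this_min - other_min) / other_min * 100.0
    return "%+.1f%%" % change


old = {"IntegerCounting": (1.0, 120.0), "ForLoops": (2.0, 110.0)}
new = {"IntegerCounting": (1.1, 100.0), "ForLoops": (2.0, 121.0)}

print(compare_timing("IntegerCounting", new, old))  # n/a
print(compare_timing("ForLoops", new, old))         # +10.0%
```
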
Version History
---------------

  2.1: made some minor changes for compatibility with Python 3.0:
        - replaced cmp with divmod and range with max in Calls.py
          (cmp no longer exists in 3.0, and range is a list in
          Python 2.x and an iterator in Python 3.x)

  2.0: rewrote parts of pybench which resulted in more repeatable
       timings:
        - made timer a parameter
        - changed the platform default timer to use high-resolution
          timers rather than process timers (which have a much lower
          resolution)
        - added option to select timer
        - added process time timer (using systimes.py)
        - changed to use min() as timing estimator (average
          is still taken as well to provide an idea of the difference)
        - garbage collection is turned off by default
        - sys check interval is set to the highest possible value
        - calibration is now a separate step and done using
          a different strategy that allows measuring the test
          overhead more accurately
        - modified the tests to each give a run-time of between
          100-200ms using warp 10
        - changed default warp factor to 10 (from 20)
        - compared results with timeit.py and confirmed measurements
        - bumped all test versions to 2.0
        - updated platform.py to the latest version
        - changed the output format a bit to make it look
          nicer
        - refactored the APIs somewhat

  1.3+: Steve Holden added the NewInstances test and the filtering
       option during the NeedForSpeed sprint; this also triggered a long
       discussion on how to improve benchmark timing and finally
       resulted in the release of 2.0

  1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com