\section{\module{doctest} --- Test docstrings represent reality} \declaremodule{standard}{doctest} \moduleauthor{Tim Peters}{tim_one@users.sourceforge.net} \sectionauthor{Tim Peters}{tim_one@users.sourceforge.net} \sectionauthor{Moshe Zadka}{moshez@debian.org} \modulesynopsis{A framework for verifying examples in docstrings.} The \module{doctest} module searches a module's docstrings for text that looks like an interactive Python session, then executes all such sessions to verify they still work exactly as shown. Here's a complete but small example: \begin{verbatim} """ This is module example. Example supplies one function, factorial. For example, >>> factorial(5) 120 """ def factorial(n): """Return the factorial of n, an exact integer >= 0. If the result is small enough to fit in an int, return an int. Else return a long. >>> [factorial(n) for n in range(6)] [1, 1, 2, 6, 24, 120] >>> [factorial(long(n)) for n in range(6)] [1, 1, 2, 6, 24, 120] >>> factorial(30) 265252859812191058636308480000000L >>> factorial(30L) 265252859812191058636308480000000L >>> factorial(-1) Traceback (most recent call last): ... ValueError: n must be >= 0 Factorials of floats are OK, but the float must be an exact integer: >>> factorial(30.1) Traceback (most recent call last): ... ValueError: n must be exact integer >>> factorial(30.0) 265252859812191058636308480000000L It must also not be ridiculously large: >>> factorial(1e100) Traceback (most recent call last): ... OverflowError: n too large """ \end{verbatim} % allow LaTeX to break here. \begin{verbatim} import math if not n >= 0: raise ValueError("n must be >= 0") if math.floor(n) != n: raise ValueError("n must be exact integer") if n+1 == n: # catch a value like 1e300 raise OverflowError("n too large") result = 1 factor = 2 while factor <= n: try: result *= factor except OverflowError: result *= long(factor) factor += 1 return result def _test(): import doctest, example return doctest.testmod(example) if __name__ == "__main__": _test() \end{verbatim} If you run \file{example.py} directly from the command line, doctest works its magic: \begin{verbatim} $ python example.py $ \end{verbatim} There's no output! That's normal, and it means all the examples worked. Pass \programopt{-v} to the script, and doctest prints a detailed log of what it's trying, and prints a summary at the end: \begin{verbatim} $ python example.py -v Running example.__doc__ Trying: factorial(5) Expecting: 120 ok 0 of 1 examples failed in example.__doc__ Running example.factorial.__doc__ Trying: [factorial(n) for n in range(6)] Expecting: [1, 1, 2, 6, 24, 120] ok Trying: [factorial(long(n)) for n in range(6)] Expecting: [1, 1, 2, 6, 24, 120] ok Trying: factorial(30) Expecting: 265252859812191058636308480000000L ok \end{verbatim} And so on, eventually ending with: \begin{verbatim} Trying: factorial(1e100) Expecting: Traceback (most recent call last): ... OverflowError: n too large ok 0 of 8 examples failed in example.factorial.__doc__ 2 items passed all tests: 1 tests in example 8 tests in example.factorial 9 tests in 2 items. 9 passed and 0 failed. Test passed. $ \end{verbatim} That's all you need to know to start making productive use of doctest! Jump in. The docstrings in doctest.py contain detailed information about all aspects of doctest, and we'll just cover the more important points here. \subsection{Normal Usage} In normal use, end each module \module{M} with: \begin{verbatim} def _test(): import doctest, M # replace M with your module's name return doctest.testmod(M) # ditto if __name__ == "__main__": _test() \end{verbatim} If you want to test the module as the main module, you don't need to pass M to \function{testmod()}; in this case, it will test the current module. Then running the module as a script causes the examples in the docstrings to get executed and verified: \begin{verbatim} python M.py \end{verbatim} This won't display anything unless an example fails, in which case the failing example(s) and the cause(s) of the failure(s) are printed to stdout, and the final line of output is \code{'Test failed.'}. Run it with the \programopt{-v} switch instead: \begin{verbatim} python M.py -v \end{verbatim} and a detailed report of all examples tried is printed to \code{stdout}, along with assorted summaries at the end. You can force verbose mode by passing \code{verbose=1} to \function{testmod()}, or prohibit it by passing \code{verbose=0}. In either of those cases, \code{sys.argv} is not examined by \function{testmod()}. In any case, \function{testmod()} returns a 2-tuple of ints \code{(\var{f}, \var{t})}, where \var{f} is the number of docstring examples that failed and \var{t} is the total number of docstring examples attempted. \subsection{Which Docstrings Are Examined?} See \file{docstring.py} for all the details. They're unsurprising: the module docstring, and all function, class and method docstrings are searched, with the exception of docstrings attached to objects with private names. Objects imported into the module are not searched. In addition, if \code{M.__test__} exists and "is true", it must be a dict, and each entry maps a (string) name to a function object, class object, or string. Function and class object docstrings found from \code{M.__test__} are searched even if the name is private, and strings are searched directly as if they were docstrings. In output, a key \code{K} in \code{M.__test__} appears with name \begin{verbatim} .__test__.K \end{verbatim} Any classes found are recursively searched similarly, to test docstrings in their contained methods and nested classes. While private names reached from \module{M}'s globals are skipped, all names reached from \code{M.__test__} are searched. \subsection{What's the Execution Context?} By default, each time \function{testmod()} finds a docstring to test, it uses a \emph{copy} of \module{M}'s globals, so that running tests on a module doesn't change the module's real globals, and so that one test in \module{M} can't leave behind crumbs that accidentally allow another test to work. This means examples can freely use any names defined at top-level in \module{M}, and names defined earlier in the docstring being run. You can force use of your own dict as the execution context by passing \code{globs=your_dict} to \function{testmod()} instead. Presumably this would be a copy of \code{M.__dict__} merged with the globals from other imported modules. \subsection{What About Exceptions?} No problem, as long as the only output generated by the example is the traceback itself. For example: \begin{verbatim} >>> [1, 2, 3].remove(42) Traceback (most recent call last): File "", line 1, in ? ValueError: list.remove(x): x not in list >>> \end{verbatim} Note that only the exception type and value are compared (specifically, only the last line in the traceback). The various ``File'' lines in between can be left out (unless they add significantly to the documentation value of the example). \subsection{Advanced Usage} Several module level functions are available for controlling how doctests are run. \begin{funcdesc}{debug}{module, name} Debug a single docstring containing doctests. Provide the \var{module} (or dotted name of the module) containing the docstring to be debugged and the \var{name} (within the module) of the object with the docstring to be debugged. The doctest examples are extracted (see function \function{testsource()}), and written to a temporary file. The Python debugger, \refmodule{pdb}, is then invoked on that file. \versionadded{2.3} \end{funcdesc} \begin{funcdesc}{testmod}{} This function provides the most basic interface to the doctests. It creates a local instance of class \class{Tester}, runs appropriate methods of that class, and merges the results into the global \class{Tester} instance, \code{master}. To get finer control than \function{testmod()} offers, create an instance of \class{Tester} with custom policies and run the methods of \code{master} directly. See \code{Tester.__doc__} for details. \end{funcdesc} \begin{funcdesc}{testsource}{module, name} Extract the doctest examples from a docstring. Provide the \var{module} (or dotted name of the module) containing the tests to be extracted and the \var{name} (within the module) of the object with the docstring containing the tests to be extracted. The doctest examples are returned as a string containing Python code. The expected output blocks in the examples are converted to Python comments. \versionadded{2.3} \end{funcdesc} \begin{funcdesc}{DocTestSuite}{\optional{module}} Convert doctest tests for a module to a \refmodule{unittest} \class{TestSuite}. The returned \class{TestSuite} is to be run by the unittest framework and runs each doctest in the module. If any of the doctests fail, then the synthesized unit test fails, and a \exception{DocTestTestFailure} exception is raised showing the name of the file containing the test and a (sometimes approximate) line number. The optional \var{module} argument provides the module to be tested. It can be a module object or a (possibly dotted) module name. If not specified, the module calling \function{DocTestSuite()} is used. Example using one of the many ways that the \refmodule{unittest} module can use a \class{TestSuite}: \begin{verbatim} import unittest import doctest import my_module_with_doctests suite = doctest.DocTestSuite(my_module_with_doctests) runner = unittest.TextTestRunner() runner.run(suite) \end{verbatim} \versionadded{2.3} \end{funcdesc} \subsection{How are Docstring Examples Recognized?} In most cases a copy-and-paste of an interactive console session works fine --- just make sure the leading whitespace is rigidly consistent (you can mix tabs and spaces if you're too lazy to do it right, but doctest is not in the business of guessing what you think a tab means). \begin{verbatim} >>> # comments are ignored >>> x = 12 >>> x 12 >>> if x == 13: ... print "yes" ... else: ... print "no" ... print "NO" ... print "NO!!!" ... no NO NO!!! >>> \end{verbatim} Any expected output must immediately follow the final \code{'>\code{>}>~'} or \code{'...~'} line containing the code, and the expected output (if any) extends to the next \code{'>\code{>}>~'} or all-whitespace line. The fine print: \begin{itemize} \item Expected output cannot contain an all-whitespace line, since such a line is taken to signal the end of expected output. \item Output to stdout is captured, but not output to stderr (exception tracebacks are captured via a different means). \item If you continue a line via backslashing in an interactive session, or for any other reason use a backslash, you need to double the backslash in the docstring version. This is simply because you're in a string, and so the backslash must be escaped for it to survive intact. Like: \begin{verbatim} >>> if "yes" == \\ ... "y" + \\ ... "es": ... print 'yes' yes \end{verbatim} \item The starting column doesn't matter: \begin{verbatim} >>> assert "Easy!" >>> import math >>> math.floor(1.9) 1.0 \end{verbatim} and as many leading whitespace characters are stripped from the expected output as appeared in the initial \code{'>\code{>}>~'} line that triggered it. \end{itemize} \subsection{Warnings} \begin{enumerate} \item \module{doctest} is serious about requiring exact matches in expected output. If even a single character doesn't match, the test fails. This will probably surprise you a few times, as you learn exactly what Python does and doesn't guarantee about output. For example, when printing a dict, Python doesn't guarantee that the key-value pairs will be printed in any particular order, so a test like % Hey! What happened to Monty Python examples? % Tim: ask Guido -- it's his example! \begin{verbatim} >>> foo() {"Hermione": "hippogryph", "Harry": "broomstick"} >>> \end{verbatim} is vulnerable! One workaround is to do \begin{verbatim} >>> foo() == {"Hermione": "hippogryph", "Harry": "broomstick"} 1 >>> \end{verbatim} instead. Another is to do \begin{verbatim} >>> d = foo().items() >>> d.sort() >>> d [('Harry', 'broomstick'), ('Hermione', 'hippogryph')] \end{verbatim} There are others, but you get the idea. Another bad idea is to print things that embed an object address, like \begin{verbatim} >>> id(1.0) # certain to fail some of the time 7948648 >>> \end{verbatim} Floating-point numbers are also subject to small output variations across platforms, because Python defers to the platform C library for float formatting, and C libraries vary widely in quality here. \begin{verbatim} >>> 1./7 # risky 0.14285714285714285 >>> print 1./7 # safer 0.142857142857 >>> print round(1./7, 6) # much safer 0.142857 \end{verbatim} Numbers of the form \code{I/2.**J} are safe across all platforms, and I often contrive doctest examples to produce numbers of that form: \begin{verbatim} >>> 3./4 # utterly safe 0.75 \end{verbatim} Simple fractions are also easier for people to understand, and that makes for better documentation. \item Be careful if you have code that must only execute once. If you have module-level code that must only execute once, a more foolproof definition of \function{_test()} is \begin{verbatim} def _test(): import doctest, sys doctest.testmod() \end{verbatim} \item WYSIWYG isn't always the case, starting in Python 2.3. The string form of boolean results changed from \code{'0'} and \code{'1'} to \code{'False'} and \code{'True'} in Python 2.3. This makes it clumsy to write a doctest showing boolean results that passes under multiple versions of Python. In Python 2.3, by default, and as a special case, if an expected output block consists solely of \code{'0'} and the actual output block consists solely of \code{'False'}, that's accepted as an exact match, and similarly for \code{'1'} versus \code{'True'}. This behavior can be turned off by passing the new (in 2.3) module constant \constant{DONT_ACCEPT_TRUE_FOR_1} as the value of \function{testmod()}'s new (in 2.3) optional \var{optionflags} argument. Some years after the integer spellings of booleans are history, this hack will probably be removed again. \end{enumerate} \subsection{Soapbox} The first word in doctest is "doc", and that's why the author wrote doctest: to keep documentation up to date. It so happens that doctest makes a pleasant unit testing environment, but that's not its primary purpose. Choose docstring examples with care. There's an art to this that needs to be learned --- it may not be natural at first. Examples should add genuine value to the documentation. A good example can often be worth many words. If possible, show just a few normal cases, show endcases, show interesting subtle cases, and show an example of each kind of exception that can be raised. You're probably testing for endcases and subtle cases anyway in an interactive shell: doctest wants to make it as easy as possible to capture those sessions, and will verify they continue to work as designed forever after. If done with care, the examples will be invaluable for your users, and will pay back the time it takes to collect them many times over as the years go by and "things change". I'm still amazed at how often one of my doctest examples stops working after a "harmless" change. For exhaustive testing, or testing boring cases that add no value to the docs, define a \code{__test__} dict instead. That's what it's for.