cpython/Objects/lnotab_notes.txt

All about co_lnotab, the line number table.

Code objects store a field named co_lnotab.  This is an array of unsigned bytes
disguised as a Python string.  It is used to map bytecode offsets to source code
line #s for tracebacks and to identify line number boundaries for line tracing.

The array is conceptually a compressed list of
    (bytecode offset increment, line number increment)
pairs.  The details are important and delicate, best illustrated by example:

    byte code offset    source code line number
        0		    1
        6		    2
       50		    7
      350                 307
      361                 308

Instead of storing these numbers literally, we compress the list by storing only
the increments from one row to the next.  Conceptually, the stored list might
look like:

    0, 1,  6, 1,  44, 5,  300, 300,  11, 1

The above doesn't really work, but it's a start. Note that an unsigned byte
can't hold negative values, or values larger than 255, and the above example
contains two such values. So we make two tweaks:

 (a) there's a deep assumption that byte code offsets and their corresponding
 line #s both increase monotonically, and
 (b) if at least one column jumps by more than 255 from one row to the next,
 more than one pair is written to the table. In case #b, there's no way to know
 from looking at the table later how many were written.  That's the delicate
 part.  A user of co_lnotab desiring to find the source line number
 corresponding to a bytecode address A should do something like this

    lineno = addr = 0
    for addr_incr, line_incr in co_lnotab:
        addr += addr_incr
        if addr > A:
            return lineno
        lineno += line_incr

(In C, this is implemented by PyCode_Addr2Line().)  In order for this to work,
when the addr field increments by more than 255, the line # increment in each
pair generated must be 0 until the remaining addr increment is < 256.  So, in
the example above, assemble_lnotab in compile.c should not (as was actually done
until 2.2) expand 300, 300 to
    255, 255, 45, 45,
but to
    255, 0, 45, 255, 0, 45.

The above is sufficient to reconstruct line numbers for tracebacks, but not for
line tracing.  Tracing is handled by PyCode_CheckLineNumber() in codeobject.c
and maybe_call_line_trace() in ceval.c.

*** Tracing ***

To a first approximation, we want to call the tracing function when the line
number of the current instruction changes.  Re-computing the current line for
every instruction is a little slow, though, so each time we compute the line
number we save the bytecode indices where it's valid:

     *instr_lb <= frame->f_lasti < *instr_ub

is true so long as execution does not change lines.  That is, *instr_lb holds
the first bytecode index of the current line, and *instr_ub holds the first
bytecode index of the next line.  As long as the above expression is true,
maybe_call_line_trace() does not need to call PyCode_CheckLineNumber().  Note
that the same line may appear multiple times in the lnotab, either because the
bytecode jumped more than 255 indices between line number changes or because
the compiler inserted the same line twice.  Even in that case, *instr_ub holds
the first index of the next line.

However, we don't *always* want to call the line trace function when the above
test fails.

Consider this code:

1: def f(a):
2:    while a:
3:       print 1,
4:       break
5:    else:
6:       print 2,

which compiles to this:

  2           0 SETUP_LOOP              19 (to 22)
        >>    3 LOAD_FAST                0 (a)
              6 POP_JUMP_IF_FALSE       17

  3           9 LOAD_CONST               1 (1)
             12 PRINT_ITEM          

  4          13 BREAK_LOOP          
             14 JUMP_ABSOLUTE            3
        >>   17 POP_BLOCK           

  6          18 LOAD_CONST               2 (2)
             21 PRINT_ITEM          
        >>   22 LOAD_CONST               0 (None)
             25 RETURN_VALUE        

If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 17
and the co_lnotab will claim that execution has moved to line 4, which is wrong.
In this case, we could instead associate the POP_BLOCK with line 5, but that
would break jumps around loops without else clauses.

We fix this by only calling the line trace function for a forward jump if the
co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current
instruction offset matches the offset given for the start of a line by the
co_lnotab.  For backward jumps, however, we always call the line trace function,
which lets a debugger stop on every evaluation of a loop guard (which usually
won't be the first opcode in a line).

Why do we set f_lineno when tracing, and only just before calling the trace
function?  Well, consider the code above when 'a' is true.  If stepping through
this with 'n' in pdb, you would stop at line 1 with a "call" type event, then
line events on lines 2, 3, and 4, then a "return" type event -- but because the
code for the return actually falls in the range of the "line 6" opcodes, you
would be shown line 6 during this event.  This is a change from the behaviour in
2.2 and before, and I've found it confusing in practice.  By setting and using
f_lineno when tracing, one can report a line number different from that
suggested by f_lasti on this one occasion where it's desirable.
Merged revisions 72487-72488,72879 via svnmerge from svn+ssh://pythondev@svn.python.org/python/trunk ........ r72487 \| jeffrey.yasskin \| 2009-05-08 17:51:06 -0400 (Fri, 08 May 2009) \| 7 lines PyCode_NewEmpty: Most uses of PyCode_New found by http://www.google.com/codesearch?q=PyCode_New are trying to build an empty code object, usually to put it in a dummy frame object. This patch adds a PyCode_NewEmpty wrapper which lets the user specify just the filename, function name, and first line number, instead of also requiring lots of code internals. ........ r72488 \| jeffrey.yasskin \| 2009-05-08 18:23:21 -0400 (Fri, 08 May 2009) \| 13 lines Issue 5954, PyFrame_GetLineNumber: Most uses of PyCode_Addr2Line (http://www.google.com/codesearch?q=PyCode_Addr2Line) are just trying to get the line number of a specified frame, but there's no way to do that directly. Forcing people to go through the code object makes them know more about the guts of the interpreter than they should need. The remaining uses of PyCode_Addr2Line seem to be getting the line from a traceback (for example, http://www.google.com/codesearch/p?hl=en#u_9_nDrchrw/pygame-1.7.1release/src/base.c&q=PyCode_Addr2Line), which is replaced by the tb_lineno field. So we may be able to deprecate PyCode_Addr2Line entirely for external use. ........ r72879 \| jeffrey.yasskin \| 2009-05-23 19:23:01 -0400 (Sat, 23 May 2009) \| 14 lines Issue #6042: lnotab-based tracing is very complicated and isn't documented very well. There were at least 3 comment blocks purporting to document co_lnotab, and none did a very good job. This patch unifies them into Objects/lnotab_notes.txt which tries to completely capture the current state of affairs. I also discovered that we've attached 2 layers of patches to the basic tracing scheme. The first layer avoids jumping to instructions that don't start a line, to avoid problems in if statements and while loops. The second layer discovered that jumps backward do need to trace at instructions that don't start a line, so it added extra lnotab entries for 'while' and 'for' loops, and added a special case for backward jumps within the same line. I replaced these patches by just treating forward and backward jumps differently. ........ 2009-07-21 01:30:03 -03:00			`All about co_lnotab, the line number table.`

			`Code objects store a field named co_lnotab. This is an array of unsigned bytes`
			`disguised as a Python string. It is used to map bytecode offsets to source code`
			`line #s for tracebacks and to identify line number boundaries for line tracing.`

			`The array is conceptually a compressed list of`
			`(bytecode offset increment, line number increment)`
			`pairs. The details are important and delicate, best illustrated by example:`

			`byte code offset source code line number`
			`0 1`
			`6 2`
			`50 7`
			`350 307`
			`361 308`

			`Instead of storing these numbers literally, we compress the list by storing only`
			`the increments from one row to the next. Conceptually, the stored list might`
			`look like:`

			`0, 1, 6, 1, 44, 5, 300, 300, 11, 1`

			`The above doesn't really work, but it's a start. Note that an unsigned byte`
			`can't hold negative values, or values larger than 255, and the above example`
			`contains two such values. So we make two tweaks:`

			`(a) there's a deep assumption that byte code offsets and their corresponding`
			`line #s both increase monotonically, and`
			`(b) if at least one column jumps by more than 255 from one row to the next,`
			`more than one pair is written to the table. In case #b, there's no way to know`
			`from looking at the table later how many were written. That's the delicate`
			`part. A user of co_lnotab desiring to find the source line number`
			`corresponding to a bytecode address A should do something like this`

			`lineno = addr = 0`
			`for addr_incr, line_incr in co_lnotab:`
			`addr += addr_incr`
			`if addr > A:`
			`return lineno`
			`lineno += line_incr`

			`(In C, this is implemented by PyCode_Addr2Line().) In order for this to work,`
			`when the addr field increments by more than 255, the line # increment in each`
			`pair generated must be 0 until the remaining addr increment is < 256. So, in`
			`the example above, assemble_lnotab in compile.c should not (as was actually done`
			`until 2.2) expand 300, 300 to`
			`255, 255, 45, 45,`
			`but to`
			`255, 0, 45, 255, 0, 45.`

			`The above is sufficient to reconstruct line numbers for tracebacks, but not for`
			`line tracing. Tracing is handled by PyCode_CheckLineNumber() in codeobject.c`
			`and maybe_call_line_trace() in ceval.c.`

			`* Tracing *`

			`To a first approximation, we want to call the tracing function when the line`
			`number of the current instruction changes. Re-computing the current line for`
			`every instruction is a little slow, though, so each time we compute the line`
			`number we save the bytecode indices where it's valid:`

			`instr_lb <= frame->f_lasti < instr_ub`

			`is true so long as execution does not change lines. That is, *instr_lb holds`
			`the first bytecode index of the current line, and *instr_ub holds the first`
			`bytecode index of the next line. As long as the above expression is true,`
			`maybe_call_line_trace() does not need to call PyCode_CheckLineNumber(). Note`
			`that the same line may appear multiple times in the lnotab, either because the`
			`bytecode jumped more than 255 indices between line number changes or because`
			`the compiler inserted the same line twice. Even in that case, *instr_ub holds`
			`the first index of the next line.`

			`However, we don't always want to call the line trace function when the above`
			`test fails.`

			`Consider this code:`

			`1: def f(a):`
			`2: while a:`
			`3: print 1,`
			`4: break`
			`5: else:`
			`6: print 2,`

			`which compiles to this:`

			`2 0 SETUP_LOOP 19 (to 22)`
			`>> 3 LOAD_FAST 0 (a)`
			`6 POP_JUMP_IF_FALSE 17`

			`3 9 LOAD_CONST 1 (1)`
			`12 PRINT_ITEM`

			`4 13 BREAK_LOOP`
			`14 JUMP_ABSOLUTE 3`
			`>> 17 POP_BLOCK`

			`6 18 LOAD_CONST 2 (2)`
			`21 PRINT_ITEM`
			`>> 22 LOAD_CONST 0 (None)`
			`25 RETURN_VALUE`

			`If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 17`
			`and the co_lnotab will claim that execution has moved to line 4, which is wrong.`
			`In this case, we could instead associate the POP_BLOCK with line 5, but that`
			`would break jumps around loops without else clauses.`

			`We fix this by only calling the line trace function for a forward jump if the`
			`co_lnotab indicates we have jumped to the start of a line, i.e. if the current`
			`instruction offset matches the offset given for the start of a line by the`
			`co_lnotab. For backward jumps, however, we always call the line trace function,`
			`which lets a debugger stop on every evaluation of a loop guard (which usually`
			`won't be the first opcode in a line).`

			`Why do we set f_lineno when tracing, and only just before calling the trace`
			`function? Well, consider the code above when 'a' is true. If stepping through`
			`this with 'n' in pdb, you would stop at line 1 with a "call" type event, then`
			`line events on lines 2, 3, and 4, then a "return" type event -- but because the`
			`code for the return actually falls in the range of the "line 6" opcodes, you`
			`would be shown line 6 during this event. This is a change from the behaviour in`
			`2.2 and before, and I've found it confusing in practice. By setting and using`
			`f_lineno when tracing, one can report a line number different from that`
			`suggested by f_lasti on this one occasion where it's desirable.`