* Split `CALL_PY_EXACT_ARGS` into uops
This is only the first step for doing `CALL` in Tier 2.
The next step involves tracing into the called code object and back.
After that we'll have to do the remaining `CALL` specialization.
Finally we'll have to deal with `KW_NAMES`.
Note: this moves setting `frame->return_offset` directly in front of
`DISPATCH_INLINED()`, to make it easier to move it into `_PUSH_FRAME`.
- The `dump_stack()` method could call a `__repr__` method implemented in Python,
causing (infinite) recursion.
I rewrote it to only print out the values for some fundamental types (`int`, `str`, etc.);
for everything else it just prints `<type_name @ 0xdeadbeef>`.
- The lltrace-like feature for uops wrote to `stderr`, while the one in `ceval.c` writes to `stdout`;
I changed the uops to write to stdout as well.