Always release the cancel join.
Fix also another corner case: _PyFaulthandler_Fini() called after setting
running variable to zero, but before releasing the join lock.
If the thread releases the join lock before the cancel lock, the thread may
sometimes still be alive at cancel_dump_tracebacks_later() exit. So the cancel
lock may be destroyed while the thread is still alive, whereas the thread will
try to release the cancel lock, which just crash.
Another minor fix: the thread doesn't release the cancel lock if it didn't
acquire it.