this zeros-watchdog was caused by a SPI DMA error on STM32F405:
https://discuss.ardupilot.org/t/crash-with-4-2-0-beta-and-4-3-0-daily-bdshot/83297
we had incorrectly left these internal errors enabled when asserts
were not enabled. That led to a osalSysHalt()
without these we get an spi_fail internal error, caught by the
SPIDevice code
this fixes the flash re-init problem when flash storage fills on
H7. It was caused by rejecting writes where one or more of the 32 byte
chunks was not all 0xff but was equal to the current data. That
happens when writing to the sector header in AP_FlashStorage
it also moves the interrupt disable inside the loop to allow for
other interrupts to run between blocks
this halves the number of flash writes needed, and makes flash storage
twice as space efficient on H7
On H7 we need to write 32 bytes at a time to flash, which corresponds
to 30 bytes of data in AP_FlashStorage. By using a 16 byte storage
line we don't waste as much space
we have now shown that interrupts being enabled during flash
operations can cause the infamous "68ms" bug, or watchdog when using a
32 bit timer on boards using flash for storage
The issue is quite repeatable with a load of a very large waypoint
file (over 500 waypoints) using "wp ftpload" in MAVProxy. This puts a
huge load on flash storage.
Our current working theory is that while doing flash writes for
storage on dual-bank we block access to only one bank, so if another
thread uses a timeout function with a short timeout while the flash
write is happening and chVTDoTickI calls code which crosses the flash
bank boundary then it can cause chVTDoTickI to violate the assumption
that no more than CH_CFG_ST_DELTA ticks pass while it is calculating
the value to set in the system timer. In that case we get a delay of a
full timer wrap, which is 68ms on boards with 16 bit timer and 70
minutes on boards with 32 bit timer
this prevents bad calculated timeouts in DShot. The timeout would
sometimes come out as 0xFFFFFFFF, which led to an assert and could
block the thread
This fix is meant to be minimilistic to allow it to be merged easily
into 4.2. A better fix would fix all the uint32_t wrap handling in
DShot
make sure debug will compile
take into account active channels when configuring bdshot
add channel mask debug output
correct set bdshot telemetry position at startup
make sure all channels in a bdshot group are pulled high to prevent spurious pulses