we need higher priority on SPI and faster device loops
this uses DMA bounce buffers for bus transfers, and allocations fall back to CCM RAM when no memory type is specified
this is based on initial work by Sid, reset here for easier merging
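
a rough host-side sketch of the two mechanisms named above, for reviewers unfamiliar with them. it is not the code in this change: every name here (malloc_type, bus_transfer, MEM_UNSPECIFIED, MEM_DMA_SAFE, the arena sizes) is hypothetical, the two static arrays merely stand in for DMA-capable SRAM and CCM RAM, and the "DMA engine" is a loopback that inverts bytes. it also assumes "falls back" means the normal heap is tried first and CCM is used only when that fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memory classes; names are illustrative only. */
typedef enum { MEM_UNSPECIFIED, MEM_DMA_SAFE } mem_type_t;

/* Two static arenas stand in for DMA-capable SRAM and CCM RAM. */
#define SRAM_SIZE 256u
#define CCM_SIZE  256u
static uint8_t sram_arena[SRAM_SIZE];
static uint8_t ccm_arena[CCM_SIZE];
static size_t sram_used, ccm_used;

/* Trivial bump allocator over one arena; returns NULL when it is full. */
static void *bump_alloc(uint8_t *arena, size_t cap, size_t *used, size_t n)
{
    n = (n + 7u) & ~(size_t)7u;              /* round up to 8 bytes */
    if (n == 0 || n > cap - *used) return NULL;
    void *p = arena + *used;
    *used += n;
    return p;
}

/* Assumed policy: DMA-safe requests never get CCM; unspecified requests
 * try the general arena first and fall back to CCM when it is exhausted. */
void *malloc_type(size_t n, mem_type_t type)
{
    void *p = bump_alloc(sram_arena, SRAM_SIZE, &sram_used, n);
    if (p == NULL && type == MEM_UNSPECIFIED)
        p = bump_alloc(ccm_arena, CCM_SIZE, &ccm_used, n);
    return p;
}

/* In this sketch, "DMA safe" simply means "starts inside the SRAM arena". */
static bool is_dma_safe(const void *p)
{
    uintptr_t a = (uintptr_t)p, base = (uintptr_t)sram_arena;
    return a >= base && a < base + SRAM_SIZE;
}

/* Stand-in for the DMA engine: a loopback that inverts every byte. */
static void fake_dma_transfer(uint8_t *buf, size_t len)
{
    assert(is_dma_safe(buf));
    for (size_t i = 0; i < len; i++) buf[i] = (uint8_t)~buf[i];
}

#define BOUNCE_SIZE 64u
static uint8_t *bounce;        /* allocated lazily from DMA-safe memory */
static unsigned bounce_uses;   /* counts transfers that needed the copy */

/* Transfer in place; if the caller's buffer is not DMA safe, copy it
 * through the bounce buffer before and after the transfer. */
bool bus_transfer(uint8_t *buf, size_t len)
{
    if (is_dma_safe(buf)) { fake_dma_transfer(buf, len); return true; }
    if (len > BOUNCE_SIZE) return false;
    if (bounce == NULL) bounce = malloc_type(BOUNCE_SIZE, MEM_DMA_SAFE);
    if (bounce == NULL) return false;
    memcpy(bounce, buf, len);
    fake_dma_transfer(bounce, len);
    memcpy(buf, bounce, len);
    bounce_uses++;
    return true;
}
```

the point of the pairing: once unspecified allocations can land in CCM, callers may hand the bus a buffer the DMA engine cannot reach, so the transfer path has to detect that and copy through a DMA-safe buffer instead of failing.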