Applied parallelism
Arc<Mutex<T>> is the default way in Rust, atomics are another optionN threads produce pieces of work and push them into the queueM threads pull pieces of work from the queuerayon crate)https://en.wikipedia.org/wiki/Kernel_(image_processing)
Live code examples
rayonchunks and chunks_muthttps://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html
Mtile, Ntile, Ktile?https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html