Daniel B. answered 04/12/21
PhD in Computer Science with 42 years in Computer Research
I am assuming a common CPU architecture with a cache.
The situation is different on a GPU, for example.
Take the typical example of matrix multiplication.
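For a product A * B, it would be best to have the left matrix A in row major and the right matrix B in column major order.
The reason is that each output element is the dot product of a row of A and a column of B, and when fetching one element, you want the same cache line to also contain the elements you will read next.
With those layouts, both the row of A and the column of B are contiguous in memory.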
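To make the access pattern concrete, here is a minimal sketch. It assumes the left matrix is stored as a flat row-major list and the right matrix as a flat column-major list. The function name and the dimensions n, m, p are my own choices for illustration. Python itself will not show the cache benefit; the point is only that both indices in the inner loop advance by one element per step, which is what a compiled language would turn into consecutive memory reads.

```python
def matmul_row_times_col(a_row_major, b_col_major, n, m, p):
    """Multiply an n x m matrix A (flat, row-major) by an m x p matrix B
    (flat, column-major). Returns C as a flat row-major list of n * p values."""
    c = [0.0] * (n * p)
    for i in range(n):
        a_base = i * m              # index where row i of A starts
        for j in range(p):
            b_base = j * m          # index where column j of B starts
            total = 0.0
            for k in range(m):
                # Both operands are read at consecutive indices (stride 1).
                total += a_row_major[a_base + k] * b_col_major[b_base + k]
            c[i * p + j] = total
    return c


# A = [[1, 2], [3, 4]] stored row by row
a = [1.0, 2.0, 3.0, 4.0]
# B = [[5, 6], [7, 8]] stored column by column
b = [5.0, 7.0, 6.0, 8.0]
c = matmul_row_times_col(a, b, 2, 2, 2)
```

If B were row major instead, the inner loop would read it at index k * p + j, jumping p elements per step, so on large matrices each read could land in a different cache line.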
I hope this answers question 2.
I do not know how to answer question 1, given that neither order is faster in general; it depends on how the data is accessed.