If you start naively without any library that avoids the problem then memory access is the problem. Have a look at how much effort is needed to avoid the problem, for example with blocking algorithms.
Matrix multiplication is at the heart of many machine learning breakthroughs, and it just got faster—twice. Last week, DeepMind announced it discovered a more efficient way to perform matrix ...
Mathematicians love a good puzzle. Even something as abstract as multiplying matrices (two-dimensional tables of numbers) can feel like a game when you try to find the most efficient way to do it.