I was going through loops and found a significant difference in accessing loops.
I can't understand what is the thing that causes such difference in both cases?
First Example:
Execution Time; 8 seconds
for (int kk = 0; kk < 1000; kk++)
{
sum = 0;
for (int i = 0; i < 1024; i++)
for (int j = 0; j < 1024; j++)
{
sum += matrix[i][j];
}
}
Second Example:
Execution Time: 23 seconds
for (int kk = 0; kk < 1000; kk++)
{
sum = 0;
for (int i = 0; i < 1024; i++)
for (int j = 0; j < 1024; j++)
{
sum += matrix[j][i];
}
}
What causes so much execution time difference just exchanging
matrix[i][j]
to
matrix[j][i]
?
Answer
It's an issue of memory cache.
matrix[i][j]
has better cache hits than matrix[j][i]
, since matrix[i][j]
has more continuous memory accessing chances.
For example, when we access matrix[i][0]
, the cache may load a continuous segment of memory containing matrix[i][0]
, thus, accessing matrix[i][1]
, matrix[i][2]
, ..., will benefit from caching speed, since matrix[i][1]
, matrix[i][2]
, ... are near to matrix[i][0]
.
However, when we access matrix[j][0]
, it is far from matrix[j - 1][0]
and may not been cached, and can not benefit from caching speed. Especially, a matrix is normally stored as a continuous big segment of memory, and the cacher may predicate the behavior of memory accessing and always cache the memory.
That's why matrix[i][j]
is faster. This is typical in CPU cache based performance optimizing.
No comments:
Post a Comment