Tuesday, 3 October 2017

c - why does GCC __builtin_prefetch not improve performance?

I'm writing a program to analyze a graph of social network. It means the program needs a lot of random memory accesses. It seems to me prefetch should help. Here is a small piece of the code of reading values from neighbors of a vertex.



for (size_t i = 0; i < v.get_num_edges(); i++) {

unsigned int id = v.neighbors[i];
res += neigh_vals[id];
}


I transform the code above to the one as below and prefetch the values of the neighbors of a vertex.



int *neigh_vals = new int[num_vertices];

for (size_t i = 0; i < v.get_num_edges(); i += 128) {

size_t this_end = std::min(v.get_num_edges(), i + 128);
for (size_t j = i; j < this_end; j++) {
unsigned int id = v.neighbors[j];
__builtin_prefetch(&neigh_vals[id], 0, 2);
}
for (size_t j = i; j < this_end; j++) {
unsigned int id = v.neighbors[j];
res += neigh_vals[id];
}
}



In this C++ code, I didn't override any operators.



Unfortunately, the code doesn't really improve the performance. I wonder why. Apparently, hardware prefetch doesn't work in this case because the hardware can't predict the memory location.



I wonder if it's caused by GCC optimization. When I compile the code, I enable -O3. I really hope prefetch can further improve performance even when -O3 is enabled. Does -O3 optimization fuse the two loops in this case? Can -O3 enable prefetch in this case by default?



I use gcc version 4.6.3 and the program runs on Intel Xeon E5-4620.




Thanks,
Da

No comments:

Post a Comment

casting - Why wasn&#39;t Tobey Maguire in The Amazing Spider-Man? - Movies &amp; TV

In the Spider-Man franchise, Tobey Maguire is an outstanding performer as a Spider-Man and also reprised his role in the sequels Spider-Man...