for(int i = 0; i<length; i++) performance

Hi,

I have been looking at some codebases that use SIMD intrinsics and other techniques to get better performance. It got me thinking recently... There's a codebase I work with often that has lots of for loops like the one in the subject line, just to traverse a vector/array and do things. These iterations are independent: the order in which i=n and i=n+1 run doesn't matter. Is there a quick refactor I can apply in these cases that will give me better performance? Or are GCC/Clang/MSVC already doing these optimizations?

Thanks!
Range-based for loops could be faster for some code. It depends on the code!
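For illustration, here is the same element-wise loop written both ways. This is only a sketch, and scale_indexed and scale_range are made-up names. With optimization on, you should expect the two to compile to very similar code. The main practical gain of the range-based form is that there is no index to initialize, compare or get wrong.

```cpp
#include <cstddef>
#include <vector>

// Index-based version: the counter must be initialized and compared to size().
void scale_indexed(std::vector<float>& v, float factor) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] *= factor;
    }
}

// Range-based version: no index at all; the loop walks begin()..end().
void scale_range(std::vector<float>& v, float factor) {
    for (float& x : v) {
        x *= factor;
    }
}
```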
These compilers already do a lot: they unroll small loops, optimize the loop variable, and optimize away the n+1 (so it happens once, not every iteration). On top of that, the CPU will probably prefetch the data and predict that the loop will go around again, so you only pay for a misprediction once (when the loop ends).
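To make "unrolling" concrete, here is a sketch of a plain summation loop next to a hand-unrolled version that does roughly what a compiler may do on its own at -O2/-O3. Both function names are hypothetical. The two return the same result, and in most cases you should keep the simple version and let the optimizer handle it. The multiple accumulators are safe here because integer addition can be reordered freely. For float/double the compiler won't reorder additions like this unless you allow it (for example with -ffast-math), because the rounding would change.

```cpp
#include <cstddef>
#include <vector>

// The loop as you would normally write it.
long long sum_simple(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Roughly what 4x unrolling looks like by hand: fewer loop-condition checks
// per element, independent accumulators, and a tail loop for leftovers.
long long sum_unrolled4(const std::vector<int>& v) {
    const std::size_t n = v.size();   // bound read once, as the compiler would hoist it
    long long t0 = 0, t1 = 0, t2 = 0, t3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        t0 += v[i];
        t1 += v[i + 1];
        t2 += v[i + 2];
        t3 += v[i + 3];
    }
    for (; i < n; ++i) {               // remaining 0-3 elements
        t0 += v[i];
    }
    return t0 + t1 + t2 + t3;
}
```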
Compilers have had auto-vectorization options for quite a while now. Check your compiler's manual for which flags enable them.
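As a rough example, this is the kind of loop auto-vectorizers handle well: independent iterations, unit stride, no function calls. The saxpy function and the file name saxpy.cpp are just for illustration. The flags in the comment (GCC's -fopt-info-vec-optimized, Clang's -Rpass=loop-vectorize, MSVC's /Qvec-report:2) make the compiler report which loops it vectorized, which is a quick way to check whether your existing loops are already being vectorized.

```cpp
#include <cstddef>

// __restrict (accepted by GCC, Clang and MSVC) promises that out does not
// overlap x or y, so the compiler doesn't need runtime overlap checks.
//
// Ask the compiler what it vectorized, e.g.:
//   g++     -O3 -fopt-info-vec-optimized -c saxpy.cpp
//   clang++ -O2 -Rpass=loop-vectorize    -c saxpy.cpp
//   cl      /O2 /Qvec-report:2           /c saxpy.cpp
void saxpy(float* __restrict out, const float* __restrict x,
           const float* __restrict y, float a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}
```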

C++17 did add execution policies to Algorithms library: https://en.cppreference.com/w/cpp/algorithm
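A minimal sketch of what an execution policy looks like (square_all is a made-up name). The work is the same as a plain for loop, but std::execution::par_unseq tells the library it may run iterations in parallel and vectorized, so the lambda must not race or take locks. One caveat if you're on GCC: libstdc++ implements the parallel policies on top of TBB, so you will likely need TBB installed and -ltbb on the link line to get real parallelism. Without it, the algorithms may simply run serially, depending on the version.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Each call only touches its own element, so there is no data race.
void square_all(std::vector<double>& v) {
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [](double& x) { x *= x; });
}
```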
There's also tbb::parallel_for from Intel's Threading Building Blocks (TBB) library.
For MSVS, under Properties/C++/Code Generation there's Enable Enhanced Instruction Set, which lets you enable SIMD and vector extensions if the processor supports them. There's also Enable Parallel Code Generation.

Also for MSVS, make sure that under Properties/C++/Optimization you have Optimization set to Favour Speed, Favour Size Or Speed set to Favour Fast Code, and Intrinsic Functions enabled.
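If you build from the command line instead of the IDE, those settings correspond to MSVC compiler switches, listed in the comment below. The sketch also shows the loop(hint_parallel) pragma that works together with /Qpar. Here add_arrays is a hypothetical function, and the #ifdef keeps the MSVC-only pragma away from other compilers.

```cpp
// MSVC command-line equivalents of the IDE settings:
//   Enable Enhanced Instruction Set     -> /arch:AVX2 (or /arch:AVX, /arch:AVX512)
//   Enable Parallel Code Generation     -> /Qpar
//   Maximum Optimization (Favor Speed)  -> /O2
//   Favor fast code                     -> /Ot
//   Enable Intrinsic Functions          -> /Oi
//
// hint_parallel(0) asks /Qpar to parallelize this loop using the maximum
// number of threads at run time. Other compilers never see the pragma.
void add_arrays(float* out, const float* a, const float* b, int n) {
#ifdef _MSC_VER
#pragma loop(hint_parallel(0))
#endif
    for (int i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```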

There are also specific intrinsic functions that can be used to improve performance. See https://docs.microsoft.com/en-us/cpp/intrinsics/x64-amd64-intrinsics-list?view=msvc-160
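As an example of using intrinsics directly, here's a sketch that adds two float arrays four elements at a time with SSE (_mm_loadu_ps, _mm_add_ps and _mm_storeu_ps from <immintrin.h>, available in GCC, Clang and MSVC on x86). add_floats and the SKETCH_HAVE_SSE macro are made up for this example. On other targets the code falls back to the scalar loop. For a loop this simple, the auto-vectorizer will usually generate equivalent code, so hand-written intrinsics mostly pay off when the compiler can't vectorize something on its own.

```cpp
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <immintrin.h>
#define SKETCH_HAVE_SSE 1   // hypothetical macro: SSE is available on this target
#endif

// out[i] = a[i] + b[i], four floats per step where SSE is available.
void add_floats(float* out, const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; ++i) {                    // scalar tail (or whole array without SSE)
        out[i] = a[i] + b[i];
    }
}
```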


Thanks, people! Lots of good information here! It will take me a bit to play with all the options :)
Parallel algorithms in the standard library or libraries like TBB incur significant overhead, so blindly replacing all your loops with parallel versions of std::for_each is not guaranteed to improve performance. Make sure to measure.
Articles on C++ Stories have shown that simply replacing standard algorithms with their parallel versions (from C++17) can actually degrade performance, due to the overhead of creating threads, synchronization, etc. They are not a silver bullet for performance issues.
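In that spirit, a minimal timing sketch: it runs the same per-element work with std::execution::seq and std::execution::par and reports both times. time_ms, apply_sqrt, Timings and compare_policies are hypothetical helpers. A single run like this is noisy, and the outcome depends on the element count, the work per element and your machine, so treat it as a starting point; a proper benchmarking library is better for anything serious. On GCC, libstdc++'s parallel policies rely on TBB, so link with -ltbb if it's installed.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <execution>
#include <vector>

// Elapsed wall-clock milliseconds for one call of f.
template <class F>
double time_ms(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Same cheap per-element work, run with whichever policy is passed in.
template <class Policy>
void apply_sqrt(const Policy& policy, std::vector<double>& v) {
    std::for_each(policy, v.begin(), v.end(),
                  [](double& x) { x = std::sqrt(x); });
}

struct Timings { double seq_ms; double par_ms; };

// Times the sequential and parallel variants on n elements (one run each).
Timings compare_policies(std::size_t n) {
    std::vector<double> a(n, 4.0), b(n, 4.0);
    Timings t{};
    t.seq_ms = time_ms([&] { apply_sqrt(std::execution::seq, a); });
    t.par_ms = time_ms([&] { apply_sqrt(std::execution::par, b); });
    return t;
}
```

Try it with a small n (say a few hundred elements) and a large one; which variant wins can flip between the two.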