Not to be confused with Stack unwinding or Loop unswitching.
Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, an approach known as a space–time tradeoff. The transformation can be carried out manually by the programmer or automatically by an optimizing compiler. On modern processors, loop unrolling is often counterproductive, because the larger code can cause more instruction cache misses; cf. Duff's device.[1]
The goal of loop unrolling is to increase a program's speed by reducing or eliminating the instructions that control the loop, such as pointer arithmetic and "end of loop" tests on each iteration;[2] by reducing branch penalties; and by hiding latencies, including the delay in reading data from memory.[3] To eliminate this overhead, a loop can be rewritten as a repeated sequence of similar, independent statements.[4]
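As a minimal sketch of the rewrite described above, the following C code sums an array twice: once with an ordinary loop and once unrolled by hand. The function names `sum_rolled` and `sum_unrolled4` and the unroll factor of 4 are assumptions chosen for illustration, not taken from any cited source. The unrolled version performs one loop test and one index update per four elements, and a short remainder loop handles element counts that are not a multiple of four.

```c
#include <stddef.h>

/* Rolled version: one loop test and one index increment per element. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by a factor of 4 (an arbitrary factor picked for illustration). */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    /* Main loop: four elements per iteration, one test and one increment. */
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

Both functions return the same result for any input. Whether the unrolled form is actually faster depends on the compiler and the processor, and an optimizing compiler may already perform a similar transformation on the rolled version.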
Loop unrolling is also part of certain formal verification techniques, in particular bounded model checking.[5]
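In bounded model checking, a loop is typically unwound a fixed number of times so that the program becomes loop-free and can be analyzed up to that bound. The sketch below shows the idea by hand in C; the functions `halvings` and `halvings_unwound3`, the bound of 3, and the `complete` flag are all hypothetical names and choices made for this illustration, and they do not reproduce the output of any particular tool.

```c
#include <stdbool.h>

/* Original loop: counts how many times x can be halved before reaching 0. */
unsigned halvings(unsigned x)
{
    unsigned steps = 0;
    while (x > 0) {
        x /= 2;
        steps++;
    }
    return steps;
}

/* The same loop unwound to a bound of 3 iterations (hypothetical bound).
 * The body is loop-free: each iteration becomes a nested "if".
 * *complete is set to false when the bound was too small, which plays the
 * role of an unwinding check in this sketch. */
unsigned halvings_unwound3(unsigned x, bool *complete)
{
    unsigned steps = 0;
    if (x > 0) {
        x /= 2; steps++;
        if (x > 0) {
            x /= 2; steps++;
            if (x > 0) {
                x /= 2; steps++;
            }
        }
    }
    *complete = !(x > 0);  /* true only if the original loop would have exited */
    return steps;
}
```

For inputs on which the original loop finishes within three iterations, the unwound version returns the same value and reports that the bound sufficed; for larger inputs it reports that the bound was exceeded, so no conclusion about the original loop can be drawn from it.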
Tso, Ted (August 22, 2000). "Re: [PATCH] Re: Move of input drivers, some word needed from you". lkml.indiana.edu. Linux kernel mailing list. Retrieved August 22, 2014. Jim Gettys has a wonderful explanation of this effect in the X server. It turns out that with branch predictions and the relative speed of CPU vs. memory changing over the past decade, loop unrolling is pretty much pointless. In fact, by eliminating all instances of Duff's Device from the XFree86 4.0 server, the server shrunk in size by _half_ _a_ _megabyte_ (!!!), and was faster to boot, because the elimination of all that excess code meant that the X server wasn't thrashing the cache lines as much.
Ullman, Jeffrey D.; Aho, Alfred V. (1977). Principles of compiler design. Reading, Mass: Addison-Wesley Pub. Co. pp. 471–2. ISBN 0-201-10073-8.
Petersen, W. P.; Arbenz, P. (2004). Introduction to Parallel Computing. Oxford University Press. p. 10.
Nicolau, Alexandru (1985). "Loop Quantization: Unwinding for Fine-Grain Parallelism Exploitation". Dept. of Computer Science Technical Report. Ithaca, NY: Cornell University. OCLC 14638257.