Compiler Optimization (4): Induction Variables

Kong Li2022-07-06LibcarePlusCVE

0. Basic Knowledge

0.1 Loop

Definition

A loop, or a natural loop in LLVM, is a set of nodes (L) from the control-flow graph (CFG) with the following properties [1] [2]:

  • There is a single entry node (called a header), which controls all nodes in the loop.
  • There is a backedge that enters the loop header.

Terms

  • Entering block: A non-loop node that has an edge connected to the loop. If there is only one entering block and its only edge is connected to the header, it is also called the preheader. The peheader that functions as a non-loop node dominates the entire loop.
  • Latch: A loop node that has an edge to the header.
  • Backedge: An edge from the latch to the header.
  • Exiting edge: An edge from inside the loop to a node outside the loop. The start node of the edge is called an exiting block, and the end node is called an exit block.

In the figure on the right, the yellow area is a loop; the red area is not, because it has two entry nodes a and c.

0.2 Scalar Evolution (SCEV)

Definition

SCEV is an optimization pass used by the compiler to analyze variables (usually integers). It is mainly used to analyze how variables in a loop are updated and then optimize variables based on the analysis.

**Chains of Recurrences **(CR)

As shown in the following figure, the start value of the induction variable var in the loop is start, the iteration mode is ϕ, and the step is step.

Its CR is as follows:

var = {start, ϕ , step}
// ϕ∈{+,∗}
// start: starting value
// step: step in each iteration

Example:

int m = 0;
for (int i = 0; i < n; i++) {
  m = m + n;
  *res = m;
}

In the example, m = {0, +, n} is the CR.

1. Induction Variable

1.1 Definition

for (i = 0; i < 10; ++i) {
    j = 17 * i;
}

1.2 Benefits

The advantages of induction variable optimization include but are not limited to the following:

  • The original calculation method could be replaced with a simpler instruction. For example, the multiplication in the code above can be replaced with an addition at a lower cost.
j = -17;
for (i = 0; i < 10; ++i) {
    j = j + 17;
}
  • The number of induction variables is decreased to reduce the pressure on the register.
extern int sum;
int foo(int n) {
    int i, j;
    j = 5;
    for (i = 0; i < n; ++i) {
        j += 2;
        sum += j;
    }
    return sum;
}

The loop now has two induction variables: i and j. After one variable is used to express the other, the following information is displayed:

extern int sum;
int foo(int n) {
    int i;
    for (i = 0; i < n; ++i) {
        sum += 5 + 2 * (i + 1);
    }
    return sum;
}
  • Induction variable substitution makes the relationship between variables and loop indexes clear, facilitating other optimization analyses, such as dependency analysis. In the following example, c is expressed as a function of a loop index.
int c, i;
c = 10;
for (i = 0; i < 10; i++) {
    c = c + 5; // c is incremented by 5 for each loop iteration
}

Then:

int c, i;
c = 10;
for (i = 0; i < 10; i++) {
    c = 10 + 5 * (i + 1);  // c is explicitly expressed as a function of loop index
}

2. Practices

2.1 Compilation Options

CompilerOption
gccfivopt
BiSheng Compilerindvars

2.2 Optimization Case

The induction variable optimization (ivs) is in llvm\lib\Transforms\Scalar\IndVarSimplify.cpp. Let's look at the optimization process of the BiSheng Compiler through a use case. As shown in the following figure, assume that the upper func part is the code to be optimized, and the lower func part is the expected result.


Its IR case test.ll is as follows:


The compilation command is as follows:

opt test.ll -indvars -S

In the current example, the header, latch, and exiting block are the same basic block (BB), that is, bb5.


Step 1: Traverse the source of the operand of the phi node in ExitBlock of the loop based on the def-use relationship, calculate the final value, replace the operand, and then replace the phi node.

In the example, the only operand for calculating %tmp2.lcssa is %tmp2 = add nuw nsw i32 %i.01.0, 3 , and the loop where the expression is located is bb5. In this case, the CR of %tmp2 is as follows:

%tmp2 = {3,+,3}<nuw><nsw><%bb5>

The maximum number of times that the current loop can be repeated before it exits is 199999, which means %tmp2=add(3, mul(3,199999))=600000. The current substitution cost is low (the cost calculation varies according to the architecture). Replace the value in the user of the phi node. The optimization results are as follows:


Step 2: Traverse the ExitingBlock, calculate the jump condition, and delete the corresponding instruction according to the def-use relationship. In the example, %0 of br i1 %0, label %bb5, label %bb7 is false. After the jump instruction is replaced, %0 = icmp ult i32 %tmp4, 200000 does not have a user, so it is added to dead instructions. The optimization results are as follows:


Step 3: Delete all dead instructions and check whether their operands need to be deleted. In the example, %tmp4, as the operand of %0, has a user %x.03.0, so it cannot be considered as a dead instruction to be deleted. The optimization results are as follows:


References

  1. https://llvm.org/docs/LoopTerminology.html
  2. Compilers: Principles, Techniques, & Tools by Alfred V. Aho (Author), Monica S. Lam (Author), Ravi Sethi (Author)
  3. BiSheng Compiler: https://www.hikunpeng.com/developer/devkit/compiler/bisheng
  4. Original text: https://mp.weixin.qq.com/s/9CQheIx4nlPfp-xPff5PJQ

Welcome to join the Compiler SIG to communicate about compilation technologies. Scan the QR code to add the assistant WeChat to invite you.



[Disclaimer] This article only represents the author's opinions, and is irrelevant to this website. This website is neutral in terms of the statements and opinions in this article, and does not provide any express or implied warranty of accuracy, reliability, or completeness of the contents contained therein. This article is for readers' reference only, and all legal responsibilities arising therefrom are borne by the reader himself.