What Is an Intermediate Language?
Intermediate language (intermediate code) is a syntax-oriented equivalent internal representation code of a source program that is easy to translate into a target program. Its intelligibility and ease of generating target code are between the source language and the target language. Common intermediate languages include inverse Polish representation, quaternion, ternary, and tree representation.
- 1. The intermediate language has nothing to do with the specific machine characteristics. An intermediate language can serve the target code generation of multiple different types of target machines.
- 2. Machine-independent optimization of the intermediate language can help improve the quality of the target code.
- 3. Map the source program to intermediate code representation, and then map to the target code in several stages to make the compilation algorithm clearer.
- Intermediate languages are required to be not only machine-independent, but also conducive to code generation.
Intermediate Language Reverse Polish
- Inverse Polish notation is also called postfix notation. It is the simplest form of intermediate code representation. It was used to represent arithmetic expressions long before the compiler appeared.
- The inverse Polish definition of an expression containing only simple variables (including constants) is as follows:
- each variable a is inverse Polish;
- If E 1 and E 2 are inverse Polish, then E 1 E 2 is also inverse Polish, where is any operator.
- As can be seen from the definition above, the reverse Polish notation is to write the operation object in front and the operator (suffix) at the back, for example, write a + b as ab + and a * b as ab *. So expressions expressed in this notation are also called suffixes. In general, if is a k (1) mesh operator, its result on the suffixes e 1 , e 2 , ..., e k will be expressed as e 1 e 2 ... e k .
- The characteristic of reverse Polish notation is that parentheses are no longer needed to explicitly specify the order of expression operations. For example, the expression (xy) * z will be represented as xy-z *. The decomposition of an expression is completely determined according to the sequence of the operation object and the operator, and the number of meshes of each operator. The order of the operands in the expression and its inverse Polish is exactly the same, that is, all the operands in the expression are arranged in their inverse Polish in the original order. [1]
Intermediate Language Quaternion
- Quaternion is also a more commonly used form of intermediate code.
- Its form is: (OP, ARG1, ARG2, RESULT)
- Among them: OP is the operator, ARG1 is the first operation object, ARG2 is the second operation object, and RESULT is the operation result.
- Operands and results sometimes refer to user-defined variables and sometimes to temporary variables introduced by the compiler. We can easily represent an expression or a statement as a set of quaternions. For example, the quaternion of the expression a + b is: (+, a, b, T) In the quaternion notation, it is specified that for the monocular operators (monocular minus, logical negation, type conversion, etc.) ARG2 in its quaternion is empty, and is generally represented by the symbol "/" or "-". For example, the quaternion of the expression -a is: (-, a, /, T)
Intermediate Language Ternary
- The ternary expression is similar to the quaternion, except that the part of the ternary expression that does not represent the operation result is used. Wherever the operation result is involved, the position or number of the ternary expression is used instead. .
- The ternary form is: (OP, ARG1, ARG2)
- Among them, OP is an operator, ARG1 is a first operation object, and ARG2 is a second operation object. The operation objects ARG1 and ARG2 can be variable names or ternary numbers.
- For example, the ternary of the expression a + b * c is expressed as:
- (*, b, c)
- (+, a, )
- Among them, the element in the ternary represents the result of the first ternary.
Intermediate language tree representation
- The tree representation is a replica of the ternary. In the representation of the tree, the leaves are all operation objects, that is, constants or variables, and other nodes represent operators. The tree representation of an expression is easy to implement: a tree of simple variables or constants is the variable or constant itself. If the expression
- The trees of e 1 and e 2 are T 1 and T 2 respectively , then the trees of e 1 + e 2 , e 1 * e 2 , and -e are shown in Figure 1, respectively. The tree shape of the expression a * b + c * d is expressed as In Figure 2, the inverse Polish representation of the expression ab * cd * + is obtained by traversing the binary tree in the following order.
- We can easily represent the ternary into a tree representation. Each ternary (OP, ARG1, ARG2) corresponds to a binary tree, OP is the root of the subtree, and ARG1 and ARG2 are the two subtrees of the subtree. The root of the subtree, the last ternary corresponds to the root of the tree. The figure below shows the correspondence between the ternary and the tree representation.
- Generally speaking, binocular operations correspond to binary trees, and multi-eye operations correspond to multiple fork trees. However, in order to facilitate storage space arrangement, a multi-fork tree can be represented as a two-fork tree by introducing new nodes. [2]