What is a Compiler?

In simple terms, a compiler is a program that translates one language (usually a high-level language) into another (usually a low-level language). The main workflow of a modern compiler is: source code → preprocessor → compiler → object code → linker → executable.
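The pipeline above can be sketched as a chain of toy functions, one per stage. This is only an illustrative model, not a real toolchain: the stage behaviors and the "object code" format are invented for the example.

```python
# Toy model of the classic pipeline: source -> preprocessor -> compiler
# -> object code -> linker -> executable. Each stage consumes the
# previous stage's output. (All formats here are illustrative only.)

def preprocess(source: str) -> str:
    # Stand-in for real preprocessing: strip comment lines.
    return "\n".join(l for l in source.splitlines()
                     if not l.strip().startswith("//"))

def compile_to_object(source: str) -> list:
    # Stand-in "object code": one pseudo-instruction per non-empty line.
    return [("INSTR", line.strip()) for line in source.splitlines()
            if line.strip()]

def link(*object_files: list) -> list:
    # Concatenate object code and prepend an entry point.
    program = [("ENTRY", "main")]
    for obj in object_files:
        program.extend(obj)
    return program

src = """// demo program
a = 1;
b = a + 2;"""
exe = link(compile_to_object(preprocess(src)))
print(exe[0])    # ('ENTRY', 'main')
print(len(exe))  # 3: entry point + two instructions
```

A real toolchain works the same way in outline: each stage's output is the next stage's input, which is why the stages can be (and historically were) separate programs.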

High-level languages are easy for people to write, read, communicate in, and maintain, while machine language can be directly interpreted and executed by a computer. A compiler takes a high-level-language or assembly source program as input and translates it into an equivalent program in the target language. The source is generally a high-level language such as Pascal, C, C++, Java, or a Chinese-based programming language, or assembly language; the target is usually object code in the machine's language (sometimes called machine code).
For high-level languages such as C# and VB, the compiler's job is to translate the source code into bytecode in a common intermediate language (MSIL/CIL). At run time, the common language runtime converts this bytecode into native code that the CPU can execute directly.
Common name: compiler
Also known as: translator, decoder
Expression: source code → preprocessor → compiler → object code
Proposed by: Grace Hopper
Proposed in: late 1950s
Applied discipline: computer science
Scope of application: computers, microcontrollers, programming languages

How the compiler works

Compilation [1] is the translation from source code (usually in a high-level language) into object code (usually in a low-level or machine language) that can be executed directly or by a virtual machine. There are also compilers that translate in the other direction, from a low-level language to a high-level one: such a tool takes low-level code that was generated from a high-level language and reconstructs high-level code from it. There are likewise compilers that translate one high-level language into another, and compilers that produce intermediate code requiring further processing (sometimes called cascaders).
A typical compiler output is an object file containing the names and addresses of its entry points, plus machine code with unresolved references for external calls (calls to functions not defined in that object file). The object files to be combined need not come from the same compiler, as long as the compilers used produce the same output format; the linker can then join them into an executable (such as an EXE file) that the user can run directly. Many of the executable files on a computer were produced this way.
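What the linker does with those unresolved references can be sketched with two invented "object files": each exports symbols at known offsets and records the external calls it could not resolve. The linker lays the code out, assigns every symbol an absolute address, and patches the calls. The object-file format and instruction names here are made up for illustration.

```python
# Toy linker: lay out object files in memory, build a symbol table,
# then patch external calls to absolute addresses.
# (Object-file format is invented for illustration.)

obj_main = {
    "code": ["call add", "halt"],
    "exports": {"main": 0},
    "externals": ["add"],       # calls to functions defined elsewhere
}
obj_math = {
    "code": ["load", "ret"],
    "exports": {"add": 0},
    "externals": [],
}

def link(objects):
    symbols, code, base = {}, [], 0
    # First pass: lay out code and record each symbol's absolute address.
    for obj in objects:
        for name, offset in obj["exports"].items():
            symbols[name] = base + offset
        code.extend(obj["code"])
        base += len(obj["code"])
    # Second pass: patch external calls to the resolved addresses.
    return [f"call {symbols[i.split()[1]]}" if i.startswith("call ") else i
            for i in code]

print(link([obj_main, obj_math]))
# ['call 2', 'halt', 'load', 'ret']
```

Real linkers do the same two passes (symbol resolution, then relocation), just over a standardized format such as ELF or COFF.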

Compiler Type

A compiler can generate object code that runs on the same platform (the combination of computer and operating system) as the compiler itself; this is called a native compiler. A compiler can also generate object code for a different platform; this is called a cross compiler, which is very useful when bringing up a new hardware platform. A source-to-source compiler takes a high-level language as input and also produces a high-level language as output. For example, automatic parallelizing compilers often take a high-level language as input, transform the code, and annotate it with parallel directives (such as OpenMP) or language constructs (such as FORTRAN's DOALL statements).

Compiler preprocessor

The preprocessor's job is to complete the source program by substituting predefined blocks of text, such as macros, into it before compilation proper begins.
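That substitution is purely textual and happens before the compiler proper sees the source. Here is a minimal sketch, assuming a C-like "#define NAME BODY" syntax simplified for illustration (no function-like macros, no conditionals):

```python
# Minimal textual macro substitution, the core job of a preprocessor:
# record each "#define NAME BODY", then replace NAME wherever it
# appears as a whole word in later lines.
import re

def preprocess(source: str) -> str:
    macros, out = {}, []
    for line in source.splitlines():
        m = re.match(r"#define\s+(\w+)\s+(.*)", line)
        if m:
            macros[m.group(1)] = m.group(2)  # remember the macro
            continue                          # directive is consumed
        for name, body in macros.items():
            line = re.sub(rf"\b{name}\b", body, line)
        out.append(line)
    return "\n".join(out)

src = "#define MAX 100\nint limit = MAX;"
print(preprocess(src))   # int limit = 100;
```

The compiler that runs next never sees `MAX` at all, only the substituted text.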

Compiler front end

The front end is mainly responsible for parsing the input source code. The lexical analyzer and the syntax analyzer work together: the lexical analyzer finds the "words" (tokens) in the source code, and the syntax analyzer assembles these individual tokens into meaningful expressions, statements, functions, and so on according to a predefined grammar. For example, given "a = b + c;", the lexical analyzer produces the tokens "a", "=", "b", "+", "c", ";". The syntax analyzer first assembles them into the expression "b + c" according to the grammar, and then into the statement "a = b + c;". The front end is also responsible for semantic checking, such as verifying that the variables in an operation have compatible types, and for simple error handling. The end result is usually an abstract syntax tree (AST), which the backend can further optimize and process.
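The two front-end phases can be sketched directly on the example "a = b + c;". The grammar and the AST node shapes below are simplified for illustration; a real front end handles a full grammar and attaches source locations and types.

```python
# Front-end sketch: lexical analysis (text -> tokens), then syntax
# analysis (tokens -> abstract syntax tree).
import re

def tokenize(source: str):
    # Tokens are identifiers or the single-character symbols = + ;
    return re.findall(r"[A-Za-z_]\w*|[=+;]", source)

def parse_statement(tokens):
    # Grammar (simplified): statement := IDENT '=' IDENT '+' IDENT ';'
    name, eq, lhs, plus, rhs, semi = tokens
    assert eq == "=" and plus == "+" and semi == ";"
    # First assemble the expression "b + c", then the assignment.
    return ("assign", name, ("+", lhs, rhs))

tokens = tokenize("a = b + c;")
print(tokens)                   # ['a', '=', 'b', '+', 'c', ';']
print(parse_statement(tokens))  # ('assign', 'a', ('+', 'b', 'c'))
```

The nested tuple is the AST from the text above: an assignment node whose right-hand child is the expression "b + c".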

Compiler backend

The compiler backend is mainly responsible for analyzing and optimizing the intermediate representation and for generating machine code (code generation).
Generally speaking, compiler analyses, optimizations, and transformations fall into two categories: intra-procedural (within a single function) and inter-procedural (across functions). Inter-procedural analysis is more precise and enables better optimization, but it takes longer to complete.

Compiler code analysis

The object of the backend's analysis is the intermediate code generated and passed on by the front end. Modern optimizing compilers often use several levels of intermediate representation (IR): high-level IR is close to the input source program in form, depends on the input language, and preserves global information and the structure of the source; middle-level IR is independent of the input language; and low-level IR is close to machine language. Each analysis and optimization is performed on the level of IR best suited to it.
Common compiler analyses include the function call tree, the control flow graph, variable definition-use and use-definition chains (du/ud chains), alias analysis, pointer analysis, and data dependence analysis.
The results of program analysis are a prerequisite for compiler optimizations and transformations. Common optimizations and transformations include function inlining, dead code elimination, loop normalization, loop unrolling, loop fusion, loop fission, array padding, and so on. Their purpose is to reduce code size, improve memory and cache utilization, and reduce the frequency of disk reads and writes and of network accesses. More advanced optimizations can even turn serial code into parallel, multi-threaded code.
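Two of the optimizations named above can be sketched on a toy three-address IR: constant folding evaluates operations whose operands are already known constants, and dead code elimination drops assignments whose results are never used. The IR tuple format here is invented for the example.

```python
# Toy IR: (dest, op, arg1, arg2). Constant folding, then dead code
# elimination over it.

def constant_fold(ir):
    env, out = {}, []
    for dest, op, a, b in ir:
        a, b = env.get(a, a), env.get(b, b)   # substitute known constants
        if op == "+" and isinstance(a, int) and isinstance(b, int):
            env[dest] = a + b                 # result is known at compile time
            out.append((dest, "const", a + b, None))
        else:
            out.append((dest, op, a, b))
    return out

def eliminate_dead(ir, live_out):
    # Walk backwards, keeping only instructions whose result is used.
    used, kept = set(live_out), []
    for dest, op, a, b in reversed(ir):
        if dest in used:
            kept.append((dest, op, a, b))
            used.update(x for x in (a, b) if isinstance(x, str))
    return list(reversed(kept))

ir = [("t1", "+", 1, 2),      # t1 = 1 + 2  -> folded to 3
      ("t2", "+", "t1", 4),   # t2 = t1 + 4 -> folded to 7
      ("t3", "+", "x", 1)]    # t3 = x + 1  -> dead (t3 is never used)
opt = eliminate_dead(constant_fold(ir), live_out={"t2"})
print(opt)   # [('t2', 'const', 7, None)]
```

Note the order matters: folding first turns `t1` into a constant, which makes the `t1` assignment itself dead and removable on the second pass; real compilers likewise run such passes repeatedly until nothing changes.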
Machine code generation is the process of converting the optimized and transformed intermediate code into machine instructions. Modern compilers mainly generate assembly code rather than emitting binary object code directly. Even in the code generation phase, an advanced compiler still performs a great deal of analysis, optimization, and transformation: for example, how to allocate registers (register allocation), how to choose suitable machine instructions (instruction selection), and how to combine several operations into a single instruction.
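The code-generation step can be sketched by walking the AST for "a = b + c" and emitting assembly-like instructions, with a deliberately trivial register allocator that hands out registers in order. The instruction mnemonics and register names are invented for illustration; real allocators use graph coloring or linear scan.

```python
# Toy code generator: AST -> assembly-like instructions.
# regs is a pool of free registers; popping one is our "allocation".

def codegen(node, regs, out):
    kind = node[0]
    if kind == "var":
        r = regs.pop(0)                  # trivial register allocation
        out.append(f"LOAD {r}, {node[1]}")
        return r
    if kind == "+":
        r1 = codegen(node[1], regs, out)
        r2 = codegen(node[2], regs, out)
        out.append(f"ADD {r1}, {r2}")    # instruction selection: one ADD
        return r1                        # result lives in r1
    if kind == "assign":
        r = codegen(node[2], regs, out)
        out.append(f"STORE {node[1]}, {r}")
        return r

ast = ("assign", "a", ("+", ("var", "b"), ("var", "c")))
asm = []
codegen(ast, ["r0", "r1", "r2"], asm)
print(asm)  # ['LOAD r0, b', 'LOAD r1, c', 'ADD r0, r1', 'STORE a, r0']
```

Even this tiny walk shows the decisions the text lists: which register holds each value, which instruction realizes each AST node, and how subexpression results are reused.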

Compiler working methods

The compiler first performs syntax analysis, which breaks the source text into its constituent pieces. It then performs semantic analysis, which determines the meaning of each grammatical unit identified by syntax analysis. The result is an object file, also called an obj file. The linker then processes the object files to produce the final EXE file. Sometimes the object files generated from several source files must be linked together to produce the final code; this process is linking.

Compiler optimization

Applications are complex because they handle many problems and their associated data sets. In fact, a complex application is like many different functional pieces pasted together. Much of the complexity in the source files comes from initialization and problem-setup code; although such files usually make up a large part of the source and are quite involved, they consume essentially no CPU execution cycles.
Despite this, most Makefiles use a single set of compiler options to compile every file in the project. The standard optimization approach is therefore simply to raise the optimization level, typically from -O2 to -O3, after which a great deal of effort goes into debugging, determining which files cannot be optimized, and writing special make rules for those files.
A simpler and more effective method is to run the original code through a performance profiler to produce the list of source files that consume 85 to 95% of the CPU time. Usually these files are only about 1% of all the files. If developers then write their own rule for each file in that list, they are in a far more flexible and effective position: changing an optimization causes only a small fraction of the files to be recompiled, and since no time is wasted optimizing functions that cost little time, recompiling all the files is also much faster. [2]

Compiler comparison

Many people divide high-level programming languages into two categories: compiled languages and interpreted languages. In reality, however, most of these languages can be implemented in either a compiled or an interpreted version, and the classification actually reflects the most common implementation of each language. (Some interpreted languages are, however, difficult to implement with a compiler, for example those that allow code to be modified at run time.)

Compiler history

In the 1950s, John Backus of IBM led a research group that developed the FORTRAN language and its compiler. Because little was known about compilation theory at the time, the development work was complicated and difficult. Around the same time, Noam Chomsky began his research into the structure of natural language. His findings eventually made the structure of compilers remarkably simple, and even allowed parts of it to be automated. Chomsky's research led to a classification of languages according to the complexity of their grammars and the power of the algorithms needed to recognize them: the Chomsky hierarchy, which comprises four levels of grammar, Type 0 through Type 3, each a special case of the one before it. Type 2 grammars (context-free grammars) have proven to be the most useful for programming languages, and today they are the standard way of describing programming language structure. The study of the parsing problem (efficient algorithms for recognizing context-free languages) was carried out in the 1960s and 1970s and solved the problem fairly well; it is now a standard part of compiler theory.
Finite automata and regular expressions are closely related to context-free grammars; they correspond to Chomsky's Type 3 grammars. Research on them began at about the same time as Chomsky's work and led to the symbolic notation used to describe the words (tokens) of programming languages.
Methods for generating efficient object code were then deepened; these became the first optimizing compilers, and they are still in use today. They are usually called optimization techniques, but because they never produce truly optimal object code, only more efficient code, they should more accurately be called code improvement techniques.
As the parsing problem became well understood, a great deal of effort went into developing programs that automate this part of compiler construction. These programs were originally called compiler-compilers, but are more accurately called parser generators, since they automate only one part of compilation. The best-known of them is Yacc (Yet Another Compiler-compiler), written by Steve Johnson in 1975 for Unix systems. Similarly, the study of finite automata produced a class of tools called scanner generators, of which Lex (developed for Unix systems by Mike Lesk at about the same time as Yacc) is among the best known.
In the late 1970s and early 1980s, many projects tried to automate other parts of the compiler as well, including code generation. These attempts were largely unsuccessful, presumably because the task was too complex and too poorly understood at the time.
Recent developments in compiler design include the following. First, compilers apply more sophisticated algorithms to infer or simplify information in programs, hand in hand with the development of more sophisticated programming languages; an example is the Hindley-Milner type-inference algorithm used in compiling functional languages. Second, the compiler has increasingly become just one component of a window-based interactive development environment (IDE) that also includes an editor, linker, debugger, and project manager. There are few standards for such IDEs, but the development of standard windowing environments is heading in that direction. On the other hand, despite extensive research in the field, the basic principles of compiler design have changed little in the last twenty years, and compilers are rapidly becoming a central part of computer science curricula.
In the 1990s, many free compilers and compiler development tools were developed as part of the GNU project or other open source projects. These tools can be used to compile all computer programming languages. Some of these projects are considered high-quality, and anyone who is interested in modern compilation theory can easily get their free source code.
Around 1999, SGI announced the source code of their industrialized parallel optimization compiler Pro64, which was later used as a research platform by multiple compiler research groups around the world and named Open64. Open64 has a good design structure, comprehensive analysis and optimization, and is an ideal platform for advanced research on compilers.
Compiler-related terms: [1]
1. compiler-compiler
2. on-line compiler
3. precompiler
4. serial compiler
5. system-specific compiler
6. Information Presentation Facility compiler
7. Compiler Monitor System
