What is Parallel Processing?
Parallel processing is a mode of computation in which a computer system carries out two or more processes simultaneously; it can also work on different parts of the same program at the same time. Its main purpose is to save time when solving large and complex problems. To use parallel processing, a program must first be parallelized, that is, its work must be divided into parts that are allocated to different processes (or threads). This parallelization cannot be done fully automatically because of the problems involved, and parallelism does not guarantee a speedup: in theory, execution on n parallel processes can be at most n times as fast as execution on a single processor.
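As a minimal sketch of what parallelizing a program means in practice, the example below splits a simple computation among several worker processes and combines the partial results; the task (summing squares), the worker count, and the choice of Python's multiprocessing module are illustrative assumptions rather than anything prescribed by the text.

```python
# Sketch: divide work among worker processes and combine their partial results.
# The problem (sum of squares) and all names here are illustrative assumptions.
from multiprocessing import Pool

def sum_of_squares(chunk):
    """Work done by a single worker on its portion of the data."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4  # number of parallel processes

    # Split the input into roughly equal chunks, one per worker.
    chunk_size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # The chunks are processed in parallel; in the ideal case this approaches
    # an n-fold speedup, but overheads usually make the actual gain smaller.
    with Pool(processes=n_workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)

    print(sum(partial_sums))
```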
- Only some applications can take advantage of parallel processing.
- In terms of hardware technology, the development of parallel computing has mainly proceeded as follows:
- The development of modern computers, beginning in the 1940s, can be divided into two distinct eras: the era of serial computing and the era of parallel computing. Each computing era starts with the development of architectures, followed by system software (especially compilers and operating systems) and application software, and finally peaks with the development of problem-solving environments. The main reason for building and using parallel computers is that they are one of the best ways to overcome the speed bottleneck of a single processor.
- A parallel computer is composed of a set of processing units that jointly complete a large-scale computing task at higher speed through communication and cooperation with one another. The two main components of a parallel computer are therefore the computing nodes and the mechanisms for communication and cooperation between the nodes, and the development of parallel computer architecture is mainly reflected in improvements to node performance and to the communication technology between nodes.
- In the early 1960s, thanks to the advent of transistors and magnetic core memory, processing units became smaller and memories became more compact and cheaper, and these technological developments led to the emergence of parallel computers. In this period, parallel computers were mostly small-scale shared-memory multiprocessor systems, the so-called mainframes; the IBM 360 is a typical representative of this period.
- In the late 1960s, processors began to include multiple functional units of the same kind, and pipelining also appeared. Compared with simply raising the clock frequency, applying these parallel features inside the processor greatly improved the performance of parallel computer systems. At this time the University of Illinois and Burroughs began the Illiac IV project, developing a 64-CPU SIMD system whose design raised many research topics in hardware technology, architecture, I/O devices, operating systems, programming languages, and applications. However, by the time a greatly reduced 16-CPU system finally appeared in 1975, the entire computer industry had changed dramatically.
- The first change was conceptual innovation in the memory system: the ideas of virtual memory and caching were proposed. The IBM 360/85 and 360/91 are two models of the same series; the 360/91 has a higher clock frequency, faster memory, and a dynamically scheduled instruction pipeline, yet the overall performance of the 360/85 is higher, for one reason only: the 360/85 uses a cache while the 360/91 does not.
- Second, semiconductor memory began to replace magnetic core memory. Initially, semiconductor memory was used only as a cache in some machines; the CDC 7600 was the first to adopt such smaller, faster, directly addressable semiconductor memory throughout, and magnetic core memory subsequently left the stage of history. At the same time, integrated circuits appeared and were quickly applied to computers. These two revolutionary breakthroughs in component technology made the Illiac IV designers' painstaking improvements in underlying hardware and parallel architecture far less valuable.
- After the introduction of the CRAY-1 in 1976, vector computers firmly controlled the high-performance computer market for 15 years. The CRAY-1's logic circuits were carefully designed; it used what we would now call a reduced instruction set (RISC) and introduced vector registers to carry out vector operations. This series of new techniques allowed the CRAY-1 to reach a clock frequency of 80 MHz.
- As machine word length increased from 4, 8, and 16 bits to 32 bits, microprocessor performance also improved significantly. It was this potential of the microprocessor that led Carnegie Mellon University to begin developing C.mmp, based on the then-popular DEC PDP-11 minicomputer: a shared-memory multiprocessor system in which 16 PDP-11/40 processors are connected to memory modules through a 16-way crossbar switch.
- Microprocessor technology has advanced at high speed since the 1980s. Later, a bus protocol well suited to the SMP approach appeared, and the University of California at Berkeley extended that bus protocol to propose a solution to the cache coherence problem. Since then, the shared-memory multiprocessor path opened by C.mmp has widened, and this architecture has now come to dominate the server and desktop workstation market.
- During the same period, parallel computers based on message passing also began to emerge. In the mid-1980s, Caltech successfully linked 64 i8086/i8087 processors through a hypercube interconnect. Since then there have been many message-passing parallel computers, such as the Intel iPSC series, the INMOS Transputer series, the Intel Paragon, and Vulcan, the predecessor of the IBM SP.
- From the late 1980s to the early 1990s, shared-memory massively parallel computers saw new development. IBM connected a large number of early RISC microprocessors through a butterfly interconnection network, and people began to consider how to make such systems scalable while still providing cache-coherent shared memory. In the early 1990s, Stanford University proposed the DASH project, which achieves cache coherence for distributed shared memory by maintaining a directory structure that holds the location information of every cache block. The IEEE later proposed a standard cache coherence protocol on this basis.
- Since the 1990s, the major architectures have begun to converge. The data-parallel CM-5, in addition to using a large number of commercial microprocessors, allows user-level programs to pass simple messages; the CRAY T3D is a NUMA shared-memory parallel computer, but it also provides a global synchronization mechanism and a message queue mechanism and adopts techniques to reduce message-passing latency.
- With the development of commercial microprocessors and network equipment, and the release of parallel programming standards such as MPI/PVM, parallel computers with a cluster architecture emerged; the IBM SP2 series cluster systems are typical representatives. In these systems, each node is a standard commodity computer, and the nodes are connected through a high-speed network (a minimal message-passing sketch in this style is shown after this list).
- More and more parallel computer systems adopt commercial microprocessors and commercial interconnection networks; such distributed parallel computer systems are called clusters. Almost all high-performance computer manufacturers in China produce such machines with an extremely high performance-to-price ratio. Parallel computers have entered a new era, and the application of parallel computing has reached unprecedented breadth and depth.
- With the development of microprocessor chips, parallel computers have entered a new era: their performance has exceeded 20 PFLOPS and continues to climb. China's parallel computer development has been at the forefront of the world. The Shenteng 6800 produced by Lenovo in 2003 ranked 14th on the world TOP500 list of November 2003, and the Shuguang 4000A manufactured by the Shuguang Corporation ranked 10th on the TOP500 list of June 2004, the first time a publicly released Chinese high-performance computer entered the world's top ten. This indicates that China has caught up with the international advanced level in the development and production of parallel computer systems and has laid a material foundation for raising the level of China's scientific research. In the TOP500 ranking released at the 2013 International Supercomputing Conference, the Tianhe-2 supercomputer system developed by the National University of Defense Technology topped the list, with a peak speed of 54.9 petaflops and a sustained speed of 33.9 petaflops of double-precision floating-point arithmetic.
- Judging from the top 10 of the TOP500, the United States is still the largest owner of supercomputers. According to TOP500 statistics, the United States holds nearly half of the world's supercomputing power and more than 50% of the machines on the TOP500 list. [3]
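The message-passing style mentioned above (MPI/PVM and the cluster systems built on it) can be illustrated with a short sketch using the mpi4py binding of the MPI standard; the reduction example and the variable names are assumptions made for illustration, and running it requires an MPI installation (for example, `mpiexec -n 4 python sum_mpi.py`, where the script name is hypothetical).

```python
# Sketch of message passing in the spirit of MPI, using the mpi4py binding.
# The computation shown (a distributed sum) is an illustrative assumption.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's id within the communicator
size = comm.Get_size()   # total number of processes in the job

# Each process computes a partial sum over its own slice of the index range.
total_n = 1_000_000
local_sum = sum(i for i in range(rank, total_n, size))

# Combine the partial results on process 0 with a reduction operation.
grand_total = comm.reduce(local_sum, op=MPI.SUM, root=0)

if rank == 0:
    print(f"sum of 0..{total_n - 1} computed by {size} processes: {grand_total}")
```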