What Is High-Performance Computing?

High-performance computing (HPC) refers to computing systems and environments that use many processors (as part of a single machine) or several computers (operated as a single computing resource) organized into a cluster. HPC systems range from large clusters of standard computers to highly specialized hardware. Most cluster-based HPC systems use a high-performance network interconnect, such as InfiniBand or Myrinet. The basic network can use a simple bus topology, but in a high-performance environment a mesh network provides lower latency between hosts, improving overall network performance and transfer rates.


High-performance computing overview

Figure 1 shows a mesh HPC system. A mesh topology supports faster communication between hosts by reducing the physical and logical distance between network nodes.
Figure 1. HPC mesh network topology
Although the network topology, interconnect hardware, and processing hardware are important in HPC systems, the core functions that make the system effective are provided by the operating system and application software.
HPC systems use specialized operating systems that are designed to make the cluster look like a single computing resource. As Figures 1 and 2 show, there is a control node that forms the interface between the HPC system and the clients; the control node also manages the distribution of work to the compute nodes.
For task execution in a typical HPC environment there are two models: single instruction/multiple data (SIMD) and multiple instruction/multiple data (MIMD). SIMD executes the same instructions and operations across multiple processors simultaneously, but over different ranges of data, allowing the system to compute the same expression with many different variables at once. MIMD allows the HPC system to perform different calculations on different variables at the same time, so the whole system appears as more than just a single anonymous (though powerful) computing resource and can carry out many calculations simultaneously.
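As a rough, illustrative sketch in C (not taken from the article), the same distinction can be shown in miniature: a SIMD-style loop applies one instruction stream across many data elements, while a MIMD-style section runs different, independent calculations at the same time. The OpenMP pragmas are only an assumed convenience for expressing the concurrent part; compiled without OpenMP, the program runs serially and produces the same result.

#include <stdio.h>

#define N 8

int main(void) {
    double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* SIMD flavour: one operation applied to many data elements */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    /* MIMD flavour: different calculations carried out side by side */
    double sum = 0.0, prod = 1.0;
    #pragma omp parallel sections
    {
        #pragma omp section
        { for (int i = 0; i < N; i++) sum += c[i]; }

        #pragma omp section
        { for (int i = 1; i <= 4; i++) prod *= i; }
    }
    printf("sum=%.1f  4!=%.0f\n", sum, prod);
    return 0;
}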
Whether SIMD or MIMD is used, the basic principle of a typical HPC system is the same: the entire HPC unit operates and behaves like a single computing resource, spreading the requested load across the various nodes. HPC solutions are also dedicated units, specifically designed and deployed to act as (and only as) one large computing resource.

Comparison with grid computing

Grid computing overview

Grid computing is a relatively recent addition to high-performance computing; it has its own history and its own applications in different environments. The key elements of a grid are its nodes, which are not specialized, dedicated components. In a grid, the various systems are often based on standard machines and operating systems rather than on the strictly controlled environment used in most parallel-computing solutions. On top of this standard environment sit applications that provide the grid capabilities.
A grid may consist of a series of identical dedicated machines, multiple machines with the same basic infrastructure, or a completely heterogeneous environment spanning multiple platforms and environments. Dedicated computing resources are not required in a grid; many grids are created by reusing existing infrastructure components to produce a new, unified computing resource.
A grid can also be extended without any special requirements, so adding further nodes is easier than in a typical HPC environment. With an HPC solution, you design and deploy the system with a fixed number of nodes, and expansion requires careful planning. Expanding a grid requires far less consideration: the number of nodes increases and decreases dynamically according to your needs or the available resources.
Figure 2. Mesh network architecture
Although the topology and hardware allow the grid to be based on the same structure shown in Figures 1 and 2, it is also possible to support the grid using standard network connectivity components. You can even cross conventional network boundaries and merge computing resources on the WAN or the Internet, as shown in Figure 3.
As an execution model and environment, the grid is also designed to be more flexible in its operation and execution. Although grids can be used for computational tasks in the same way as HPC solutions, they can be more flexible, using different nodes to perform different calculations, expressions, and operations. Rather than being a single anonymous computing resource, a grid can distribute work to its various nodes and employ them until the jobs and operations are complete. This makes the grid more practical where the order in which different calculations and components execute is less critical to the ongoing execution of the remaining tasks.
A good example of a grid solution that takes advantage of this flexibility in unit size and task isolation is the compositing of computer-generated imagery and special effects for movies. Here the order of generation is unimportant: single frames or longer multi-second segments can be rendered independently of one another. Although the ultimate goal is a movie that plays in the correct order, it does not matter whether the last five minutes are finished before the first five; the pieces can be assembled in the correct order later.
Another major difference between grids and traditional HPC solutions is that HPC solutions are designed to provide a resource-specific solution, such as raw computing power or the ability to hold large amounts of data in memory for processing. A grid, by contrast, is a distributed computing resource: it can share whatever components are needed, including memory, CPU power, and even disk space.
Because of these differences between the two systems, different programming models and development models have been developed to simplify the process.
The dedicated nature of HPC solutions provides some benefits when developing applications to use this capability. Most HPC systems present themselves as a single computing resource, so it becomes the programmer's responsibility, with the help of a dedicated library, to build an application that can be distributed across the entire resource.
Application development in the HPC environment is usually handled through a dedicated library, which greatly simplifies the process of creating an application and allocating the tasks of the application to the entire HPC system.
One of the most popular solutions is the Message Passing Interface (MPI). MPI provides a simplified way to create jobs, using message passing to exchange work requests between nodes. As part of the development process, you usually know the number of processors (here meaning separate nodes, not separate CPUs) that you want to use. The division of labor in an HPC environment depends on the application and, obviously, on the size of the HPC environment. If the work being distributed relies on multiple steps and calculations, the parallel and sequential characteristics of the HPC environment play an important role in the speed and flexibility of the solution.
Figure 3. HPC function diagram
Once the work is divided, you can send a message to each node telling it to perform its part of the work. The work is placed into the HPC unit and sent to every node at the same time, and normally each node is expected to return its result at about the same time. The result from each node is returned to the host application through another MPI message; when all the messages have been received, the job is complete. An example of this structure is shown in Figure 3.
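A minimal sketch of this send-work/collect-results pattern is shown below in C with MPI; it is an assumed illustration rather than code from the article, and the per-unit calculation is just a placeholder. Each rank takes its share of 256 units of work, and MPI_Reduce combines the partial results on the control node (rank 0).

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int UNITS = 256;          /* total units of work to distribute */
    double local = 0.0, total = 0.0;

    /* Each rank handles every size-th unit; with 64 nodes this means
       four units per node, matching the example in the text. */
    for (int u = rank; u < UNITS; u += size)
        local += (double)u * u;     /* placeholder for the real calculation */

    /* Partial results are sent back and combined on the control node. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("combined result: %.0f\n", total);

    MPI_Finalize();
    return 0;
}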
The execution model is usually fixed and runs to the completion of a single application. For example, if a task is divided into 256 units and the HPC system has 64 nodes, four iterations are needed to complete the work. The work is generally done in parallel, and all 64 nodes remain busy until the entire application is complete. Throughout the process the HPC system acts as a single machine: although messages are used to distribute the work across multiple compute nodes, the whole job effectively operates as a single application.
Other HPC libraries and interfaces work similarly, depending on the application developed for the HPC environment. In each case, work distribution and execution can be viewed as a single process: although execution of the application may be queued, once the application starts running, its components execute immediately across the nodes of the HPC system.
To handle multiple simultaneous applications, most HPC systems use a scheme in which different applications run on different processor/node sets. For example, a 256-node HPC system can execute two applications at the same time if each application uses only a subset of the overall computing resource.
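As a small, assumed sketch of this idea (not the scheduler any particular HPC system actually uses), the processes of a single MPI job can be split into two groups, so that each group behaves like its own application running on a subset of the nodes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int color = (rank < size / 2) ? 0 : 1;   /* first half: app A, second half: app B */
    MPI_Comm subset;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &subset);

    int sub_rank, sub_size;
    MPI_Comm_rank(subset, &sub_rank);
    MPI_Comm_size(subset, &sub_size);
    printf("global rank %d runs application %c as rank %d of %d\n",
           rank, color == 0 ? 'A' : 'B', sub_rank, sub_size);

    MPI_Comm_free(&subset);
    MPI_Finalize();
    return 0;
}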

Grid programming

The distributed (and often non-dedicated) architecture of a grid requires a different model for executing work. Because of this nature, you cannot expect the various units of work to execute simultaneously. Many factors affect the execution time of a unit of work, including when it is assigned and the effective power of the resources on each grid node.
Because the nodes differ and work is processed differently on each, a grid combines monitoring of the grid nodes with a queuing system for units of work. The monitoring lets the grid manager determine the current load on each node; that information is then used to assign new units of work to nodes with little or no current load.
Figure 4. Grid Function Diagram
The entire grid system is therefore based on a series of queues and distributions. By sharing the load among the nodes, work is handed out from the queue as nodes become available, so the grid as a whole is used more efficiently.
Responses and results are likewise queued by the grid controller, to be collected into the application's final result set once all the units of work (and their results) have been processed. An example of this is shown in Figure 4.
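The following toy sketch in C illustrates the queue-and-dispatch idea described above. The node loads and unit costs are invented constants purely for illustration; a real grid manager would refresh them from live monitoring and collect the results asynchronously.

#include <stdio.h>

#define NODES 4
#define UNITS 10

int main(void) {
    double load[NODES] = { 0.10, 0.55, 0.20, 0.00 };  /* monitored load per node (assumed) */

    /* Dispatch each queued unit of work to the least-loaded node. */
    for (int u = 0; u < UNITS; u++) {
        int best = 0;
        for (int n = 1; n < NODES; n++)
            if (load[n] < load[best])
                best = n;

        printf("unit %2d -> node %d (load %.2f)\n", u, best, load[best]);
        load[best] += 1.0;   /* node stays busier until its result returns */
    }
    return 0;
}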
The grid model allows work to use various levels of resources, unit sizes, and allocation levels, not just the fixed execution model used by HPC solutions. Most grids support the simultaneous execution of multiple work requests, with the units of work for each application queued and assigned as needed. For example, work on Job 2 can start while some nodes are still completing work on Job 1; both jobs draw dynamically on the same pool of available nodes to complete their work.
The flexible nature of this process not only allows work to be performed in a more dynamic and adaptive way, it also allows the grid to be built from a variety of hardware and platforms. It no longer matters that some nodes in the grid are faster or slower than others: slower nodes complete work in their (comparatively) idle time and their results are queued, while faster systems may be assigned more work and complete more units.
The disadvantage is the heavier management overhead required to observe and monitor the individual nodes so that work can be distributed among them efficiently. In a heterogeneous environment you must also account for the different platforms and develop applications that are compatible across them. In the grid space, however, Web services have simplified this process and made distribution easier, without the need to worry about these differences.
Before looking at the effect of Web services, consider where HPC and grid computing meet and how that affects the different execution models.

Where high-performance computing and grid computing meet

There are similarities between HPC and grid environments, and in many respects the two have converged, with different groups taking advantage of the strengths of both systems. Many grid environments have grown out of extensions to HPC solutions, and many technologies used in grids were adopted and optimized from work done in HPC environments.
Some obvious similarities are the way work is divided into smaller units and components, and the way that work is distributed among worker nodes. In an HPC environment this division of labor is usually strictly controlled and based on the available resources. A grid uses a more flexible model that allows work to be divided into units of non-standard sizes, so work can be distributed across very different collections of grid nodes.
Although work is distributed differently, the basic principle of allocation remains the same: first determine the work and how it will be divided, then create the units of work accordingly. For example, for a computational problem you can distribute the work by creating different sets of parameters, with each set of variables applied by a different node, as in the sketch below.
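As a tiny, assumed example of that last point (not from the article), the following turns one calculation into units of work by generating distinct parameter sets and assigning them to nodes round-robin:

#include <stdio.h>

#define NODES 4
#define STEPS 16

int main(void) {
    /* Sweep a single parameter x over 0.0 .. 1.5 in 16 steps; each step
       becomes one unit of work assigned to a node. */
    for (int i = 0; i < STEPS; i++) {
        double x = 0.1 * i;
        int node = i % NODES;        /* round-robin assignment */
        printf("unit %2d: x = %.1f -> node %d\n", i, x, node);
    }
    return 0;
}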
The messaging structures and systems used in HPC systems have also been developed and adapted for grid systems. Many HPC messaging libraries use shared memory structures to support allocation of work units between nodes.
In a grid there is no shared-memory environment; instead, work is distributed using discrete messages sent over a standard network connection (usually TCP/IP). The core of the system is no different: messages containing the working parameters are exchanged. Only the physical method of exchanging the information differs.
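A minimal, assumed sketch of this kind of exchange in C is shown below: the working parameters of one unit are packed into a small struct and written to a TCP socket. The address and port are placeholders, and a production grid would use a portable, self-describing encoding (such as the XML used by Web services, discussed below) rather than a raw struct.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

struct work_msg {          /* the "working parameters" of one unit of work */
    int    unit_id;
    double param_x;
};

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in node = { 0 };
    node.sin_family = AF_INET;
    node.sin_port   = htons(5000);                      /* placeholder port         */
    inet_pton(AF_INET, "192.0.2.10", &node.sin_addr);   /* placeholder node address */

    if (connect(fd, (struct sockaddr *)&node, sizeof node) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    struct work_msg msg = { .unit_id = 42, .param_x = 0.5 };
    if (send(fd, &msg, sizeof msg, 0) < 0)
        perror("send");
    /* in a real grid the node's result would come back over the same connection */

    close(fd);
    return 0;
}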

The impact of Web services

Although platform-independent HPC systems exist (such as MPI, which supports multiple platforms and architectures), HPC solutions cannot be used interchangeably, and many deployments still depend on a uniform architecture.
The heterogeneous character of a typical grid changes the way work is distributed. Because grid nodes may be based on different platforms and architectures, on different public and private networks, a platform-neutral method of exchanging work and requests is needed, one that makes it easy to distribute work without worrying about the target environment.
Web services are based on open standards and use XML to distribute and exchange information, which essentially eliminates the complexity of sharing information between platforms and architectures. Instead of writing a binary application that executes across the grid, you can write a series of Web services that support the various operations, tailored to the different nodes and platforms. The cost of deploying Web services is also comparatively low, which makes them ideal for grids that do not use dedicated compute nodes.
By eliminating compatibility issues and simplifying information distribution, Web services make it easier to scale the grid. With HPC solutions it is usually necessary to use identical hardware when extending the capacity of the HPC environment; with a grid, especially one using Web services, the system can be extended with almost any platform.
Other issues with grids and Web services concern distribution and security, because the assumptions of a closed, internal HPC system no longer apply. This is especially true when using nodes on a WAN or public network. For HPC solutions, system security can be managed through the uniform nature of the hardware; with all the machines in one location, security is easier to control.
To improve the interoperability of Web services, particularly in grid environments, the OASIS group has developed a number of Web-service standards, identified by their WS prefix. The common specifications include top-level support for discovering Web services and their options, and for exchanging information securely (via WS-Security).
Further standards provide standardized methods for sharing resources and information (WS-Resource and WS-Resource Framework), standardized methods for reliable exchange of messages (WS-Reliable Messaging), standardized methods for event notification (WS-Notification), or even a standardized method for Web services management (WS-Distributed Management).
For security reasons, the WS-Reliable Messaging exchange can be packaged with the WS-Security standard, which defines methods and procedures for authentication, authorization, and message exchange encryption.
By combining Web-service standards, security specifications, and your own custom Web-service components, you can build an efficient grid that spans multiple platforms and environments. The resulting application can be used within a LAN, or it can securely provide computing resources over a public network, with power comparable to a typical HPC solution but with the added flexibility and standards support of grid technology.

Summary

Grid computing is technically a form of high-performance computing, but it differs from traditional HPC environments in many ways. Most traditional HPC systems are built on fixed, dedicated hardware combined with specialized operating systems and environments to produce high performance. Grids, in comparison, can use everyday hardware and different platforms, and can even be configured to use the spare capacity of existing infrastructure.
Although there are some differences, there are many similarities between the two systems, especially when looking at the division and distribution of work across nodes. In both cases, Web services can be used to help support system operations. By using open standards and allowing support for a wider range of operating systems and environments, web services and grid technologies can make a big difference in the power and flexibility of high-performance computing solutions. [1]

Improving performance

High-performance computing facilities must be chosen according to the particular needs of the enterprise, but all HPC applications require special optimization, which differs from traditional data-center requirements. The following practices help an HPC platform run at its highest performance.

Choosing the Right Memory for High Performance Computing

Three types of DIMM are available: UDIMMs, RDIMMs, and LRDIMMs. Unbuffered DIMMs (UDIMMs) are fast and inexpensive, but less stable under larger workloads. Registered DIMMs (RDIMMs) are stable and expandable, cost more, and place a lighter electrical load on the memory controller; they are used in many traditional servers. Load-reduced DIMMs (LRDIMMs) are an alternative to registered memory: they provide high memory speeds, reduce the load on the server memory bus, and consume less power.

Upgrading facilities

A big difference between HPC system design and traditional data-center infrastructure design is the choice between off-the-shelf and custom systems. Off-the-shelf systems can be expanded only to a limited extent, which constrains future growth. A custom design keeps the architecture open and lets the enterprise expand more easily later. However, that additional flexibility carries a significant price: a custom system costs much more than an off-the-shelf one.

Taking full advantage of HPC

HPC application design differs from traditional design: developers need to split the flow of work into groups that can run in parallel.

Keeping the system consistent

When there are inconsistencies across the cluster, HPC administrators may see sporadic variations that affect application performance. To protect performance, the IT department needs policies that track which applications run on the HPC system and keep configurations synchronized. These checks should be performed quarterly, or at least twice a year.

Focusing on energy consumption

Average server power draw is around 30 kW per rack, and the figure is still rising. Because of this density, the supporting infrastructure and cooling systems of a high-performance data center become critical. [2]

High-performance computing optimization

High-performance computing is a branch of computer science concerned mainly with the research and development of high-performance computer technology in terms of architecture, parallel algorithms, and software development.
With the rapid development of computer technology, the computing speed of high-performance computers is constantly increasing, and its standards are also constantly changing.
Sugon CAE High Performance Computing Platform
High-performance computing, put simply, is the completion of certain kinds of technical workloads across a number of servers; whether a particular job needs 8, 12, or 16 servers is not the point. By definition each server runs its own independent operating system, and the associated input/output infrastructure is built from COTS (commercial off-the-shelf) components.
In short, the discussion is about Linux high-performance computing clusters.
A data center with 20,000 servers running molecular dynamics simulations qualifies just as much as a small engineering firm running computational fluid dynamics (CFD) simulations in its computer room. The only limit on the workloads that can be solved is the technical level. The question discussed next is what can be applied directly in practice.
Metrics
Useful metrics include performance, performance per watt, performance per square foot, and performance per dollar. For the 20,000-server molecular dynamics cluster mentioned above, the reason is obvious: running such a system is usually constrained by the power (watts) and floor space (square feet) the servers consume, and both factors feed into the total cost of ownership (TCO). Everyone pays close attention to achieving the greatest economic benefit in terms of TCO.
The discussion here is framed in terms of performance, to help readers understand the practical importance of performance per watt, performance density, and total cost of ownership (TCO).
Definition of performance
Performance is defined here as a rate of computation: for example, the amount of workload completed per day, or floating-point operations per second (FLOPS). The next thing to consider is the completion time for a given workload. The two are directly related: rate = workload / time. Performance is therefore measured in terms of the workload being run, converted into a rate by measuring its completion time.
Quantitative and Qualitative
At a qualitative level the question is easy to answer: faster processors, more memory, and better network and disk input/output subsystems. That answer is not precise enough when deciding whether to buy a Linux cluster, which calls for a quantitative analysis of the performance of Linux high-performance computing clusters.
This article introduces some quantitative models and techniques that can guide business decisions accurately while remaining simple and practical. Such business decisions include:
Purchase --- a guide to selecting system components for the best or most economical performance
Configuration --- identifying bottlenecks in the system and application software
Planning --- highlighting the dependencies and limits of performance in order to formulate a medium-term business plan
Linux high-performance computing cluster model
The Linux high-performance computing cluster model includes four main types of hardware components:
(1) Compute nodes, the servers that execute the technical workload;
(2) A master node for cluster management, job control, and so on;
(3) The interconnect fabric, most commonly the highly popular Gigabit Ethernet (GbE);
(4) A global storage system, such as a simple NFS file system exported by the master node.
The measure of a high-performance computer is based mainly on its computation speed (especially floating-point speed). High-performance computing is a cutting-edge technology in the information field; it plays a direct role in safeguarding national security, advancing defense science and technology, and driving the development of advanced weapons.
With the rapid development of the information society, the demands on information-processing capability keep rising. Demand for high-performance computing is growing rapidly not only in traditional fields such as petroleum exploration, weather forecasting, aerospace and defense, and scientific research, but also in broader fields such as finance, government informatization, education, enterprise applications, and online games.
A simple quantitative application model
Such a quantitative application model is very intuitive: the time it takes to complete a given workload on a cluster is roughly the sum of the time spent in the separate subsystems:
(1) Time = Node time (Tnode) + Fabric time (Tfabric) + Storage time (Tstorage)
Time here is the completion time of the workload; node time (Tnode) is the time spent on the compute nodes; fabric time (Tfabric) is the time spent on the interconnect between nodes; and storage time (Tstorage) is the time spent accessing the local or global storage system.
The compute-node completion time, in turn, is roughly the sum of the time spent in the node's own subsystems:
(2) Node time (Tnode) = Core time (Tcore) + Memory time (Tmemory)
Tcore here is the time spent on the microprocessor cores of the compute node, and memory time (Tmemory) is the time spent accessing main memory. This model is very practical for a single-CPU compute node and extends easily to a common two-socket SMP (symmetric multiprocessing) compute node. To make this second model more practical, the subsystem completion times must also be related to the physical configuration parameters of the compute node, such as processor speed and memory speed.
Compute node
The compute-node prototype in the figure identifies the relevant configuration parameters. At the top of the figure are two processor sockets connected through the front-side bus (FSB) to the memory controller hub (MCH), which has four memory channels. An InfiniBand HCA is attached via PCI Express (PCIe).
Performance parameter
Low-speed I/O such as Gigabit Ethernet and Serial ATA (SATA) disks connects through the chipset's south bridge. In the figure, a performance-related parameter is marked in red next to each major component. These parameters describe the hardware characteristics that affect performance (though not all of them), and they are usually directly related to hardware cost. For example, the processor clock frequency (fcore) has a large impact on performance for most workloads, and because of the semiconductor yield curve, faster processors also cost more.
Cache size also affects performance, because a larger cache reduces the traffic that must go to main memory and so raises effective speed. The number of processor cores (Ncores) likewise affects both performance and cost. The speed of the memory subsystem can be parameterized by the DIMM frequency (fDIMM) and the bus frequency (fBus), which also affect performance under load. Similarly, the speed of the interconnect fabric depends on the PCIe frequency (fPCIe).
Other factors, such as DIMM CAS Latency and the number of memory channels, are ignored as secondary factors for the time being.
Performance parameters used
Of the six performance parameters indicated in the figure, four are retained for the model.
First, the PCIe frequency (fPCIe) is dropped, because it mainly affects the speed of the interconnect fabric, which is outside the scope here.
Next, note that the DIMM frequency (fDIMM) and the bus frequency (fBus) are constrained to a fixed ratio by the memory controller hub (MCH). In the dual-core systems considered here, these ratios are typically 4:5, 1:1, or 5:4, and usually only one of them is used. The cache size is important and is kept in the model, as are the number of cores (Ncores) and the core frequency (fcore).
High Performance Computing (HPC) Model
The basic form of this second model has existed in the field of computer architecture for many years.
A common form is:
(3) CPI = CPI0 + MPI * PPM
Here CPI is the cycles per instruction the processor spends under the workload; CPI0 is the core CPI; MPI is the number of cache misses per instruction under the workload (MPI here is a computer-architecture convention, not the Message Passing Interface); and PPM is the penalty per miss, measured in processor clock cycles. Equations (2) and (3) are consistent with each other: the first term corresponds to the processor, the second to memory.
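As a quick worked illustration with assumed numbers (not values taken from the article), a core CPI of 0.8, a miss rate of 0.005 misses per instruction, and a 200-cycle miss penalty give an overall CPI of 0.8 + 0.005 * 200 = 1.8:

#include <stdio.h>

int main(void) {
    double cpi0 = 0.8;     /* core CPI (assumed)                         */
    double mpi  = 0.005;   /* cache misses per instruction (assumed)     */
    double ppm  = 200.0;   /* penalty per miss, in core cycles (assumed) */

    printf("CPI = %.2f\n", cpi0 + mpi * ppm);   /* Equation (3): prints 1.80 */
    return 0;
}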
Intuitively, if each unit of work executes P instructions, then multiplying Equation (3) by P and dividing by the core frequency fcore (processor cycles per second) gives Equation (4):
(4) Tnode = (CPI0 * P) * (1 / fcore) + (MPI * P) * PPM * (1 / fcore)
Note that (CPI0 * P) is the number of processor cycles per unit of work; for a given workload running on a given microprocessor architecture it is essentially constant, so we name it α. (Processor cycles by themselves do not measure time; dividing by the core frequency converts them into time, which is why Tnode appears on the left of Equation (4).)
(MPI * P) is similar: it is also constant for a given workload and architecture, but it depends mainly on the cache size, so we name it M(MBcache). PPM is the cost of accessing main memory; for a given workload it is a fixed number of bus cycles C, and multiplying by the ratio of core to bus frequency converts it into processor cycles, so PPM = C * fcore / fBus. Substituting α and M(MBcache) gives:
(5) Tnode = α * (1 / fcore) + M(MBcache) * C * (1 / fBus)
If the bus frequency is also treated as a constant, Equation (5) simplifies to Equation (6):
(6) Tnode = α * (1 / fcore) + β, where β = M(MBcache) * C * (1 / fBus)
Here Tcore = α * (1 / fcore) and Tmemory = β, which are exactly the two terms of Equation (2); this ties the key pieces together.
First, this second model (Equations (5) and (6)) has a solid theoretical basis, since it is derived from Equation (3), which is standard in computer-architecture theory. Second, three of the four hardware performance parameters chosen earlier already appear in the model; the remaining one is the number of cores (Ncores).
To account for the number of cores in an intuitive way, treat the N cores as a single core running at frequency N * fcore. Then, from Equation (6), roughly:
(7) Tcore ≈ α / (N * fcore) = (α / N) * (1 / fcore)
which can also be written as:
(8) αN = α / N
That is, the α of an N-core processor is roughly 1/N times that of a single-core processor. This follows almost entirely from the algebra.
Generally, system performance is expressed in terms of the core and bus frequencies, as in Equation (5). But the left-hand side of Equation (5) is a time (the completion time of a workload), so the main system parameters on the right can be expressed more clearly in units of time as well. Note that the core clock period τcore (the time of one core cycle) equals 1 / fcore, and likewise the bus clock period τBus equals 1 / fBus:
(9) Tnode = αN * τcore + M(MBcache) * C * τBus
In this form the completion-time model is linear in the two basic independent variables τcore and τBus, which makes it convenient to analyze real system data with a simple tabular lookup. [3]
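For concreteness, here is a minimal numeric sketch of Equation (9) in C, evaluated with made-up constants; αN, M(MBcache), and C below are assumed values chosen only to show how the two terms combine, not measurements from the article.

#include <stdio.h>

int main(void) {
    double fcore   = 3.0e9;    /* core frequency in Hz (assumed)                      */
    double fbus    = 1.333e9;  /* bus frequency in Hz (assumed)                       */
    double alpha_n = 2.0e11;   /* core cycles per unit of work, alpha_N (assumed)     */
    double m_cache = 5.0e8;    /* cache misses per unit of work, M(MBcache) (assumed) */
    double c       = 60.0;     /* bus cycles per miss, C (assumed)                    */

    double t_core   = alpha_n * (1.0 / fcore);       /* alpha_N * tau_core       */
    double t_memory = m_cache * c * (1.0 / fbus);    /* M(MBcache) * C * tau_Bus */
    double t_node   = t_core + t_memory;             /* Equation (9)             */

    printf("Tcore = %.1f s, Tmemory = %.1f s, Tnode = %.1f s\n",
           t_core, t_memory, t_node);
    return 0;
}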

Development of high-performance computing applications

A consensus has gradually formed that high-performance computers are servers priced above 100,000 yuan; they are called high-performance computers because they offer performance and functional advantages over microcomputers and low-end PC servers. High-performance computers are themselves divided into high-, mid-, and low-end classes, and the mid-range market is developing fastest. From the perspective of applications and the market, there are two types of mid-to-high-end systems.
Dawning 2000
One type is the supercomputer, used mainly for scientific and engineering computation and special-purpose design, such as the Cray T3E; the other is the superserver, which can support computing, transaction processing, database applications, and network applications and services, such as IBM's SP and the domestic Dawning 2000.
From a market perspective, high-performance computers are an industry with high technology, high profits, and a growing market share. The widespread application of high-performance computers in government departments, scientific research and other fields has an irreplaceable role in enhancing a country's scientific and technological competitiveness. In addition, experience in the United States and Europe has proven that companies using high-performance computers can effectively increase productivity.
The development trends of high-performance computers are mainly networking, mainstream architectures, openness and standardization, and diversified applications. Networking will be the most important of these: the main use of a high-performance computer is as a host in a network computing environment. In the future more and more applications will run in the network environment, with billions of client devices and all important data and applications placed on high-performance servers. The client/server model is entering its second generation, a model of server aggregation, and this is a clear development trend.
The grid has become a new research hotspot in high-performance computing and is a very important emerging technology. The application model of the network computing environment will remain Internet/Web for now, but within 5 to 10 years the information-grid model will gradually become mainstream. The United States is significantly ahead of other countries in computing grids; one view holds that current U.S. support for grid research is comparable to its support for Internet research in the 1970s, and that the grid is expected to spread to all areas of economic and social development within 10 years. The main difference between the grid and the Internet/Web is integration: it organizes computers, data, valuable equipment, users, software, and information distributed across the country into one logical whole, on which industries can run their own application grids. The United States started the STAR-TAP program in an attempt to extend the grid worldwide.
In terms of architecture, an important trend is that superservers are replacing supercomputers as mainstream architecture technologies for high-performance computing. The low-end products in the high-performance computer market will mainly be SMP (Symmetric MultiProcessor), and the mid-range products will be SMP, CC-NUMA (Cache Coherent-Non Uniform Memory Access), and clusters. High-end products will use SMP or CC-NUMA node clusters. Around 2001, there will be a hybrid structure that combines the advantages of NUMA (COMA and CC-NUMA) and the cluster architecture, called the Cluster-NUMA (C-NUMA) system. Reconfigurable, partitionable, and configurable features will become increasingly important. In addition, a new architecture called Multithreading will be used in supercomputers. Its representative is Tera's MTA system. An 8-CPU MTA has successfully run in the San Diego Supercomputer Center. It is worth noting that the high-end systems planned by all manufacturers are clusters, and some manufacturers have begun to study the C-NUMA structure.
The United States has always been the country that values high-performance computers the most, invests the most in them, and benefits the most from them, and its research leads the world. The U.S. Department of Energy's Accelerated Strategic Computing Initiative (ASCI) aims to build 100-teraflop supercomputer systems, software, and algorithms to realistically simulate nuclear explosions by 2004, and the White House's High-End Computing and Computations (HECC) program looks further ahead to petaflop-scale ("Ultrascale") computing. [4]


Moore's Law, the first law of computer development, has long guided the IT industry. With the development and application of multi-core technology, however, Moore's Law is being surpassed in some areas even as it faces challenges, for example in the increasingly popular field of high-performance computing (HPC). Why has Moore's Law been surpassed first in high-performance computing, and what industry trend does that imply?
First, look at the performance trend of the global TOP500 list in recent years, which represents the state and direction of high-performance computing worldwide. Whether for the maximum performance (the number-one system), the minimum performance (the system ranked 500th), or the average performance, the growth curves are essentially the same, and all are markedly steeper than the curve of Moore's Law. This shows that over the past two years the performance and application of high-performance computing have grown faster than Moore's Law. As anyone familiar with it knows, Moore's Law has three common interpretations: the number of transistors on an integrated circuit doubles roughly every 18 months; the performance of microprocessors doubles every 18 months while the price halves; and the performance of the computer one dollar can buy doubles every 18 months. The first interpretation is cited most often in the industry, but for high-performance computing the second or third is more appropriate.
In principle, as high-performance computing performance improves and systems grow, users' costs should rise significantly, both in initial procurement and in later operation; in the special period of an economic crisis, such a large TCO should mean fewer users and lower overall performance. Yet the recently released TOP500 shows the growth momentum undiminished. Beyond market and user demand, this is due to processor manufacturers' adoption of new technology, which lets users enjoy ever more computing performance while lowering costs. In this sense Moore's Law is being surpassed: in the field of high-performance computing, the performance users obtain per unit of investment grows far faster than Moore's Law. This is mainly attributable to processor manufacturing processes, architecture, multi-core technology, energy-saving technology, software optimization, and rapid deployment.
For example, looking at manufacturing processes and core counts, the latest TOP500 list shows that 45-nanometer parts now dominate and that multi-core processors account for two-thirds of the list. In terms of deployment speed, two systems based on AMD's newly released 6-core processors have already entered the TOP500, and more than 33 systems based on Intel's new Nehalem multi-core architecture have entered the list, two of them in the top 20. Rapid deployment brings users the latest technology and performance.
Of course, for users multi-core itself is not the key; what matters is how to exploit the performance of many cores fully, which requires the related platform technology and software optimization. In high-performance computing, for example, the industry has long known the "half-width board" standard, which Intel actually proposed several years ago. The small half-width board increases computing density while eliminating many duplicated components, and combined with thermal design it provides more computing power at lower energy consumption. This points to a new direction for high-performance computing: spending energy on computation rather than on cooling. Another example is the SSD (solid-state drive), which can greatly improve the reliability and I/O performance of HPC systems while reducing power consumption. Software optimization is the most important part of high-performance computing: compilers, math libraries, and MPI libraries all help ISVs exploit the computing performance of multi-core processors.
From this point of view, in the field of high-performance computing a processor alone can no longer satisfy the market and its users; platform-level solutions are needed. This is also the main reason energy efficiency has been introduced into the global TOP500 rankings.
Speaking of energy efficiency, the industry has long spoken of a rule as important as Moore's Law, named after Pat Gelsinger, Intel's well-known chief technology officer. Its main thrust is that future processor development will focus on improving energy efficiency and on letting users take full advantage of multitasking, security, reliability, manageability, and wireless computing. If Moore's Law pursues processing performance, Gelsinger's rule pursues the energy efficiency of the processor. The rule has been borne out at least in high-performance computing, and its effect is precisely that Moore's Law is surpassed: users get higher energy efficiency in a shorter period and at a lower price.

High-performance computing in meteorology

Enormous human effort has gone into high-performance computers. For modern weather forecasting and meteorological research, high-performance computers occupy an extremely important position.
Meteorological work is inseparable from high-performance computers
With social and economic development, government, society, and the public have placed higher demands on weather forecasts and services. In particular, some special meteorological support tasks require forecasters to provide fine-grained forecasts and services that are fixed in location, time, and quantity. To be effective and stable, modern weather forecasting must be based on numerical prediction, and numerical models are characterized by enormous computational scale and high accuracy requirements; high-performance computing has therefore become the mainstay of modern meteorological research.
The level of numerical weather forecasting has become an important indicator of the modernization of meteorological undertakings around the world. The National Center for Atmospheric Research, in collaboration with the University of Colorado, used the IBM Blue Gene Supercomputer to simulate ocean, weather, and climate phenomena, and studied the effects of these phenomena on issues such as agricultural production, changes in oil prices, and global warming. Japanese scientists have successfully developed a supercomputer code-named "Earth Simulator", whose main purpose is to provide accurate global weather forecasts, so that various countries and regions can better protect themselves from blizzards, cold currents, and the heat of summer.
China has a vast territory, and its climate is multi-layered, diverse, and variable. In recent years natural disasters such as floods and droughts have been severe, so timely and accurate weather forecasting has received growing attention, the regional meteorological market has gradually matured, and more efficient high-performance computers have become a focus of attention. As a leading domestic server brand, Dawning (Sugon) has always paid close attention to the demand for high-performance computers in the meteorological field.
Thanks to its integrated hardware and software design, the Dawning dedicated meteorological machine ports the NCAR MM5 system, a leader in mesoscale numerical weather prediction, directly onto the hardware platform. The system automatically runs the operational forecast at fixed times and locations every day, and the entire process through to weather-map generation is completed automatically without manual intervention. Users can monitor the operation of the whole system at any time, which greatly reduces operating effort, and even users with no training in computer systems can quickly master the entire forecasting system. The system can also serve both as an operational forecasting system and as a platform for meteorological research and experimentation: one machine serves multiple purposes, and users can set parameters and tune algorithms as needed. It also provides a data-retention function, so users can recompute and analyze forecasts from the past month that they were not satisfied with, meeting the meteorological department's need for accurate and timely forecasts.
Meteorological work is inseparable from high-performance computers: the host systems are replaced every three or four years, and each generation must be an order of magnitude faster. For the first ten years only foreign-brand high-performance computers were available; in recent years machines represented by Dawning have significantly improved the overall strength of weather services. Dawning machines are now used very widely in China's meteorological field, greatly advancing the country's meteorological science and technology and providing strong support both for people's daily travel and for many major national projects. From daily weather forecasting to large-scale climate research, from land to ocean, from surface hydrometeorology to space weather, Dawning high-performance computers are at work. [6]

China's high-performance computing champion

Tianhe-1 (TH-1), short for the "Tianhe-1 supercomputer system", is a heterogeneous supercomputer developed by the National University of Defense Technology together with the Tianjin Binhai New Area. The name "Tianhe" means "Milky Way". Tianhe-1 runs the Galaxy Kylin operating system, and Inspur Group also participated in building the system.
In October 2010 the "TOP100 Ranking of China's High-Performance Computers 2010" was officially released, and the upgraded and optimized Tianhe-1 supercomputer system topped the list with a peak performance of 4,700 trillion operations per second and a sustained LINPACK performance of 2,507 trillion operations per second. The upgraded Tianhe-1 is equipped with 14,336 Xeon X5670 processors, 7,168 Tesla M2050 compute cards based on NVIDIA's "Fermi" architecture, 2,048 Feiteng processors developed by the National University of Defense Technology, and 5 PB of storage.
The peak performance of Tianhe-1A was increased 3.89-fold and its sustained performance 4.45-fold, and its computing speed and energy efficiency reached the international leading level of the time. The measured performance of the upgraded Tianhe-1 is 1.425 times that of Jaguar, then the world's fastest supercomputer. Compared with the Phase I Tianhe-1 system delivered a year earlier, the peak and sustained performance of the Phase II system increased 2.89-fold and 3.45-fold, respectively.
Its peak speed is 4,700 TFlops and its sustained speed 2,566 TFlops (measured LINPACK). It entered the TOP500 ranking of world supercomputers in November 2010 in first place. [7]
According to the June 2014 TOP500 list, Tianhe-2, developed by China's National University of Defense Technology, ranked first in the world for the third consecutive time, with a measured LINPACK speed of 33,862.7 TFlop/s and a theoretical peak of 54,902.4 TFlop/s. [8]
