What Is a Distributed Operating System?
A distributed system is a software system built on a network. Because of the nature of software, distributed systems are highly cohesive and transparent. The difference between a network and a distributed system therefore lies more in the high-level software (especially the operating system) than in the hardware. [1]
- In a distributed system, a group of independent computers is presented to the user as a unified whole, as if it were a single system.
- The architecture of a distributed computer system can be described by using the degree of coupling between processors as the main indicator. Coupling degree is the tightness of the interconnection between system modules. It is a comprehensive reflection of performance indicators such as data transmission rate, response time, and parallel processing capability. It mainly depends on the interconnection topology of the selected architecture and the type of communication link. [2]
- A distributed system is a loosely coupled system in which multiple processors are interconnected through communication lines. From the perspective of any one processor, only its own resources are local; the other processors and their resources are remote. There is still no universally agreed definition of a distributed system. It is generally believed that distributed systems should have the following four characteristics: [3]
- Distributed systems are used in many different types of applications. Some applications are listed below. Distributed systems matter more for these applications than for others.
- Distributed computer systems and computer networks have both similarities and differences. The main ones are as follows: [5]
- Although distributed systems have many advantages, their inherent characteristics and the complexity of the application environment leave many design problems to be solved, as follows: [6]
- 1. Partial failure problem
- A distributed system usually consists of several parts, and each part may fail for various reasons, such as hardware failure, software error, or incorrect operation. If a distributed system does not deal with these failures effectively, the failure of one part may paralyze the entire system. [6]
- 2. Performance and reliability are overly dependent on the network
- Because a distributed system is built on a network, and the network itself is unreliable, failures may occur often, and a network failure may terminate system services. In addition, an overloaded network reduces performance and increases system response time. [6]
- 3. Lack of unified control
- Control in a distributed system is typically decentralized, with no unified central controller. Distributed systems therefore usually require synchronization mechanisms to coordinate the work of the various parts of the system. Designing and implementing a distributed system that is transparent to users and fault-tolerant is a challenging task, and the required mechanisms and strategies are not yet mature. Which program design model and control mechanism best suit distributed systems is therefore still a topic for further study. [6]
- 4. Difficulty of designing a reasonable resource allocation strategy
- In a centralized system, all resources are managed and allocated by the operating system. In a distributed system, resources belong to individual nodes, so scheduling is less flexible than in a centralized system, and the physical distribution of resources may not match the distribution of user requests. Because of this mismatch, some resources may be idle while others are overloaded. [6]
- 5. Security and confidentiality issues
- Openness makes many software interfaces in distributed systems available to users. This open structure is very valuable to developers, but it also opens the door to attackers. [6]
- To address the difficulties above and ensure the normal operation of a distributed system, it is necessary to manage system resources effectively and to provide effective handling methods and support mechanisms for communication, failure, and security issues between computers. [6]
- Users' requirements for distributed systems are transparency, security, flexibility, simplicity, and reliability. They also require the ability to easily reconstruct the system in the event of a partial failure, and the ability to integrate heterogeneous subsystems. [6]
- The distribution of resources, the lack of global state information, and transmission delays mean that certain methods and technologies of centralized operating systems cannot be applied to distributed systems. Even if certain technologies in a centralized system meet the above requirements, their implementation is usually very costly. [6]
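
The lack of global state information and of a central controller, noted above, is one reason distributed systems need their own coordination mechanisms. As a minimal sketch of one well-known approach (a Lamport logical clock, which the source does not name), the following Python shows how nodes can order events using only counters carried on messages. The class and function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LamportClock:
    """Logical clock for one node (illustrative name, not from the source)."""
    time: int = 0

    def tick(self) -> int:
        # A local event advances this node's own counter.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is itself an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Move past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


def demo() -> Tuple[int, int, int]:
    """Two nodes exchange one message; returns (A's clock, message stamp, B's clock)."""
    a, b = LamportClock(), LamportClock()
    a.tick()                      # local event on node A
    stamp = a.send()              # A sends a message to B
    received = b.receive(stamp)   # B receives it with no shared clock involved
    return a.time, stamp, received
```

No node consults a global clock or a central coordinator here: each one updates its counter from local events and from the timestamps it receives. This yields an ordering consistent with message causality, but it does not detect failures or provide wall-clock time, so it addresses only part of the coordination problem described above.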