What Is Event Correlation?
Event correlation is the detection of larger threat patterns across multiple events. As network attack methods mature, event correlation methods keep expanding: they draw on event data from more types of security devices, take into account additional event context such as user privileges and asset vulnerabilities, and search for more complex threat patterns. [1]
- Event correlation simplifies threat detection by organizing a large volume of discrete event data and analyzing it as a whole to find important patterns and incidents that need immediate attention. Early event correlation focused on reducing the number of events to simplify event management, usually by filtering, compressing, or summarizing them. Newer technologies use state logic to analyze the event flow as it occurs and apply pattern recognition to find network problems.
- Event correlation is an important fault localization strategy and has long been an active research topic. The basic idea is to associate multiple events into a single conceptual event and filter out unnecessary and irrelevant ones, giving network administrators a more streamlined view of event information so that they can identify the source of a fault quickly and accurately.
Definitions of faults and events in event correlation
- (1) Faults
- A fault, also called a fault source, is a hardware or software problem in the managed network or its components that prevents it from providing normal service.
- Faults can be divided into the following two categories according to their nature.
- Hard faults: faults such as sensor failure and connection interruption, which can generally be resolved by replacing hardware, or by debugging the related software and re-initializing.
- Soft faults: faults caused by problems such as network congestion, software malfunction, resource exhaustion, and reduced switching efficiency.
- (2) Events
- An event, also called an alarm, is the external manifestation of a fault, that is, the state of a managed object after an abnormal state change.
- By their nature, events can be divided into the following two categories.
- Connectivity event: An event in which the connection from the Manager to the managed object fails, so that the device no longer has network connectivity and cannot communicate.
- Performance event: The connection to the device still exists, but the event is triggered because the value of a MIB object related to fault management of the managed object exceeds a preset threshold. For example, an excessively high IP datagram error rate may indicate a failure of the IP layer protocol entity, so a maximum error rate threshold can be set to trigger performance events and generate alarms. For some applications, degraded performance of network system resources also counts as a fault.
- Faults are related to events. Faults are causes, and events are results. Alarms are observable and easily accessible, but faults are usually hidden behind a large number of alarms.
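The threshold mechanism behind a performance event can be illustrated with a minimal Python sketch. The names `PerformanceEvent` and `check_error_rate`, the device name, and the 1% threshold are hypothetical and chosen only for illustration. In practice the two counters might be read from MIB-II objects such as `ipInHdrErrors` and `ipInReceives`, but that mapping is an assumption, not something the text above specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerformanceEvent:
    """Hypothetical record of a threshold violation on a managed object."""
    device: str
    metric: str
    value: float
    threshold: float


def check_error_rate(device: str, errors: int, received: int,
                     max_error_rate: float) -> Optional[PerformanceEvent]:
    """Return a performance event if the error rate exceeds the threshold."""
    if received == 0:
        return None  # no traffic observed, so there is no rate to evaluate
    rate = errors / received
    if rate > max_error_rate:
        return PerformanceEvent(device, "ip_error_rate", rate, max_error_rate)
    return None


# Example: 30 errored datagrams out of 1000 received, threshold of 1%.
event = check_error_rate("router-1", errors=30, received=1000,
                         max_error_rate=0.01)
print(event)
```

The device stays reachable throughout, which is what separates this case from a connectivity event: the alarm comes from a measured value crossing a limit, not from a lost connection.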
Event correlation
- Event correlation draws on the semantic relationships among managed network elements. By processing alarm information in time (temporal) and space (spatial), it reduces the number of alarm messages and helps reveal the real cause of a fault. The timestamp carried by each alarm is a critical piece of information: temporal correlation relates alarm sequences as time series in order to locate the fault. Spatial information mainly refers to the network topology, which is explicitly or implicitly included in the fault alarm: spatial correlation relates the alarm sequence in terms of the network topology in order to locate the fault.
- Typical alarm correlation operations are as follows.
- 1. Compression: Compress multiple occurrences of the same alarm into a single alarm of that type.
- 2. Filtering: Ignore alarms that do not meet the given conditions.
- 3. Suppression: Suppress certain alarms in a specific context, such as ignoring low-level alarms when high-level alarms occur.
- 4. Count: Replace a specified number of duplicate alarms with a new type of alarm.
- 5. Summary (generalization): Refer to an alarm through its more general superclass.
- 6. Refinement: Replace an alarm with a more specific sub-type alarm. [2]
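Four of the operations listed above (compression, filtering, suppression, and count) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the `Alarm` fields, the convention that a larger `severity` means a more serious alarm, the per-source scope of suppression, and all function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Alarm:
    source: str       # managed object that raised the alarm
    kind: str         # alarm type
    severity: int     # assumption: larger means more severe
    timestamp: float  # seconds, used only to keep ordering readable


def compress(alarms: List[Alarm]) -> List[Alarm]:
    """Compression: keep the first occurrence of each (source, kind) pair."""
    seen = set()
    result = []
    for a in alarms:
        key = (a.source, a.kind)
        if key not in seen:
            seen.add(key)
            result.append(a)
    return result


def filter_alarms(alarms: List[Alarm],
                  predicate: Callable[[Alarm], bool]) -> List[Alarm]:
    """Filtering: drop alarms that do not satisfy the given condition."""
    return [a for a in alarms if predicate(a)]


def suppress(alarms: List[Alarm], high: int = 3) -> List[Alarm]:
    """Suppression: when a source has a high-level alarm, drop its lower ones."""
    noisy = {a.source for a in alarms if a.severity >= high}
    return [a for a in alarms if a.severity >= high or a.source not in noisy]


def count(alarms: List[Alarm], threshold: int, new_kind: str) -> List[Alarm]:
    """Count: replace >= threshold duplicates with one alarm of a new kind."""
    totals = Counter((a.source, a.kind) for a in alarms)
    emitted = set()
    result = []
    for a in alarms:
        key = (a.source, a.kind)
        if totals[key] < threshold:
            result.append(a)
        elif key not in emitted:
            emitted.add(key)
            result.append(Alarm(a.source, new_kind, a.severity, a.timestamp))
    return result


alarms = [
    Alarm("sw1", "link_down", 3, 0.0),
    Alarm("sw1", "high_latency", 1, 1.0),
    Alarm("sw2", "crc_error", 1, 2.0),
    Alarm("sw2", "crc_error", 1, 3.0),
    Alarm("sw2", "crc_error", 1, 4.0),
]

# Chain two operations: suppress low-level noise, then count duplicates.
reduced = count(suppress(alarms), threshold=3, new_kind="crc_error_burst")
for alarm in reduced:
    print(alarm.source, alarm.kind)
```

Each function takes and returns a plain list of alarms, so the operations can be chained in whatever order a given correlation policy requires. Summary and refinement are left out because they need an alarm class hierarchy, which the text does not define.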