What Is Data Stream Mining?
A data stream is an ordered sequence of bytes with a start and an end; it includes both input streams and output streams.
Data flow
- Chinese name: data flow
- Foreign name: data stream
- Concept presenter: Henzinger
- Presentation time: 1998
- Definition: a data sequence that is read once in the prescribed order
- Reasons for development: 2
- Data modes: 4
- Calculation types: two categories, basic calculations and complex calculations
- Data flow was originally a concept used in the field of communications, denoting a digitally encoded signal sequence used to transmit information. The concept was first proposed by Henzinger in 1998 (reference [87]), who defined a data stream as "a sequence of data that can only be read once, in a pre-defined order".
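The read-once constraint shapes every data stream algorithm: statistics must be maintained incrementally in a single pass, with no way to revisit earlier elements. As a minimal illustration (our own example, not from the source), Welford's online algorithm computes mean and variance in exactly one pass:

```python
# One-pass (read-once) mean and variance over a stream, via Welford's
# online algorithm. Each element is consumed once, in arrival order.

class RunningStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Consume one stream element; elements are never revisited."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        return self.m2 / self.n if self.n else 0.0

stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)
print(stats.mean, stats.variance())  # mean ≈ 5.0, variance ≈ 4.0
```

The memory footprint is three numbers regardless of stream length, which is the defining property such algorithms aim for.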
- The development of data stream applications is the result of two factors:
- Difference from traditional relational data models
- B. Babcock et al. [90] argue that the data stream model differs from the traditional relational model in the following respects: data elements arrive online; the system has no control over the order in which elements arrive; streams are potentially unbounded in size; and once an element has been processed, it is discarded or archived and cannot easily be retrieved again.
- Different properties and formats of data call for different processing methods, so the Java input/output class library provides different stream classes for input/output streams with different properties. In the java.io package, the basic input/output stream classes fall into two types according to the data they read and write: byte streams and character streams.
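The byte-versus-character distinction is not unique to Java; the same idea can be sketched in Python, where a text wrapper decodes an underlying byte stream much as Java's `InputStreamReader` bridges byte streams to character streams (an illustrative analogy, not part of the source text):

```python
import io

data = "数据流".encode("utf-8")  # 9 raw bytes encoding 3 characters

# "Byte stream": reads raw bytes, with no decoding applied.
byte_stream = io.BytesIO(data)
raw = byte_stream.read()
print(len(raw))   # 9 bytes

# "Character stream": wraps the byte stream with a decoder,
# yielding characters rather than bytes.
char_stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
text = char_stream.read()
print(len(text))  # 3 characters
```

The same nine bytes are seen either as nine bytes or as three characters, depending on which kind of stream reads them.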
- We try to summarize and describe the data flow model from three different aspects: data collection, data attributes, and calculation types. In fact, many articles have proposed various data flow models. We have not included all these models, but have summarized and classified the more important and common ones.
- Researchers continue to study data stream processing, and we believe that the following new trends have emerged:
- Future sketch
- Introduce more statistics
- Technical sketches
- G. Cormode and others mainly deal with the computation of frequent items. Building on earlier majority-item algorithms ([116, 117]), they use error-correcting codes to attack the problem: for example, a counter is maintained for each bit of the data, and the set of frequent items is inferred from the counts accumulated in these counters.
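The per-bit counter idea can be sketched for the simplest case, a single majority item. This is an illustrative reconstruction of the general trick, not the exact algorithm of Cormode et al.; the 8-bit item width is an assumption of the sketch:

```python
WIDTH = 8  # assumption: stream items are 8-bit non-negative integers

def find_majority(stream):
    """Recover the majority item (frequency > n/2), if one exists,
    from one counter per bit position."""
    n = 0
    bit_counts = [0] * WIDTH
    for item in stream:
        n += 1
        for b in range(WIDTH):
            bit_counts[b] += (item >> b) & 1
    # The majority item's b-th bit is 1 iff more than half of the
    # stream elements have that bit set.
    candidate = 0
    for b in range(WIDTH):
        if 2 * bit_counts[b] > n:
            candidate |= 1 << b
    return candidate  # a second pass is needed to verify a majority exists

stream = [5, 5, 3, 5, 9, 5, 5]
print(find_majority(stream))  # 5
```

Only `WIDTH` counters are kept, independent of stream length; extending this from one majority item to a set of frequent items is where the error-correcting-code machinery comes in.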
- The work of Y. Tao et al. [118] is essentially an application of probabilistic counting (a distinct-counting technique that has been widely used in the database field) to data stream processing.
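Probabilistic counting estimates the number of distinct elements without storing them all. As an illustration, here is one classic scheme, the K-Minimum-Values (KMV) sketch; [118] may use a different variant, and the parameter `k` and hash choice below are our own assumptions:

```python
import hashlib

def kmv_distinct(stream, k=256):
    """Estimate the number of distinct items from the k smallest hash values."""
    def h(item):
        # Hash each item to a pseudo-uniform value in [0, 1).
        digest = hashlib.md5(str(item).encode()).hexdigest()
        return int(digest, 16) / 2**128

    smallest = set()
    for item in stream:
        smallest.add(h(item))          # duplicates hash identically
        if len(smallest) > k:
            smallest.remove(max(smallest))
    if len(smallest) < k:
        return len(smallest)           # fewer than k distinct items: exact
    # The k-th smallest of n uniform values sits near k/(n+1).
    return (k - 1) / max(smallest)

stream = [i % 1000 for i in range(10_000)]  # 1000 distinct values, repeated
print(round(kmv_distinct(stream)))  # close to 1000
```

Memory is bounded by `k` hash values however long the stream runs, trading exactness for a small, predictable relative error.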
- Extended sketches
- Extend the sketch to handle more complex queries
- Lin et al. [93] constructed a complex sketch system that supports quantile estimation under both the sliding-window model and the n-of-N model, which is difficult to achieve with simple sketches.
- In the sliding-window model, [93] divides the data into multiple buckets in chronological order and builds a sketch in each bucket (with higher accuracy than required); at query time these sketches are merged, and the last (partially expired) bucket may need special handling. During maintenance, only expired buckets are deleted and new buckets are added.
- In the n-of-N model, [93] uses EH (exponential histogram) partitioning to divide the data into multiple buckets of different sizes and builds a sketch in each bucket (with higher accuracy than required); at query time, merging a suitable subset of these sketches guarantees the required accuracy, and again the last bucket may need special handling.
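The EH partitioning idea can be sketched for the simplest query it supports, counting the ones among the last N stream elements: bucket sizes grow exponentially toward the past, and only the oldest, partially expired bucket contributes uncertainty. This is an illustrative reconstruction of the bucket-merging technique, not the quantile sketch of [93]; the class and parameter names are our own:

```python
from collections import deque

class ExpHistogram:
    """Approximate count of 1s among the last `window` stream elements."""

    def __init__(self, window, k=10):
        self.window = window
        self.k = k              # at most k+1 buckets per size; larger k, better accuracy
        self.t = 0
        self.buckets = deque()  # (latest_timestamp, size), newest on the left

    def add(self, bit):
        self.t += 1
        # Drop buckets that have slid entirely out of the window.
        while self.buckets and self.buckets[-1][0] <= self.t - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.appendleft((self.t, 1))
        # Merge: allow at most k+1 buckets of each size.
        merged = True
        while merged:
            merged = False
            by_size = {}
            for i, (_, size) in enumerate(self.buckets):
                by_size.setdefault(size, []).append(i)
            for size, idxs in by_size.items():
                if len(idxs) > self.k + 1:
                    i, j = idxs[-2], idxs[-1]   # the two oldest of this size
                    b = list(self.buckets)
                    b[i] = (b[i][0], 2 * size)  # keep the newer timestamp
                    del b[j]
                    self.buckets = deque(b)
                    merged = True
                    break

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        oldest = self.buckets[-1][1]
        return total - oldest // 2  # oldest bucket may be partly expired

eh = ExpHistogram(window=100, k=10)
for _ in range(1000):
    eh.add(1)
print(eh.estimate())  # close to 100, the true count within the window
```

Because only the oldest bucket straddles the window boundary, the error is at most half that bucket's size, and exponentially growing sizes keep the total number of buckets logarithmic in the window length.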
- Combining spatiotemporal data
- Separately, "data stream" is also the name of a web-novel genre in which the protagonist's strength is quantified and displayed as numbers, like the attribute panel of an online game.