What Is Data Stream Mining?


Data stream

A data stream is an ordered sequence of bytes with a beginning and an end; it includes both input streams and output streams.
The data stream was originally a concept in the field of communications, where it denoted a digitally encoded sequence of signals used in transmission. The concept was proposed in 1998 by Henzinger [87], who defined a data stream as "a sequence of data that can only be read once, in a prescribed order".
Chinese name: 数据流 (data stream)
Foreign name: data stream
Proposed by: Henzinger
Year proposed: 1998
Definition: a sequence of data that can only be read once, in a prescribed order
Reasons for development: two (see below)
Data modes: four
Calculation types: two categories, basic calculations and complex calculations
The development of data stream applications is the result of two factors.
Differences from the traditional relational data model
B. Babcock et al. [90] argue that the data stream model differs from the traditional relational model in several respects; most notably, stream elements arrive continuously, in an order the application does not control, and can be read only once rather than being stored and queried repeatedly.
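To make the read-once constraint concrete, here is a minimal Java sketch (not taken from [90]) that computes a running mean in a single pass over the elements of a stream, without storing them; the small in-memory list merely stands in for a real, unbounded stream.

    import java.util.Iterator;
    import java.util.List;

    public class OnePassMean {
        // Each element is consumed exactly once; nothing is buffered.
        public static double mean(Iterator<Double> stream) {
            long n = 0;
            double sum = 0.0;
            while (stream.hasNext()) {
                sum += stream.next();
                n++;
            }
            return n == 0 ? 0.0 : sum / n;
        }

        public static void main(String[] args) {
            // In-memory stand-in for an unbounded stream.
            Iterator<Double> stream = List.of(1.0, 2.0, 4.0).iterator();
            System.out.println(mean(stream)); // 2.3333...
        }
    }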
Different properties and formats of data call for different ways of processing streams. Accordingly, the Java input/output class library provides different stream classes for input and output streams of different kinds. In the java.io package, the basic input/output stream classes fall into two categories according to the type of data they read and write: byte streams and character streams.
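As a minimal illustration of this distinction, the following Java sketch reads data once through a byte stream (FileInputStream) and once through a character stream (FileReader); the file names are placeholders and error handling is reduced to the bare minimum.

    import java.io.FileInputStream;
    import java.io.FileReader;
    import java.io.IOException;

    public class StreamKinds {
        public static void main(String[] args) throws IOException {
            // Byte stream: reads raw 8-bit bytes, suitable for binary data.
            try (FileInputStream in = new FileInputStream("example.bin")) { // placeholder file name
                int b;
                while ((b = in.read()) != -1) {
                    // process one byte at a time
                }
            }
            // Character stream: reads characters, decoding bytes into text
            // using a character encoding.
            try (FileReader reader = new FileReader("example.txt")) { // placeholder file name
                int c;
                while ((c = reader.read()) != -1) {
                    // process one character at a time
                }
            }
        }
    }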
We attempt to summarize and describe the data stream model from three different aspects: data collection, data attributes, and calculation types. Many articles have proposed various data stream models; rather than covering them all, we summarize and classify the more important and common ones.
Researchers continue to study data stream processing, and we believe that the following new trends have emerged:
Future directions for sketches
Sketches that incorporate more statistical techniques
G. Cormode et al. deal mainly with computing frequent items. Building on earlier majority-item algorithms [116, 117], they use error-correcting codes to attack the problem: for example, a counter is maintained for each bit position of the data items, and the set of frequent items is inferred from the values of these counters.
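The following Java fragment is a hedged illustration of the per-bit counting idea only, not the full algorithm of [116, 117]: assuming 32-bit integer items and a stream that contains a strict majority item, each bit of that item can be recovered from a counter that tracks how many items have that bit set.

    public class BitCounterMajority {
        private static final int BITS = 32;
        private final long[] bitCount = new long[BITS]; // items seen with bit i set
        private long total = 0;                         // total items seen

        // Process one stream item.
        public void add(int item) {
            total++;
            for (int i = 0; i < BITS; i++) {
                if (((item >>> i) & 1) == 1) {
                    bitCount[i]++;
                }
            }
        }

        // If some item occurs more than total/2 times, its i-th bit is 1
        // exactly when bitCount[i] exceeds half of the total.
        public int candidate() {
            int result = 0;
            for (int i = 0; i < BITS; i++) {
                if (2 * bitCount[i] > total) {
                    result |= (1 << i);
                }
            }
            return result;
        }
    }

Recovering several frequent items rather than a single majority item requires combining many such counter groups, which is where the error-correcting codes mentioned above come in.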
The method of Y. Tao et al. [118] is essentially an application of probabilistic counting (the distinct-counting technique that has long been used in the database field) to data stream processing.
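For readers unfamiliar with probabilistic counting, here is a minimal Flajolet-Martin-style distinct-count sketch in Java; the hash function and the use of a single bitmap are illustrative simplifications and are not taken from [118] (practical versions average many independent sketches to reduce variance).

    public class FmSketch {
        private static final double PHI = 0.77351; // standard Flajolet-Martin correction factor
        private int bitmap = 0; // bit r is set if some hash value had exactly r trailing zeros

        // Illustrative 64-bit-to-32-bit mixing hash.
        private int hash(long x) {
            x *= 0x9E3779B97F4A7C15L;
            x ^= (x >>> 32);
            return (int) x;
        }

        // Observe one stream item (duplicates do not change the estimate).
        public void add(long item) {
            int r = Integer.numberOfTrailingZeros(hash(item)); // 0..32
            if (r < 32) {
                bitmap |= (1 << r);
            }
        }

        // Estimate of the number of distinct items: 2^R / PHI, where R is
        // the position of the lowest zero bit in the bitmap.
        public double estimateDistinct() {
            int r = Integer.numberOfTrailingZeros(~bitmap);
            return Math.pow(2, r) / PHI;
        }
    }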
Extended sketches
Extending sketches to handle more complex queries.
Lin et al. [93] construct a more elaborate sketch structure that supports quantile estimation under both the sliding window model and the n-of-N model, something that is difficult to achieve with simple sketches.
In the sliding window model, [93] partitions the data into multiple buckets in chronological order and builds a sketch in each bucket (at higher precision than ultimately required); at query time these per-bucket sketches are merged, and the sketch of the last bucket may need to be lifted to the required accuracy. Maintenance is simple: expired buckets are deleted and new buckets are added.
In the n-of-N model, [93] partitions the data into buckets of different sizes using EH (exponential histogram) partitioning and again builds a sketch in each bucket at higher precision than required; a query is answered by combining a subset of these sketches, which guarantees the required accuracy, and the last sketch involved may again need to be lifted.
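The following Java sketch illustrates the bucket layout described above for the sliding window case, under simplifying assumptions: time-based buckets of equal width and a plain count as the per-bucket summary (whereas [93] stores quantile sketches). The maintenance and merge pattern, not the summary itself, is the point.

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class SlidingWindowBuckets {
        private static class Bucket {
            final long startTime;  // timestamp of the first item in the bucket
            long count = 0;        // the per-bucket summary (a simple count here)
            Bucket(long startTime) { this.startTime = startTime; }
        }

        private final long windowSize;  // window length in time units
        private final long bucketWidth; // time span covered by one bucket
        private final Deque<Bucket> buckets = new ArrayDeque<>();

        public SlidingWindowBuckets(long windowSize, long bucketWidth) {
            this.windowSize = windowSize;
            this.bucketWidth = bucketWidth;
        }

        // Insert one stream item arriving at the given timestamp.
        public void add(long timestamp) {
            Bucket last = buckets.peekLast();
            if (last == null || timestamp - last.startTime >= bucketWidth) {
                last = new Bucket(timestamp); // open a new bucket
                buckets.addLast(last);
            }
            last.count++;
            expire(timestamp);
        }

        // Maintenance: drop buckets that lie entirely outside the window.
        private void expire(long now) {
            while (!buckets.isEmpty()
                    && buckets.peekFirst().startTime + bucketWidth <= now - windowSize) {
                buckets.removeFirst();
            }
        }

        // Query: merge the per-bucket summaries. The oldest bucket may
        // straddle the window boundary, so its contribution is approximate.
        public long approximateCount() {
            long total = 0;
            for (Bucket b : buckets) total += b.count;
            return total;
        }
    }

A per-bucket quantile sketch would be merged in the same place where the counts are summed; the oldest bucket straddles the window boundary, which is why its contribution is only approximate and may need the lifting step described above.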
Combining sketches with spatiotemporal data
Separately, "data stream" is also the name of a genre of web novels in which the protagonist's strength is digitized and displayed as numbers, much like the attribute bar of an online game.
