What Is Sequence Mining?

Sequence pattern mining (sequence pattern mining) refers to mining patterns that occur relatively frequently in time or other patterns. Typical applications are still limited to discrete sequences.

Sequence pattern mining (sequence pattern mining) refers to mining patterns that occur relatively frequently in time or other patterns. Typical applications are still limited to discrete sequences.
Chinese name
Sequential pattern mining
Foreign name
sequence pattern mining
Definition
High frequency patterns
Sequential pattern mining
Very useful
Presenter
Agrawal and Srikant
Item set
Itemset
Application area
Web access pattern prediction

Introduction to Sequential Mode

The concept of sequential patterns was first proposed by Agrawal and Srikant.
Motivation: The transaction data of large supermarket chains has a series of user transaction databases. Each record includes the user's ID, the time when the transaction occurred, and the items involved in the transaction. If you can mine a pattern that involves the relationship between transactions, that is, the connection between several purchases by users, you can take more targeted marketing measures.
Example: A transaction database, a transaction represents a transaction, a single item represents the product of the transaction, and the number in the single item attribute records the product ID.
Sequence: Expressed in SID, a sequence is a complete information flow.
Item (Item): The set of smallest constituent units in the sequence, for example, the item in this example is {A, B, C}.
Event (Event): Usually timestamp marks are used to identify the context between events. Also called Itemset, it is a collection of Items, which is represented by EID in the sample.
k frequent sequences: If the number of items in the frequent sequence is k, it is called a k frequent sequence and is represented by Fk (F1, F2, F3 in Figure 1).
Sequence containment relationship: For sequences x and y, if there is an order-preserving mapping such that each event in x is included in an event in y, then x is included in y (x is y Subsequence), for example, the sequence B-> AC is a subsequence of the sequence AB-> E-> ACD.
Support: The support of a sequence x refers to the frequency of the sequence containing x in the entire sequence set.

Sequence pattern definition

Given a collection of different sequences, where each sequence consists of different elements in an orderly order, each element (transaction) consists of a different item, and given a user-specified minimum support threshold, sequence mode Mining is to find all frequent subsequences, that is, the frequency of occurrence of the subsequence in the sequence set is not lower than the minimum support threshold specified by the user.

Symbolic representation of sequential patterns

Itemsets (Itemset) is a collection of various items
Sequence is an ordered arrangement of different item sets. The sequence s can be expressed as s = <s1s2 ... sl>, and sj (1 <= j <= l) is the item set, also called sequence elements of s
The elements of the sequence can be expressed as (x1x2 ... xm), and xk (1 <= k <= m) is a different item. If a sequence has only one item, the brackets can be omitted
The number of all items contained in a sequence is called the length of the sequence. A sequence of length l is denoted as a l-sequence

Sequential pattern sequence mining algorithm steps

1) Sorting phase. Database D sorts by customer number as the primary key and transaction time as the secondary key. This stage transforms the original transaction database into a database composed of client sequences. [1]
2) Frequent itemsets stage. Find the set L consisting of all frequent itemsets. The set of all frequent 1-sequences is also obtained simultaneously. [1]
3) Conversion stage. In the process of finding sequence patterns, it is necessary to continuously check whether a given frequent set is included in a client sequence. [1]
4) The sequence phase uses the set of known frequent sets to find the required sequence. Similar to the associated Apriori algorithm. [1]

AprioriAll Sequential mode AprioriAll algorithm

The execution process of AprioriAll algorithm is the same as that of Apriori algorithm. The difference lies in the generation of candidate sets. The generation of specific candidates is as follows:
When generating the candidate set, it is necessary to distinguish between before and after the last two elements, so there are <p.item1, p.item2, ..., p., Q.> And <p.item1, p.item2, ..., q., P .> Two elements. [1]

AprioriSome Sequential mode AprioriSome algorithm

The AprioriSome algorithm can be regarded as an improvement of the AprioriAll algorithm, which can be divided into two stages:
(1) Forward stage: find all large sequences of the top length. After Li is generated, according to the judgment function j = next (last), at this time last = i, j> i, no candidate for i + 1 is generated in the next stage Term, but a candidate for j. If j = i + 1, then Cj is generated according to Li. If j> i + 1, then Cj is generated by Cj-1. Then scan the database to calculate the support of Cj.
(2) Backward phase: According to the large item set in Lj, remove the Lj items appearing in Ci (i <j), then calculate the support degree in Ci, and judge those itemsets that were missed in the Forward phase.
Comparison of AprioriAll algorithm and AprioriSome algorithm:
(1) AprioriAll is used to calculate all candidate Ck, and AprioriSome will be directly used to calculate all candidates. Because it contains, AprioriSome will generate more candidates.
(2) Although AprioriSome jump-calculates candidates, it may fill up the memory before the backtracking stage because it generates more candidates.
(3) If the memory is full, AprioriSome will be forced to calculate the candidates for the last group.
(4) For lower support and longer sequences, the AprioriSome algorithm is better. [1]

GSP Sequential mode GSP algorithm

GSP (Generalized Sequential Patterns) algorithm, similar to Apriori algorithm, is roughly divided into three stages: candidate set generation, candidate set counting, and extended classification. Compared with the AprioriAll algorithm, the GSP algorithm counts fewer candidate sets, and does not need to calculate frequent sets in advance during the data conversion process.
The calculation steps of GSP are similar to Apriori, but the main difference lies in the generation of candidate sequence patterns. GSP generation of candidate sequence patterns can be divided into the following two steps:
(1) Connection phase: If the first item in sequence mode S1 is removed and the sequence obtained by removing the last item in sequence mode S2, S1 and S2 can be connected, that is, the last item in S2 is added to S1 go with.
(2) Pruning stage: If a subset of a candidate sequence pattern is not a sequence pattern, the candidate sequence pattern cannot be a sequence pattern, and it is deleted from the candidate sequence pattern. [1]

VS Sequential patternSequential pattern vs association rules

problem
Sequential pattern mining
Association rule mining
data set
Sequence database
Transaction database
focus point
Relationships between items within the same transaction and between transactions
Relationships between items within the same transaction

Sequential mode typical tools

SAS Enterprise Miner: Provides data mining including regression, classification, and statistical analysis packages. It features a variety of statistical analysis tools. [2]
SGI's MineSet: provides mining algorithms for correlation and classification as well as advanced statistical and visualization tools. Features are powerful graphical tools including rule visualization tools, tree visualization tools, map visualization tools, and multidimensional data decentralized visualization tools. They are used to visualize data and data mining results. [2]
ISL's Clementine: Provides an integrated data mining development environment for end users and developers. The system integrates a variety of data mining algorithms such as rule induction, neural networks, classification, and visualization tools. Clementine is now acquired by SPSS. [2]

IN OTHER LANGUAGES

Was this article helpful? Thanks for the feedback Thanks for the feedback

How can we help? How can we help?