What Is a Double Array?
Double-ArrayTrie (Double-ArrayTrie) is a simple and effective implementation of the trie tree. It consists of two integer arrays, one is base [] and the other is check []. Let the array index be i. If base [i] and check [i] are both 0, the position is empty. If base [i] is negative, the state is a word. Check [i] represents the previous state of this state, t = base [i] + a, check [t] = i
Double array trie tree
Right!
- Double-ArrayTrie (Double-ArrayTrie) is a simple and effective implementation of the trie tree. It consists of two integer arrays, one is base [] and the other is check []. Let the array index be i. If base [i] and check [i] are both 0, the position is empty. If base [i] is negative, the state is a word. Check [i] represents the previous state of this state, t = base [i] + a, check [t] = i
First encode all 10 Chinese characters that appear in the vocabulary: Ah-1, Ah-2, -3, Gen-4, Jiao-5, La-6, and -7, Ting-8, Bo-9, Person-10. . For each Chinese character, a base value needs to be determined so that all words that start with the Chinese character can be placed in the double array. For example, now to determine the base value of the word "", assuming that the second word sequence code of the word starting with "" is a1, a2, a3 ... an in order, we must find a value i such that base [i + a1], check [i + a1], base [i + a2], check [i + a2] ... base [i + an], check [i + an] are all 0. Once this i is found, the base value of "A" is determined to be i. Use this method to build a Double-ArrayTrie (Double-ArrayTrie), after four traversals, put all the words into the double-array, and then traverse the vocabulary once and modify the base value. Because we use a negative base value to indicate that the position is a word. If the state i corresponds to a certain word and Base = 0, then let Base = (-1) * i; if the value of Base is not 0, then let Base = (-1) * Base. The resulting double array is as follows:
Subscript 1234567891011121314
- Base-1 4 4 0 0 0 0 4 -9 4 -11 -12 -4 -14
- Check000000022238 10 13
- Affixes Ah Ae Agan Ajia Allah Egypt Argentina Arab Arab
- The double array generated by the above method will be "ah", "Ah", "Eh", "Agan", "Allah", "Ejiao", "Egypt", "Arab", "Arab", "Argentina" Are considered status. Each state corresponds to an index of the array. For example, if the subscript of "Agen" is i = 8, then the content of check is the subscript of "A", and base is the base value of the subscript of "Argentina". The sequence code of "Ting" is x = 8, then the subscript of "Argentina" is base + x = base [8] + 8 = 12.
- 1, query
- The query process of the trie tree is actually a DFA state transition process. It is relatively simple to implement in a double array: you only need to perform the state transition according to the state flag. For example, query "Argentina", first find the subscript 2 of the state "A" according to the sequence code b = 2 of "A", and then find the subscript base + d of the "Agen" according to the sequence code d of the "root" = 8, and according to check [base + d] = b, it shows that "Agen" is part of a word, and you can continue to query. Then find the status "Argentina". Its subscript is y = 12, at this time base [y]
- Simple optimization
- The basic idea of optimization is to construct a double-array trie tree as a dynamic retrieval method, thereby solving the problems of insertion and deletion.
- 1. Insertion optimization
- When the insertion needs to determine the new BASE value, we only need to traverse the empty state. The appearance of a non-empty state means that a certain BASE value attempt is defeated, and we can ignore it completely. Therefore, we can construct a sequence for all empty states, and only need to scan the sequence when determining the BASE value.
- For the incremental nodes r1, r2, ..., rm of the empty state in the double array, we can construct this empty sequence like this:
- CHECK [ri] = ri + 1 (1im1),
- CHECK [rm] = (DA_SIZE + 1)
- Where r1 = E_HEAD is the index point corresponding to the first null state. In this way, we only need to scan this sequence when determining the BASE value. This saves access time to non-empty states.
- This method can greatly improve the insertion speed without too many empty states.
- 2, delete optimization
- 1) Useless nodes
- For the useless nodes generated when deleting the leaf nodes, you can make them empty by judging in order, so that they can be reused when inserting new words. For example, if we delete "Argentina" in the above example, we can see that the "Agen" state has no substates, so we can also leave it empty. The "A" state cannot be left blank because it has two sub-states.
- 2) Compression of array length
- After deleting a state, we can directly delete the continuous empty state that may appear at the end of the array. In addition, we can re-determine the BASE value for the state of the largest non-empty index point, because it may have become smaller due to the deletion. Here we may be able to delete some null values.