What Is a Hash Function?
In general linear tables, the relative positions of records in the structure in the tree are random, that is, there is no definite relationship with the keywords of the records. Therefore, when searching for records in the structure, a series of comparisons with keywords are required. . This type of search method is based on "comparison", and the efficiency of the search depends on the number of comparisons made during the search. Ideally, the required record can be found directly, so a certain correspondence relationship f must be established between the storage location of the record and its keywords, so that each keyword corresponds to a unique storage location in the structure.
- Chinese name
- Hash function
- Foreign name
- Hash Function
- Other name
- Hash function
- expression
- Addr = H (key)
- Action 1
- encryption
- Action 2
- Speech Recognition
- Action 3
- Hash table
- Field
- Computer algorithm
- In general linear tables, the relative positions of records in the structure in the tree are random, that is, there is no definite relationship with the keywords of the records. Therefore, when searching for records in the structure, a series of comparisons with keywords are required . This type of search method is based on "comparison", and the efficiency of the search depends on the number of comparisons made during the search. Ideally, the required record can be found directly, so a certain correspondence relationship f must be established between the storage location of the record and its keywords, so that each keyword corresponds to a unique storage location in the structure.
The concept and role of hash tables
- The elements in the hash table are determined by a hash function. Using the key K of the data element as an argument, the value calculated through a certain functional relationship (called a hash function) is the storage address of the element. Expressed as:
- Addr = H (key)
- For this reason, there are two main problems to be solved before setting up a hash table:
- Construct a suitable hash function
- The value of uniformity H (key) is evenly distributed in the hash table;
- Simple to speed up address calculation
- Conflict management
- Conflict: In the hash table, different key values correspond to the same storage location. That is, the keyword K1 K2, but H (K1) = H (K2). A uniform hash function can reduce conflicts, but it cannot avoid them. After a conflict has occurred, it must be resolved; that is, the next available address must be found.
- Solutions to conflicts: [1]
- Link method (zip method). Store records with the same hash address in a linear linked list. For example, in the remainder division method, the keyword is (18,14,01,68,27,55,79), and the divisor is 13. The hash address is (5,1,1,3,1,3,1), and the hash hash table is shown in the figure.
- Open addressing method. If h (k) is already occupied, probe in the following sequence: (h (k) + p)% TSize, (h (k) + p)% TSize, ..., (h (k) + p (i))% TSize, ...
- Among them, h (k) is a hash function, TSize is a hash table length, and p (i) is a probe function. On the basis of h (k) + p (i-1))% TSize, if a conflict is found, a new detection is performed using the increment p (i) until no conflict occurs. Among them, according to the different probe functions p (i), the open addressing method is divided into linear probe methods (p (i) = i: 1, 2, 3, ...), and the second probe method (p (i) = (- 1) ^ (i-1) * ((i + 1) / 2) ^ 2, the probe sequence is: 1, -1,4, -4, 9), random probe (p (i): random Number), double hash function method (double hash function h (key), hp (key) if h (key) conflicts, then use hp (key) to find the hash address. The probe sequence is: h (k ), h (k) + hp (k), ..., h (k) + i * hp (k)).
- (3) Bucket addressing method. Bucket: A large enough storage space. Bucket Addressing: Associate a bucket with each address in the table. If the bucket is full, it can be handled using open addressing. For example, insert A5, A2, A3, B5, A9, B2, B9, C2, and use linear probing to resolve conflicts. As shown in the figure.
Hash function hash table construction method
Hash function direct addressing
- For example: there is a demographic table from 1 to 100 years old, where age is the key and the hash function takes the key itself.
Hash function number analysis
- The birthday data of some students are as follows:
- year month day
- 75.10.03
- 75.11.23
- 76.03.02
- 76.07.12
- 75.04.21
- 76.02.15
- ...
- After analysis, the first, second, and third positions are more likely to be repeated. Taking these three positions increases the chance of conflict, so it is better not to take the first three positions, and the last three positions are better.
Hash function square method
- Take the middle digits of the squared key as the hash address.
Hash function folding
- The keyword is divided into several parts with the same number of digits (the number of digits in the last part can be different), and then the sum of these parts (rounded up) is used as the hash address. This method is called the folding method.
- For example, each Western language book has an international standard book number, which is a 10-digit decimal number. If you want to use it as a key to create a hash table, you can use this when the number of books in the collection is less than 10,000. Method to construct a four-digit hash function.
Hash function division by remainder method
- Take the remainder after the key is divided by a number p not larger than the hash table length m to obtain the hash address.
- H (key) = key MOD p (p <= m)
Hash function random number method
- Choose a random function, and take the random function of the keyword as its hash address, that is,
- H (key) = random (key), where random is a random function. This method is usually used when the keywords have different lengths.
- If a hash function and a conflict resolution method are known, the steps for establishing a hash table are as follows:
- Step1. Take out the key of a data element and calculate its storage address D = H (key) in the hash table. If the storage space with storage address D has not been occupied, the data element is stored; otherwise, a conflict occurs, and Step 2 is performed.
- Step2. According to the specified conflict processing method, calculate the next storage address of the data element whose key is the key. If the storage space of the storage address is not occupied, it is stored; otherwise, continue to perform Step 2 until a storage address of which the storage space is not occupied is found.
Hash function conflict
- No matter how fine the hash function design is, there will be conflicts, that is, the results of the two keyword processing functions are mapped to the same location, so there are some ways to avoid conflicts.
Hash function zipper method
- Pulling out a dynamic linked list instead of the static sequential storage structure can avoid conflicts in hash functions, but the disadvantage is that the design of the linked list is too cumbersome and increases the programming complexity. This method can completely avoid the collision of hash functions.
Multi- hash method
- Designing two or more hash functions can avoid conflicts, but there are still chances of conflict. The better or more the function is designed, the lower the probability (unless the character is too bad, it is almost impossible to conflict).
Hash function open address method
- The open address method has a formula: Hi = (H (key) + di) MOD mi = 1,2, ..., k (k <= m-1)
- Among them, m is the table length of the hash table. di is the incremental sequence when a conflict occurs. If the di value may be 1,2,3, ... m-1, it is called linear probing and hashing.
- If di takes 1, then after each collision, move backward by 1 position. If di takes values, it may be 1, -1,4, -4,9, -9,16, -16, ... k * k , -k * k (k <= m / 2)
- This is called secondary probing and hashing. If the value of di may be a pseudo-random number sequence. It is called pseudo-random detection and then hashing.
Hash function domain method
- Assume that the range of the hash function is [0, m-1], then set the vector HashTable [0..m-1] as the basic table, and set up a storage space vector OverTable [0..v] to store the recording.