What is a Consolidated Tape?
The sorting process of a tape file is similar to the sorting process of a disk file. First, the records of a file are segmented into memory for internal sorting. Several ordered segments are generated and written to the tape. These ordered segments are then repeatedly merged. Finally, the ordered segments are merged into an ordered segment containing all records in the file. In the process of sorting tape files, the situation that the ordered segments are distributed on different tapes and different bits on the same tape has a great impact on the efficiency of sorting.
- The sorting process for tape files is similar
- Because tape is a sequential access device, the time required to access data is
Tape file sorting balance merge sort
- The so-called balanced sorting means that in the sorting process, the initial ordered segments or the ordered segments generated by the previous merge must be evenly distributed to each input band before the next merge. [1]
- As with the sorting of disk files, the time required to sort tape files is closely related to the number of scans of the tape files. Using k (2) merges can reduce the number of scans of tape files. [1]
- To avoid too much tape seek time, the ordered segments that are currently being merged must be placed on separate tape drives. Therefore, k-way merges require at least k tape drives as inputs during the merge. You also need one as an output. In this way, at least (k + 1) tape drives are required for k-channel merge. If (k + 1) tape drives are used to implement k-way merge, then the output tape needs to be scanned another time, and the ordered segments generated during the previous merge are evenly distributed to k machines for the next merge. If 2k stations are used, the above-mentioned problem of redistributing ordered segments can be avoided, because when k channels are merged, k stations can be used as inputs, and the remaining k stations can be used as outputs. The output band is used as the input band and the input band is used as the output band when the file is merged next time. [1]
Tape file sort multi-stage merge sort
- If k-way merging is used, then 2k tape drives are needed in order to avoid scanning for output ordered segment redistribution. If the characteristics of the k-order Fibonacci sequence are used, for k-channel merging, only (k + 1) stations are required to avoid scanning for the output order segment re-distribution. [1]
- Here, we assume that the length of the initial ordered segment is the number of records that can be sorted internally once in memory. As the merge progresses, the length of the newly generated ordered segment also gradually increases. We use marks
- Now k = 2 is taken as an example to explain the execution process of multi-stage merge sort. For two-way merging, three tape drives are needed, which is written as
- (1) Generate 21 initial ordered segments from the input file, and distribute 13 initial ordered segments in
- (2) put
- (3) put
- (4) Put
- (5) put
- (6) put
- (7) put
- It can be known from the above process that, except for the last stage, there is one and only one tape drive in each of the remaining stages becomes empty, and the number of empty tapes in each stage is sequentially rotated in 3, 2, and 1 cycles. The empty band that appeared in the previous stage is used as the output band of the next stage, so that the redistribution of ordered segments is avoided. At the same time, the number of ordered segments that each non-empty tape participates in merging is the number of least-ordered segments in these tapes, so there must be an empty tape after performing a certain stage. In each phase, the number of ordered segments that each non-empty tape drive participated in the merge was exactly a certain Fibonacci number. If the number of ordered segments that each non-empty tape drive participated in the merge in the previous stage is
- The key to achieving correct merging is to choose a correct initial ordered segment distribution. To this end, we first take the case of k = 2 as an example to find the general law, and then promote it. [1]
- In general, for the k-channel multi-stage merge sorting of (k + 1) tape drives and the specified n, the number of initial ordered segments distributed on the i (1ik) stage is:
- (1) When n = 0,
- (2) When n> 0,
- The total number of initial ordered segments on stage k is