What Is Data Migration?
Data migration (also called hierarchical storage management, HSM) is a technology that combines offline storage with online storage. It uses high-speed, high-capacity non-online storage devices as the tier below the disk device and, according to a specified policy, automatically migrates infrequently used data from disk to a second-tier, large-capacity storage device such as a tape library. When the data is needed again, the tiered storage system automatically transfers it from the lower-tier device back to the upper-tier disk. For users the whole operation is completely transparent: access to the disk is only slightly slower, while the apparent capacity of the logical disk is greatly increased.
- Data migration is also the archiving process of moving rarely used or unused files to a secondary storage system such as tape or optical disk. These files are usually image files or historical information that must remain easily accessible in the future. Migration works alongside a backup strategy and does not remove the need for regular backups. The term also covers computer data migration: moving data, applications, and personalization settings from an old computer (old system) to a new computer (new system), which is necessary after a system upgrade.
- Data migration can be divided into three stages: preparation before the migration, implementation of the migration, and verification after the migration. Because of the nature of data migration, a large amount of work has to be completed in the preparation stage, and thorough preparation is the main basis for a successful migration. Specifically, this includes: describing the data sources to be migrated in detail (including storage method, data volume, and the time span of the data); building data dictionaries for the old and new system databases; analyzing the quality of the old system's historical data; analyzing the structural differences between the old and new systems' data; analyzing the differences between the old and new systems' code data; establishing mapping relationships between the old and new database tables and deciding how to handle unmapped fields; developing and deploying ETL tools; writing test plans and verification programs for the data conversion; and formulating contingency measures for the conversion.
- Among the three stages, the implementation of the migration is the most important. It requires detailed implementation steps and processes for the data conversion, preparation of the migration environment, business preparations such as closing out or settling unfinished business matters, testing of all the technologies involved in the migration, and finally carrying out the migration itself.
- Verification after the migration is a check on the migration work, and the result of the data verification is an important basis for judging whether the new system can be officially launched. The data can be verified with quality-inspection tools or purpose-written check programs, and the new system's functional modules, especially queries and reports, should be tested to confirm the accuracy of the data.
Technical Preparation for Data Migration
- Data conversion and migration usually involve multiple tasks: organizing the old system's data dictionary, analyzing the quality of the old system's data, organizing the new system's data dictionary, analyzing the differences between the old and new systems' data, establishing the mapping relationship between the old and new systems' data, developing and deploying the conversion and migration programs, formulating contingency plans for the conversion and migration, carrying out the conversion and migration from the old system to the new one, and checking the integrity and correctness of the data afterwards.
- The conversion and migration process can be roughly divided into three steps: extraction, transformation, and loading. Extraction and transformation are performed according to the mapping relationship between the old and new system databases, and data difference analysis, including the difference analysis of code data, is the prerequisite for establishing that mapping. The transformation step generally also includes data cleaning, which targets source data that is ambiguous, duplicated, incomplete, or in violation of business or logical rules. Data quality analysis has to be performed first to find the problematic data; otherwise cleaning is not possible. Loading takes the extracted and transformed result data and loads it into the target database, either with a loading tool or with hand-written SQL programs. [1]
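As a minimal sketch only (the table names, column names, and in-memory SQLite databases below are hypothetical), the extract-transform-load flow described above might look like this in Python:

```python
import sqlite3

# In-memory stand-ins for the old and new system databases (names are hypothetical).
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t_customer (cust_no TEXT, cust_name TEXT, birth_dt TEXT)")
src.executemany("INSERT INTO t_customer VALUES (?, ?, ?)",
                [(" 1001 ", "Alice", "1980-08-01"), (None, "junk", None)])
tgt.execute("CREATE TABLE customer (customer_id TEXT, customer_name TEXT, birth_date TEXT)")

def extract(conn):
    """Extract rows from the old system according to the table mapping."""
    return conn.execute("SELECT cust_no, cust_name, birth_dt FROM t_customer").fetchall()

def transform(rows):
    """Clean and convert rows to the new system's structure."""
    cleaned = []
    for cust_no, cust_name, birth_dt in rows:
        if not cust_no:                 # cleaning: drop rows that lack the key
            continue
        cleaned.append((cust_no.strip(), (cust_name or "").strip(), birth_dt))
    return cleaned

def load(conn, rows):
    """Load the converted result set into the target table."""
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()

load(tgt, transform(extract(src)))
print(tgt.execute("SELECT * FROM customer").fetchall())  # [('1001', 'Alice', '1980-08-01')]
```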
- The examination of the data covers the following six aspects; a minimal sketch of such checks follows the list.
- (1) Data format check. Check whether the data format is consistent and usable, for example whether a field required to be numeric in the target really contains numbers.
- (2) Data length check. Check the effective length of the data. Special attention should be paid to the conversion of char fields to varchar types.
- (3) Range check. Check whether the data falls within the defined minimum and maximum; for example, an age of 300 or an entry date of 4000-1-1 is obviously wrong.
- (4) Check for null value and default value. Check whether the null values and default values defined by the old and new systems are the same. Different database systems may have different definitions of null values, which requires special attention.
- (5) Integrity check. Check the referential integrity of the data, for example whether the code values referenced by a record actually exist. Note in particular that some systems drop foreign key constraints after a period of use in order to improve efficiency.
- (6) Consistency check. Check logically for data that violates consistency rules, paying particular attention to systems in which related operations were submitted separately.
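A minimal sketch of such checks, assuming one hypothetical field per check and illustrative bounds; real checks would be driven by the data dictionaries of the old and new systems:

```python
from datetime import date

def check_format(value: str) -> bool:
    """(1) Format check: the target expects a number."""
    return value is not None and value.strip().isdigit()

def check_length(value: str, max_len: int = 3) -> bool:
    """(2) Length check: the effective length must fit the target column."""
    return value is not None and len(value.strip()) <= max_len

def check_range(value: str, lo: int = 0, hi: int = 150) -> bool:
    """(3) Range check: e.g. an age of 300 is clearly wrong."""
    return check_format(value) and lo <= int(value) <= hi

def check_null(value, default=None):
    """(4) Null/default check: map the old system's null convention to the new one."""
    return default if value in (None, "", "NULL") else value

def check_integrity(code_value, code_table: set) -> bool:
    """(5) Integrity check: the referenced code value must exist."""
    return code_value in code_table

def check_consistency(entry_date: date, leave_date: date) -> bool:
    """(6) Consistency check: logically related fields must not contradict each other."""
    return entry_date <= leave_date

print(check_range("300"))                   # False
print(check_integrity("A1", {"A1", "B2"}))  # True
```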
Data migration tool selection
- There are two main options for developing and deploying data migration tools: building programs in-house or purchasing a mature product. Each has its own characteristics, and the choice must be made according to the circumstances of the project. Looking at recent large-scale projects in China, most use relatively mature ETL products for data migration. These projects share some common traits: a large amount of historical data to migrate, a short allowed downtime window, a large number of customers or users, access by third-party systems, and a wide impact if the migration fails. That said, self-developed programs are also widely used.
- Many database vendors currently provide data extraction tools, such as Informix's InfoMover, Microsoft SQL Server's DTS, and Oracle's Oracle Warehouse Builder. These tools address extraction and transformation to a certain extent, but they cannot extract data fully automatically; users still need to write appropriate conversion programs with them.
- For example, Oracle's Oracle Warehouse Builder (OWB) provides functions including model construction and design, data extraction, movement and loading, and metadata management. However, the process OWB imposes is cumbersome, difficult to maintain, and not easy to use.
- Among third-party products, Ascential Software's DataStage is a relatively complete offering. DataStage can extract data from multiple business systems and data sources on multiple platforms, perform transformation and cleaning, and load the results into various target systems, with each step carried out in a graphical tool; it can also be scheduled flexibly by external systems. It provides dedicated design tools for specifying transformation and cleaning rules and implements a variety of complex, practical functions such as incremental extraction and task scheduling. Simple transformations can be built by dragging and dropping in the interface and calling DataStage's predefined transformation functions; complex transformations can be built by writing scripts or integrating extensions in other languages. DataStage also provides a debugging environment that greatly improves the efficiency of developing and debugging extraction and transformation programs. [2]
Preparation for data extraction and transformation
- Before data extraction, a large amount of preparation work is needed, which can be summarized into the following four parts.
- (1) For each data table in the target database, an extraction function is established according to the transformation rules recorded in the mapping relationship, which is itself the result of the earlier data difference analysis. The naming rule for extraction functions is F_<target table name>_E.
- (2) Optimize the SQL statements of the extraction functions. Available techniques include adjusting parameters such as SORT_AREA_SIZE and HASH_AREA_SIZE, enabling parallel queries, using hints to direct the optimizer, creating temporary tables, analyzing (ANALYZE) the source data tables, and adding indexes.
- (3) Establish the scheduling control tables, including an ETL function definition table (recording the names and parameters of the extraction, transformation, cleaning, and loading functions), an extraction schedule table (recording the extraction functions to be scheduled), a loading schedule table (recording the loading work to be scheduled), an extraction log table (recording the start time, end time, and success or error information of each scheduled extraction function), and a loading log table (recording the start time, end time, and success or error information of each scheduled loading run).
- (4) Establish a scheduling control program that dynamically schedules the extraction functions according to the extraction schedule table and saves the extracted data into flat files, as sketched below. The naming rule for flat files is <target table name>.txt.
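A minimal sketch of such a scheduling control program, using an in-memory SQLite database as a hypothetical scheduling-control store and a hand-registered extraction function; the naming rules above are followed only for illustration:

```python
import csv
import sqlite3
from datetime import datetime

ctl = sqlite3.connect(":memory:")  # hypothetical scheduling-control database
ctl.execute("CREATE TABLE extract_schedule (func_name TEXT, target_table TEXT)")
ctl.execute("CREATE TABLE extract_log (func_name TEXT, start_ts TEXT, end_ts TEXT, status TEXT)")
ctl.execute("INSERT INTO extract_schedule VALUES ('F_customer_E', 'customer')")

def F_customer_E():
    """Hypothetical extraction function named after its target table."""
    return [("1001", "Alice"), ("1002", "Bob")]

EXTRACT_FUNCS = {"F_customer_E": F_customer_E}  # registry of extraction functions

for func_name, target_table in ctl.execute(
        "SELECT func_name, target_table FROM extract_schedule").fetchall():
    start = datetime.now().isoformat()
    try:
        rows = EXTRACT_FUNCS[func_name]()
        # Save the extracted data into a flat file named <target table>.txt.
        with open(f"{target_table}.txt", "w", newline="") as f:
            csv.writer(f, delimiter="|").writerows(rows)
        status = "OK"
    except Exception as exc:
        status = f"ERROR: {exc}"
    ctl.execute("INSERT INTO extract_log VALUES (?, ?, ?, ?)",
                (func_name, start, datetime.now().isoformat(), status))
ctl.commit()
```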
- The data conversion work is mainly reflected in the cleaning of source data and the conversion of code data during the ETL process. Data cleaning removes junk data from the source and can take place before extraction, during extraction, or after extraction; ETL mainly cleans the source data before extraction. The conversion of code tables can be handled either before extraction or during the extraction process, as follows.
- (1) For the source tables involved in ETL, establish pre-extraction cleaning functions based on the results of the data quality analysis. The cleaning functions can be scheduled uniformly by the scheduling control program before extraction, or distributed into the individual extraction functions. The naming rule for cleaning functions is F_<source table name>_T_C.
- (2) For the source tables involved in ETL, if the code data difference analysis shows that the length of the code values to be converted does not change or changes little, consider converting the codes referenced in the source tables before extraction. Pre-extraction conversion requires code conversion functions, which are scheduled uniformly by the scheduling control program before extraction. The naming rule for code conversion functions is F_<source table name>_T_DM.
- (3) For codes whose coding rules differ greatly between the old and new systems, consider converting during the extraction process, adjusting all extraction functions that involve the code data according to the results of the code data difference analysis.
Data migration verification
- After the data migration is complete, the migrated data needs to be verified. The verification after the data migration is a check on the quality of the migration. At the same time, the result of the data verification is also an important basis for judging whether the new system can be officially launched.
- The migrated data can be verified in two ways. One is to compare query results between the old and new systems: query the same indicator through each system's own query tools and compare the final results. The other is to restore the data to its state one day before the old system was migrated, re-enter into the new system all the business that occurred on the old system on that last day, check for abnormalities, and compare the final results with the old system.
- The quality of the migrated data can be analyzed with data quality inspection tools or with purpose-written check programs. Unlike the quality analysis of the historical data before migration, post-migration verification mainly checks for discrepancies in indicators. It covers five aspects: an integrity check (do the referenced foreign keys exist); a consistency check (do values with the same meaning agree across different locations); a total/detail balance check (for example, comparing the grand total of a tax arrears indicator with the totals broken down by branch and by household); a record count check (do the old and new databases hold the same number of records for corresponding tables); and a special sample check (is the same sample consistent in the old and new databases).
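For example, the record-count check and the total/detail balance check could be scripted as below; the tax_arrears table, its columns, and the in-memory databases standing in for the old and new systems are hypothetical:

```python
import sqlite3

old_db = sqlite3.connect(":memory:")  # stand-in for the old system's database
new_db = sqlite3.connect(":memory:")  # stand-in for the new system's database
for db in (old_db, new_db):
    db.execute("CREATE TABLE tax_arrears (branch TEXT, amount REAL)")
    db.executemany("INSERT INTO tax_arrears VALUES (?, ?)",
                   [("east", 100.0), ("west", 250.5)])

def record_count_check(table: str) -> bool:
    """Record-count check: the old and new tables must hold the same number of rows."""
    old_n = old_db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    new_n = new_db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return old_n == new_n

def balance_check(table: str, column: str) -> bool:
    """Total/detail balance check: grand totals must agree across the two systems."""
    old_sum = old_db.execute(f"SELECT SUM({column}) FROM {table}").fetchone()[0]
    new_sum = new_db.execute(f"SELECT SUM({column}) FROM {table}").fetchone()[0]
    return abs(old_sum - new_sum) < 1e-6

print(record_count_check("tax_arrears"))       # True
print(balance_check("tax_arrears", "amount"))  # True
```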
Data migration method
- Data migration can be carried out in different ways. In summary, there are three main methods: tool-based migration before the system switch, manual entry before the system switch, and generation by the new system after the switch.
- Migration (or relocation) is the process of moving files off precious high-speed disk space and onto secondary, high-capacity media such as optical discs. The files remain available even though they are offline, but users access them over the network.
- This is accomplished by keeping a list of the names of the archived files on the primary media. When a user needs an archived file, they find it in that directory and open it like a normal file; the file is then migrated from secondary storage (optical disc) back to primary storage (disk). This happens in the background, and the user may not even be aware that the file had been migrated to disc. When the user is finished with it, the file is moved back to secondary storage. The migration takes place after a set period of time, or whenever the user or network administrator chooses.
- Novell NetWare's High Capacity Storage System (HCSS) is a data archiving system that supports offline optical jukebox storage devices. A jukebox is an automatic disc-changing device that can select discs from a library of rewritable optical discs. HCSS uses data migration technology to move files between high-speed, low-capacity storage (the server's hard drives) and low-speed, high-capacity storage (the disc library). Users still see the files listed in a special directory, and the files appear to be stored online.
- HCSS migrates files marked by the administrator to the offline disc library. If a user needs a migrated file, they simply access it in the usual way; HCSS migrates the file back to disk and the user can work with it. Apart from a short access delay, users do not notice that they are accessing archived files. After some time, the files are migrated back to the disc library again.
Data migration characteristics
Direct mapping
- The source value is copied to the target exactly as it is. Even for such a simple rule, if the length or precision of the source field and the target field do not match, check carefully whether direct mapping is possible or whether some simple processing is needed.
Field operations
- The target field is obtained by performing mathematical operations on one or more fields of the data source. This rule generally applies to numeric fields.
Reference transformation
- In this transformation, one or more fields of the data source are used as a key to look up a specific value in an associative array, and the lookup should return exactly one value. Implementing the associative array as a hash table is the most appropriate and most common approach, and loading it into memory before the ETL run starts greatly improves performance.
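A minimal sketch of such a reference transformation, assuming a hypothetical region code table loaded into an in-memory dictionary before the ETL run starts:

```python
# Hypothetical code table mapping old (region, sub-region) codes to new codes,
# loaded into memory (a hash-based dict) before the ETL run starts.
REGION_LOOKUP = {
    ("01", "A"): "CN-EAST",
    ("01", "B"): "CN-WEST",
}

def reference_transform(region_code: str, sub_code: str) -> str:
    """Use one or more source fields as the key and return the single mapped value."""
    try:
        return REGION_LOOKUP[(region_code, sub_code)]
    except KeyError:
        # An unmapped key is treated as dirty data and routed to manual handling.
        raise ValueError(f"no mapping for ({region_code}, {sub_code})")

print(reference_transform("01", "A"))  # CN-EAST
```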
String processing
- Specific information can often be extracted from a string field in the data source, such as an ID number; numeric values are also frequently stored as strings. Operations on strings are usually type conversion, substring extraction, and the like. However, the free-form nature of character fields creates a risk of dirty data, so exception handling must be added when implementing this rule.
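A minimal sketch of a string-processing rule with exception handling, assuming an 18-digit ID number whose 7th to 14th characters encode the birth date; the function name is made up for illustration:

```python
from datetime import datetime

def birth_date_from_id(id_number: str):
    """Extract the birth date substring from an 18-digit ID number (positions 7-14)."""
    try:
        return datetime.strptime(id_number[6:14], "%Y%m%d").date()
    except (TypeError, ValueError):
        # Dirty data: wrong type, wrong length, or a non-date substring.
        return None

print(birth_date_from_id("110101198008017890"))  # 1980-08-01
print(birth_date_from_id("bad-value"))           # None
```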
Null value handling
- Handling null values is a common problem in data warehouses. Should a NULL be treated as dirty data or as a specific dimension member? That depends on the application and needs to be examined case by case. In any case, do not use direct mapping for fields that may contain NULL values.
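A small illustration of making that decision explicit instead of mapping the field directly; the "unknown member" default used here is only an assumption:

```python
UNKNOWN_MEMBER = "-1"  # hypothetical dimension member used for missing codes

def map_customer_type(code):
    """Decide explicitly what a NULL becomes instead of copying it across."""
    return UNKNOWN_MEMBER if code is None else code

print(map_customer_type(None))   # -1
print(map_customer_type("VIP"))  # VIP
```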
Date conversion
- In the data warehouse, dates are often stored in a specific format that differs from the native date type, for example as the 8-digit integer 20040801. In the data source these fields are usually genuine date types, so rules of this kind need common functions that convert a date into an 8-digit date value, a 6-digit month value, and so on.
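A minimal sketch of such conversion functions, producing the 8-digit date value and 6-digit month value mentioned above:

```python
from datetime import date

def to_date8(d: date) -> int:
    """Convert a native date into an 8-digit integer such as 20040801."""
    return d.year * 10000 + d.month * 100 + d.day

def to_month6(d: date) -> int:
    """Convert a native date into a 6-digit month value such as 200408."""
    return d.year * 100 + d.month

print(to_date8(date(2004, 8, 1)))   # 20040801
print(to_month6(date(2004, 8, 1)))  # 200408
```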
Date calculation
- Based on dates, we usually calculate days, months, and durations. The date functions provided by databases generally operate on the native date type, so if the warehouse represents dates with its own specific type, it must also have its own set of date functions.
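If dates are stored as 8-digit integers, the warehouse needs its own date arithmetic; below is a sketch that converts back to the native date type to add days and compute a duration:

```python
from datetime import date, timedelta

def date8_to_date(value: int) -> date:
    """Turn an 8-digit integer such as 20040801 back into a native date."""
    return date(value // 10000, value // 100 % 100, value % 100)

def date8_add_days(value: int, days: int) -> int:
    """Add a number of days to an 8-digit date value."""
    d = date8_to_date(value) + timedelta(days=days)
    return d.year * 10000 + d.month * 100 + d.day

def date8_diff_days(a: int, b: int) -> int:
    """Duration in days between two 8-digit date values."""
    return (date8_to_date(a) - date8_to_date(b)).days

print(date8_add_days(20040801, 31))         # 20040901
print(date8_diff_days(20040901, 20040801))  # 31
```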
Aggregation operations
- The measure fields in a fact table are usually obtained by applying aggregate functions to one or more fields of the data source. These aggregate functions are part of the SQL standard and include sum, count, avg, min, and max.
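A small illustration of deriving fact-table measures with the standard aggregate functions; the orders table and its columns are made up:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [("1001", 20.0), ("1001", 30.0), ("1002", 5.0)])

# Measures for the fact table, derived with standard aggregate functions.
rows = db.execute("""
    SELECT customer_id,
           SUM(amount) AS total_amount,
           COUNT(*)    AS order_count,
           AVG(amount) AS avg_amount,
           MIN(amount) AS min_amount,
           MAX(amount) AS max_amount
    FROM orders
    GROUP BY customer_id
""").fetchall()
print(rows)  # e.g. [('1001', 50.0, 2, 25.0, 20.0, 30.0), ('1002', 5.0, 1, 5.0, 5.0, 5.0)]
```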
Fixed values
- This rule differs from the types above in that it does not depend on any data source field: the target field simply takes a fixed value or a system-dependent value.
- Generally speaking, data migration is a technology that stores large amounts of infrequently accessed data on offline media such as tape libraries and disc libraries, keeping only a small amount of frequently accessed data on the disk array. When data on tape or similar media is accessed, the system automatically migrates it back to the disk array; likewise, data in the disk array that has not been accessed for a long time is automatically migrated to tape, greatly reducing investment and management costs.