How Do I Maintain Hard Drive Health?
SMART stands for "Self-Monitoring, Analysis and Reporting Technology". It is both a specification and a system for automatically detecting hard disk condition and giving early warning. Using detection instructions built into the hard disk hardware, it monitors and records the operating condition of components such as the magnetic heads, platters, motor, and circuits, and compares the results with safety values preset by the manufacturer. When a monitored value leaves its safe range, the host's monitoring hardware or software can automatically warn the user and carry out minor repairs, protecting the data on the hard disk before a failure occurs. Apart from some very early models, most hard drives are now equipped with this technology.
- SMART (Self-Monitoring, Analysis, and Reporting Technology): This is a data security technology commonly used in hard disks. While the hard disk is working, the monitoring system analyzes the state of the motor, circuits, platters, and magnetic heads. When an abnormality occurs, it issues a warning, and some drives will also automatically slow down and back up data.
- As early as the 1990s, people realized that data is worth more than the hard disk itself and wanted a technology that could predict hard disk failure and so keep data relatively safe. SMART technology emerged to meet this need. Today the mean time between failures (MTBF) of most hard disks is generally more than 30,000 hours, and some high-end products reach 1.2 million hours. But for many users, especially business users, a single ordinary hard disk failure is enough to cause catastrophic results, so SMART technology remains in use today.
- This technology was first developed by Compaq, with hard disk manufacturers such as IBM, Seagate, Fujitsu, and Quantum participating in its revision. It integrates features of Compaq's IntelliSafe diagnostic technology and IBM's PFA detection technology.
- In May 1995, Compaq submitted the IntelliSafe technical standard report (SFF-8035i) to the Small Form Factor (SFF) Committee. It was revised to version 1.0 (SFF-8035R2) in January 1996 and to version 1.3 (SFF-8055) in June 1996, and Compaq, together with IBM and other companies, formally applied to the SFF Committee to add IntelliSafe technology to the ATA-3 industry standard, where it was officially renamed SMART.
- As an industry standard, SMART specifies what drive manufacturers must comply with. The main conditions for meeting the SMART standard are: the parameters and attributes required by SMART are set during manufacture of the device; SMART works normally on standard system platforms; the BIOS can detect whether the device supports SMART, display the related information, and distinguish valid from invalid SMART information; users are free to turn the SMART function on and off; and, while the device is in use, SMART provides the information needed to determine its working status and can issue corresponding correction instructions or warnings. When the hard disk and the operating system both support SMART and the technology is enabled, SMART can display an English warning message on the screen when a bad condition occurs: "WARNING: IMMEDIATELY BACKUP YOUR DATA AND REPLACE YOUR HARD DISK DRIVE, A FAILURE MAY BE IMMINENT."
- SMART information is kept in the system service area of the hard disk. This area is generally located in the first few dozen physical tracks of surface 0 of the hard disk, where the manufacturer writes the drive's internal management programs. In addition to the SMART information table, it holds low-level formatting routines, encryption and decryption routines, self-monitoring routines, and automatic repair routines. Monitoring software reads SMART information through the ATA SMART command (command code B0h), for example its "SMART RETURN STATUS" subcommand; end users are not allowed to modify the information.
- In the SMART standard, the basic SMART instructions are binary codes that must be written to standard registers, producing the SMART information table used for normal detection and operation. SMART instructions are divided into primary commands (Commands) and subcommands (Subcommands). The primary command mainly indicates whether the device supports SMART or ignores a particular instruction, while the subcommands return detection information from SMART devices. These instructions are mainly implemented by device manufacturers, and some professional hard disk repair software can inspect devices through these codes.
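The split between a primary command and its subcommands can be sketched as follows. This is a minimal illustration, not a driver: the register values are the ones commonly documented for the ATA SMART feature set (they are not listed in this article), and the helper names `build_taskfile` and `interpret_return_status` are hypothetical.

```python
# Sketch of the ATA SMART command/subcommand split.
# Assumption: register values follow the commonly documented ATA SMART
# feature set; the function and dictionary names are hypothetical.

SMART_COMMAND = 0xB0  # primary command code, written to the Command register

# Subcommand codes, written to the Features register
SMART_SUBCOMMANDS = {
    "READ DATA": 0xD0,
    "ENABLE OPERATIONS": 0xD8,
    "DISABLE OPERATIONS": 0xD9,
    "RETURN STATUS": 0xDA,
}

# Signature bytes carried in the LBA Mid / LBA High registers
SIGNATURE_OK = (0x4F, 0xC2)
SIGNATURE_THRESHOLD_EXCEEDED = (0xF4, 0x2C)


def build_taskfile(subcommand: str) -> dict:
    """Return the register values a host would write for one SMART subcommand."""
    return {
        "command": SMART_COMMAND,
        "features": SMART_SUBCOMMANDS[subcommand],
        "lba_mid": SIGNATURE_OK[0],
        "lba_high": SIGNATURE_OK[1],
    }


def interpret_return_status(lba_mid: int, lba_high: int) -> str:
    """Decode the registers a drive hands back after RETURN STATUS."""
    if (lba_mid, lba_high) == SIGNATURE_OK:
        return "ok"
    if (lba_mid, lba_high) == SIGNATURE_THRESHOLD_EXCEEDED:
        return "threshold exceeded"
    return "unknown"
```

Actually sending these registers to a drive requires an operating-system pass-through interface, which is outside the scope of this sketch.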
- The principle of SMART technology is to monitor various attributes of the hard disk, such as data throughput performance, motor spin-up time, and seek error rate, and to compare and analyze these attribute values against standard values. From this it infers the failure condition of the hard disk and gives the user prompt information to help avoid data loss. SMART therefore specifies dedicated test parameters. Because hard disks differ in structure, performance, and market positioning, manufacturers can provide additional SMART test parameters suited to their products beyond those specified in the ATA-3 standard. Ordinary users can view these parameters with common system tools (such as AIDA32) to understand the "health" of the hard disk.
- ID detection codes are not unique. Manufacturers can use different ID codes, or add or remove ID codes, according to the number of detection parameters. For example, in Western Digital's products detection code 04 is the parameter Start / Stop Count (number of power-ons), while Fujitsu's parameter with the same code is "Number of times the spindle motor is activated" (number of motor activations).
- (Attribute Description)
- The Attribute Description is the name of the detection item. Manufacturers can add or remove items, and because the ATA standard is constantly updated, different models of the same brand may differ. However, the major test items stipulated by SMART must always be present (manufacturers follow their own naming rules, but these monitoring items are essentially the same):
- Read Error Rate
- Start / Stop Count (also called the power-on count)
- Reallocated Sector Count
- Spin-Up Retry Count
- Drive Calibration Retry Count
- Ultra DMA CRC Error Rate
- Multi-zone Error Rate
- Vendor-specific attributes
- Note that attribute descriptions differ between manufacturers and between product types. Users do not need to understand the specific meaning of each one; they only need to understand what the monitored attribute values mean.
- (Threshold)
- Also called the threshold value. This is a reliability limit specified by the hard disk manufacturer and calculated with a specific formula. If any attribute value falls below its corresponding threshold, the hard disk is becoming unreliable and the data stored on it can easily be lost. The composition and size of these reliability values differ from one hard disk to another. Note that the ATA standard specifies only some SMART parameters and does not specify concrete values; the "Threshold" value is set by each manufacturer according to the characteristics of its own products. As a result, the manufacturer's own detection software often gives results that differ greatly from those of detection software running under Windows (such as AIDA32). We recommend treating the manufacturer's software as the reference, because Windows loads far more programs from the hard disk at startup than DOS does, which can make the SMART values fluctuate more than they do when measured under DOS. Take the parameter Raw Error Rate (read error rate) as an example. Its calculation formula is 10 × log10((number of sectors transferred between the host and the hard disk × 512 × 8) / number of sectors reread), where 512 × 8 converts the number of sectors into the number of data bits transferred. The value is only calculated while the number of bits transferred is in the range 10^10 to 10^12; after Windows has started, once the data transferred between the host and the hard disk reaches or exceeds 10^12 bits, the value is reset. This is why some values fluctuate greatly under different operating environments and different detection programs.
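The Raw Error Rate formula above can be tried out with a short sketch. It assumes the grouping 10 × log10((sectors × 512 × 8) / rereads) and treats the 10^10 to 10^12 bit window as half-open; the function name and the choice to return `None` outside the window or when no sectors were reread are illustrative assumptions, not vendor behavior.

```python
import math
from typing import Optional

BITS_PER_SECTOR = 512 * 8  # converts a sector count into transferred data bits
MIN_BITS = 10**10          # below this the value is not yet calculated
MAX_BITS = 10**12          # at or above this the value is reset


def raw_error_rate_value(sectors_transferred: int,
                         sectors_reread: int) -> Optional[float]:
    """Hypothetical helper: evaluate the formula inside its valid window."""
    bits = sectors_transferred * BITS_PER_SECTOR
    if not (MIN_BITS <= bits < MAX_BITS):
        return None  # outside the window described in the text
    if sectors_reread == 0:
        return None  # formula undefined; real drives handle this their own way
    return 10 * math.log10(bits / sectors_reread)
```

For instance, 2,500,000 sectors transferred (about 1.02 × 10^10 bits) with 1,024 sectors reread gives a value of 70, while the same error count over a much smaller transfer returns nothing because the window has not been reached.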
- (Attribute value)
- The attribute value refers to the maximum normal value preset when the hard disk leaves the factory, and it generally ranges from 1 to 253. Usually the largest attribute value is 100 (IBM, Quantum, Fujitsu) or 253 (Samsung). There are exceptions: some hard disk models produced by Western Digital use two different maximums, with the initial attribute value set to 200 on earlier drives and changed to 100 on drives produced later.
- (Worst)
- The "Worst" value, or maximum error value, is the most abnormal value that has ever occurred during the operation of the hard disk. It is calculated cumulatively over the life of the hard disk, is refreshed continuously with each operating cycle, and may come very close to the threshold. SMART decides whether the hard disk is in a normal state by comparing this value with the threshold. A new hard drive starts with the largest attribute value, and the value decreases with daily use or as errors occur. A larger attribute value therefore means better quality and higher reliability, while a smaller attribute value means a higher probability of failure.
- (Data)
- This is the actual value of each test item as recorded by the hard disk, and many items are cumulative. For example, Start / Stop Count in Figure 3 has a cumulative actual value of 436, which means the hard disk has been powered on and off 436 times since it was first used.
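To make the columns described so far concrete, here is a minimal sketch that decodes one attribute entry into its ID, attribute value, worst value, and data value. It assumes the widely used 12-byte entry layout (ID, 2-byte flags, value, worst, 6-byte raw data, 1 reserved byte), which this article does not spell out; vendors may deviate from it. The class and function names are hypothetical, and the threshold is not part of this entry.

```python
import struct
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int       # ID detection code, e.g. 0x04
    pre_failure: bool  # Pre-failure / advisory bit (bit 0 of the flags)
    value: int         # current attribute value
    worst: int         # worst value recorded so far
    data: int          # actual (raw) value, often a cumulative count


def parse_attribute(entry: bytes) -> SmartAttribute:
    """Decode one 12-byte entry, assuming the common little-endian layout."""
    if len(entry) != 12:
        raise ValueError("expected a 12-byte attribute entry")
    attr_id, flags, value, worst = struct.unpack_from("<BHBB", entry, 0)
    data = int.from_bytes(entry[5:11], "little")  # byte 11 is reserved
    return SmartAttribute(attr_id, bool(flags & 0x0001), value, worst, data)


# Made-up sample entry: ID 04 (Start / Stop Count), value 100, worst 100,
# and an actual value of 436 as in the example above.
sample = bytes([0x04, 0x32, 0x00, 100, 100]) + (436).to_bytes(6, "little") + b"\x00"
attribute = parse_attribute(sample)
```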
- (Status)
- This is the current status of each hard disk attribute, which SMART derives by comparing the attribute values described above. It is also the most direct information for judging the "health" of the hard disk. Under the SMART specification there are generally three states: normal, warning, and fault or error. Which of the three states is reported is closely tied to the setting of SMART's Pre-failure / advisory bit.
- When the Pre-failure / advisory bit = 0 and the attribute value is much larger than the threshold, the normal "OK" flag is displayed. When the Pre-failure / advisory bit = 0 and the attribute value is greater than the threshold but close to it, a warning "!" flag is displayed. When the Pre-failure / advisory bit = 1 and the attribute value is less than the threshold, a fault or error is reported with a "!!!" flag.
- In Figure 2, the normal state marked "OK" appears with two different descriptions, "Value is Normal" and "Always Passing". The difference is this: "Value is Normal" indicates that the SMART value is normal and the hard disk is not faulty; "Always Passing" indicates that the item merely records a parameter and has no pass or fail criterion. "Power-On Time Count" is an example: it only records how long the hard disk has been powered on, always passes, and is not used to measure the condition of the hard disk, so it is displayed as "OK: Always Passing".
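The status rules described above can be written down as a small decision function. This is only a sketch: the text does not say how close to the threshold counts as "close", so the `warn_margin` parameter is a hypothetical choice, and treating a threshold of 0 as "Always Passing" is an assumption about how monitoring tools mark items with no pass or fail criterion. Combinations the text does not cover are reported as such rather than guessed.

```python
def attribute_status(value: int, threshold: int, pre_failure_bit: int,
                     warn_margin: int = 10) -> str:
    """Hypothetical helper mapping one attribute to a status string."""
    if threshold == 0:
        # Assumption: no pass/fail criterion is expressed as a zero threshold.
        return "OK: Always Passing"
    if pre_failure_bit == 0 and value > threshold + warn_margin:
        return "OK: Value is Normal"  # value far above the threshold
    if pre_failure_bit == 0 and value > threshold:
        return "!"                    # above the threshold, but close to it
    if pre_failure_bit == 1 and value < threshold:
        return "!!!"                  # fault or error
    return "not covered by the rules above"
```

With a threshold of 20, a value of 100 is reported as normal, a value of 25 raises the "!" warning, and a value of 10 with the bit set to 1 reports "!!!".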
- Let's take the Start / Stop Count (number of power-ons) parameter with ID 04 as an example to understand the meaning of all 7 columns. From Figure 2 we see that the attribute's normal value (Attribute value) is "100". This value is given by the formula "100 - (number of power-ups during the normal service life of the hard disk) / 1024". The maximum error value is the value calculated cumulatively as the hard disk operates. For a new hard disk the number of power-ups is 0, so the result is 100 - 0 / 1024 = 100, and the maximum error value equals the normal attribute value. As the number of power-ups increases, the maximum error value keeps changing. The threshold specified by the manufacturer is 20: when the hard disk has been switched on and off 81920 times (100 - 81920 / 1024 = 20), the maximum error value equals the threshold and the system prompts the user to back up the data. As long as the number of power-ups stays below 81920 and the maximum error value stays above the threshold of 20, the state is normal. The number of power-ups in the figure (the actual value) is 107, so the maximum error value is approximately 100, and the status is displayed as "OK: Value is Normal". Note that the value of every parameter is produced by a specific calculation formula of this kind. As a user, as long as you watch the relationship between the "Worst" and "Threshold" values and pay attention to the status information, you can get a rough picture of the health of the hard disk.
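The arithmetic in this example is easy to check with a few lines of code. The sketch below simply encodes the formula 100 - power-ups / 1024 from the text; the function names and default parameters are hypothetical, and other attributes or vendors may use different formulas.

```python
def start_stop_value(power_ups: int, initial: int = 100,
                     divisor: int = 1024) -> float:
    """Attribute value after a given number of power-ups."""
    return initial - power_ups / divisor


def power_ups_at_threshold(threshold: int, initial: int = 100,
                           divisor: int = 1024) -> int:
    """Number of power-ups at which the value drops to the threshold."""
    return (initial - threshold) * divisor


new_disk = start_stop_value(0)         # a new disk starts at the full value
at_limit = power_ups_at_threshold(20)  # power-ups needed to reach threshold 20
in_figure = start_stop_value(107)      # the 107 power-ups shown in the figure
```

Here `new_disk` is 100, `at_limit` is 81920, and `in_figure` is just under 100, matching the values quoted in the example.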
- Implementation on non-ATA platforms
- The hard disk field currently has two standards, ATA and SCSI, and SMART technology supports both product families, although some parameter settings differ. In practice, because the users and operating environments differ, SMART plays a larger role in ATA / IDE systems than in SCSI systems, while fault judgment on SCSI is more specialized and accurate. The SMART implementation on SCSI hard disks is also more complex than on ATA hard disks. The following lists only some of the parameters unique to SCSI hard disks.
- Primary Temp: working temperature of the hard disk body
- Secondary Temp: working temperature around the PCB
- Min and Max Temp: the minimum and maximum working temperature of the hard disk body over a period of time
- Velocity Observer Count: the number of times the servo track has deviated from the specified track during a period of time
- 12V: 12V supply voltage value
- 5V: 5V supply voltage value
- MR Res: resistance value of MR head
- Sectors Read: Number of sectors read from the hard disk over a period of time
- Sectors Written: Number of sectors written to the hard disk over a period of time
- In the ATA / IDE environment, software on the host computer interprets the alarm signal that the hard disk produces in response to the "SMART RETURN STATUS" command.
- USB is not one of the basic buses used for storage devices inside the computer (such as ATA or SCSI), and the USB standard does not provide a way to transmit SMART data. In a portable hard disk that pairs an ATA hard disk with a USB port, SMART still operates inside the hard disk, but there is no direct way to provide the SMART data to the system. The bridge circuitry in newer portable hard disks, however, is able in some cases to pass the SMART data from the hard disk to the system or a monitoring program over USB.