What Is a Missing Value?
A missing value is an incomplete value of one or more attributes in an existing data set. Missing values arise when a lack of information in the raw data leads to clustering, grouping, censoring, or truncation of the data.
- Missing values arise for various reasons, which fall mainly into mechanical causes and human causes.
- Mechanical causes are failures of data collection or storage that lose data, such as data storage failure, memory damage, and mechanical failure leading to a segment
- By distribution, missing values can be divided into missing completely at random, missing at random, and missing not at random.
- Missing completely at random (MCAR)
- The missingness is purely random: whether a value is missing does not depend on any variable, complete or incomplete.
- Missing at random (MAR)
- The missingness is not completely random: whether a value is missing depends on other, complete variables.
- Missing not at random (MNAR)
- Whether a value is missing depends on the incomplete variable itself.
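The three mechanisms can be illustrated with a small simulation. This is only a sketch on made-up data: the `age` and `income` variables, the drop probabilities, and the function name `simulate_missingness` are all assumptions chosen for illustration, not part of the source. It drops `income` values under each mechanism and compares the mean of what remains with the true mean.

```python
import random
from statistics import mean

def simulate_missingness(n=10_000, seed=0):
    """Compare the observed mean of a hypothetical 'income' variable
    under MCAR, MAR and MNAR missingness (illustrative values only)."""
    rng = random.Random(seed)
    # 'age' is always observed; 'income' depends on age plus noise.
    ages = [rng.uniform(20, 60) for _ in range(n)]
    incomes = [1000 + 50 * a + rng.gauss(0, 200) for a in ages]

    def observed_mean(drop_prob):
        # Keep an income value unless a random draw falls below its drop probability.
        kept = [inc for a, inc in zip(ages, incomes)
                if rng.random() >= drop_prob(a, inc)]
        return mean(kept)

    return {
        "true": mean(incomes),
        # MCAR: the drop probability ignores every variable.
        "MCAR": observed_mean(lambda a, inc: 0.3),
        # MAR: the drop probability depends only on the complete variable 'age'.
        "MAR": observed_mean(lambda a, inc: 0.5 if a > 40 else 0.1),
        # MNAR: the drop probability depends on the missing value itself.
        "MNAR": observed_mean(lambda a, inc: 0.5 if inc > 3000 else 0.1),
    }

if __name__ == "__main__":
    for name, value in simulate_missingness().items():
        print(f"{name}: {value:.1f}")
```

In this setup the MCAR mean should stay close to the true mean, while the MAR and MNAR means drift downward because higher incomes are dropped more often. Under MAR the drift can in principle be corrected using `age`, since `age` is fully observed; under MNAR no observed variable carries that information.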
- Missing values can also be classified by the attributes they belong to.
- Treatments for missing values generally fall into two groups: deleting cases that contain missing values, and imputing the missing values.
- In many practical problems, some data are unavailable or missing. When the proportion of missing data is small, you can simply discard the incomplete records and analyze the complete ones. In real data, however, missing values often account for a considerable proportion, especially in multivariate data. Deleting cases is then inefficient: it throws away a great deal of information and can introduce bias, because the incomplete observations may differ systematically from the complete ones.
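A minimal sketch of case deletion, assuming records are stored as dictionaries with `None` marking a missing value (the helper names and sample records are hypothetical). The second helper shows why deletion becomes costly for multivariate data under the simplifying assumption that each variable is missing independently with the same probability.

```python
def listwise_delete(rows):
    """Drop every record that has at least one missing (None) value.
    Returns the complete records and the fraction of records lost."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    lost = 1 - len(complete) / len(rows) if rows else 0.0
    return complete, lost

def expected_complete_fraction(p_missing, n_variables):
    """Expected share of fully complete records, ASSUMING each variable
    is missing independently with probability p_missing."""
    return (1 - p_missing) ** n_variables

# Hypothetical records: two of the four have a missing attribute.
records = [
    {"age": 34, "income": 2700, "city": "A"},
    {"age": None, "income": 3100, "city": "B"},
    {"age": 51, "income": None, "city": "A"},
    {"age": 45, "income": 3300, "city": "C"},
]

if __name__ == "__main__":
    complete, lost = listwise_delete(records)
    print(len(complete), lost)
    print(expected_complete_fraction(0.10, 10))
```

Under that independence assumption, a 10% missing rate per variable across 10 variables leaves only about 35% of records complete, which is the information loss the paragraph above describes.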
- Imputation only replaces an unknown value with our subjective estimate, which may not fully agree with the objective facts. The analysis above is theoretical: because a missing value cannot itself be observed, it is impossible to know which type of missingness it belongs to, and impossible to measure how well a given imputation method performs. In addition, these methods are general-purpose and used across many fields, so their results in a specialized field will not be ideal. For this reason, many data mining professionals impute missing values manually, based on their understanding of the industry, and the result may be better than that of these methods. Imputing missing values is a deliberate human intervention in the data mining process, made in order not to discard a large amount of information. Whichever method is used, it affects the relationships between variables: in filling in incomplete information, we change the original data's information system to some degree, which may affect subsequent analysis, so missing values must be treated with care [3].
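To make the last point concrete, here is a small sketch using mean imputation. It is chosen here only because it is simple, not because the source singles it out; the function name and data are hypothetical. Filling gaps with the observed mean leaves the mean unchanged but shrinks the variance, so the imputed data no longer has the spread of the observed data.

```python
from statistics import mean, pvariance

def mean_impute(values):
    """Replace each missing (None) entry with the mean of the observed entries.
    Raises statistics.StatisticsError if no entry is observed."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical attribute with two missing entries.
data = [2.0, 4.0, None, 6.0, None, 8.0]
observed = [v for v in data if v is not None]
filled = mean_impute(data)

if __name__ == "__main__":
    print(filled)                                 # [2.0, 4.0, 5.0, 6.0, 5.0, 8.0]
    print(pvariance(observed), pvariance(filled))  # variance shrinks after imputation
```

In this example the population variance falls from 5.0 for the observed values to about 3.33 after imputation, even though no new information was added.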