What Is Multivariate Analysis?
Multivariate analysis, also called multivariate statistical analysis, is a group of statistical theories and methods for studying the relationships among multiple independent and dependent variables. It develops and extends univariate statistical methods.
- As an important branch of statistics, it explores the inherent regularities of multidimensional data, such as the interdependence and structural relationships among multidimensional random variables. According to the type of data obtained, multivariate statistical analysis can be divided into continuous and discrete multivariate analysis. The former includes multivariate normal distribution estimation and testing, multivariate linear regression, discriminant analysis, canonical correlation analysis, principal component analysis, factor analysis,
- The first to work on multivariate analysis was F. Galton, who in 1889 transformed bivariate
- The methods fall into three types: multivariate analysis of variance, multiple regression analysis, and analysis of covariance, known as linear model methods, which study the relationship between specified independent and dependent variables; discriminant function analysis and cluster analysis, which study the classification of things; and principal component analysis, canonical correlation, and factor analysis, which study how to replace the original variables with a smaller number of comprehensive factors.
Multivariate analysis of variance
- A statistical method that divides the total variation into several parts according to its source (or the experimental design) in order to test the influence of each factor on the dependent variable and the interaction between factors. For example, when analyzing data from a 2 × 2 factorial design, the total variation can be divided into four parts: the between-group variation of each of the two factors, the interaction between the two factors, and the error (that is, the within-group variation). The significance of the between-group variation and of the interaction is tested with the F test.
- Advantages: The influence of multiple factors with multiple levels on the dependent variable, and the interaction between those factors, can be tested at the same time in a single study. The limitations are that the sample at each level of each factor must be an independent random sample, the repeated observations must follow a normal distribution, and the population variances must be equal.
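The partition of variation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the design is balanced (equal replicates per cell), the numbers in `data` are made up, and the names `two_way_anova`, `ss`, `df`, and `f` are hypothetical helpers, not part of any library. The p-values are omitted; each F ratio would be compared against the F distribution with the listed degrees of freedom.

```python
# Hypothetical balanced 2 x 2 design (made-up numbers): data[(a, b)] holds the
# replicates observed at level a of factor A and level b of factor B.
data = {
    (0, 0): [4.0, 5.0, 6.0],
    (0, 1): [6.0, 7.0, 8.0],
    (1, 0): [7.0, 8.0, 9.0],
    (1, 1): [13.0, 14.0, 15.0],
}


def mean(values):
    return sum(values) / len(values)


def two_way_anova(data):
    """Partition the total sum of squares of a balanced two-factor design."""
    levels_a = sorted({a for a, _ in data})
    levels_b = sorted({b for _, b in data})
    n = len(next(iter(data.values())))  # replicates per cell
    all_values = [y for cell in data.values() for y in cell]
    grand = mean(all_values)

    # Marginal means for each level of each factor.
    mean_a = {a: mean([y for (i, _), cell in data.items() if i == a for y in cell])
              for a in levels_a}
    mean_b = {b: mean([y for (_, j), cell in data.items() if j == b for y in cell])
              for b in levels_b}

    ss_total = sum((y - grand) ** 2 for y in all_values)
    ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_error = sum((y - mean(cell)) ** 2 for cell in data.values() for y in cell)
    ss_ab = ss_total - ss_a - ss_b - ss_error  # the remainder is the interaction

    df = {"A": len(levels_a) - 1, "B": len(levels_b) - 1}
    df["AB"] = df["A"] * df["B"]
    df["error"] = len(levels_a) * len(levels_b) * (n - 1)

    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab, "error": ss_error}
    ms_error = ss_error / df["error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AB")}
    return ss, df, f


ss, df, f = two_way_anova(data)
for source in ("A", "B", "AB"):
    print(f"{source}: SS={ss[source]:.2f}, df={df[source]}, F={f[source]:.2f}")
print(f"error: SS={ss['error']:.2f}, df={df['error']}")
```

The four entries of `ss` correspond to the four parts of the total variation: the two main effects, the interaction, and the within-group error.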
Multiple regression analysis
- A statistical method used to evaluate and analyze the linear functional relationship between one dependent variable and multiple independent variables.
- Advantages: It can quantitatively describe the linear functional relationship between a phenomenon and certain factors. Substituting known values of the independent variables into the regression equation yields an estimated (predicted) value of the dependent variable, which makes it possible to predict the occurrence and development of the phenomenon. It can be used for both continuous and dichotomous variables (0-1 regression). The application of multiple regression is subject to strict conditions. First, analysis of variance is used to test whether the linear regression relationship between the dependent variable y and the m independent variables is significant. Second, even if y is linearly related to the m independent variables as a whole, this does not mean that every independent variable is linearly related to the dependent variable. A t-test on the partial regression coefficient of each independent variable is therefore also required, so that independent variables that contribute nothing to the equation can be eliminated. Alternatively, a stepwise regression method can be used to build the regression equation, selecting independent variables step by step so that only important ones enter the equation.
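The two checks described above, an overall F test followed by a t statistic for each partial regression coefficient, can be sketched with NumPy. The data are synthetic and the third independent variable is deliberately given no effect; the function name `fit_ols` and all variable names are assumptions made for this illustration. Only the test statistics are computed, so turning them into p-values would need the F and t distributions from a statistics library.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends on the first two
# independent variables but not on the third.
rng = np.random.default_rng(0)
n, m = 50, 3
X = rng.normal(size=(n, m))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)


def fit_ols(X, y):
    """Least-squares fit with an overall F statistic and per-coefficient t statistics."""
    n, m = X.shape
    A = np.column_stack([np.ones(n), X])          # add the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    ss_res = float(np.sum((y - fitted) ** 2))     # residual sum of squares
    ss_reg = float(np.sum((fitted - y.mean()) ** 2))  # regression sum of squares
    df_res = n - m - 1
    f_stat = (ss_reg / m) / (ss_res / df_res)     # tests the regression as a whole
    cov_beta = (ss_res / df_res) * np.linalg.inv(A.T @ A)
    t_stats = beta / np.sqrt(np.diag(cov_beta))   # tests each coefficient separately
    return beta, f_stat, t_stats


beta, f_stat, t_stats = fit_ols(X, y)
print("coefficients (intercept first):", np.round(beta, 3))
print("overall F statistic:", round(f_stat, 2))
print("t statistics:", np.round(t_stats, 2))

# Predicting the dependent variable for a new observation.
x_new = np.array([0.5, -1.0, 0.0])
print("prediction:", float(beta[0] + x_new @ beta[1:]))
```

In this sketch the t statistic of the third variable should be much smaller in magnitude than those of the first two, which is the situation in which that variable would be dropped from the equation.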
Analysis of covariance
- A statistical method that combines linear regression and analysis of variance to test whether several adjusted means differ. For example, suppose an experiment contains two independent variables, one discrete (with multiple levels) and one continuous. The purpose of the experiment is to compare the levels of the discrete variable, which is the factor examined by analysis of variance; the continuous variable, which enters the experiment only because it cannot be controlled, is called the covariate. In applying covariance analysis, the linear regression of the dependent variable on the continuous variable is obtained first, and the influence of the continuous variable is then removed according to this function, giving the adjusted mean of the dependent variable at equal values of the continuous variable. Finally, analysis of variance is used to test the significance of the differences between the adjusted means, that is, to test the effect of the discrete variable on the dependent variable.
- Advantages: The influence of the discrete variable on the dependent variable can be tested with the influence of the continuous variable removed, which helps to eliminate the interference of non-experimental factors. The limitation is that, in theory, the data (samples) of each group must come from normal populations with the same variance, and the population linear regression coefficients of the groups must be equal and not all zero. Therefore, a test of homogeneity of variance and a hypothesis test of the regression coefficients should be performed before applying covariance analysis. If these conditions are met, or can be met after a transformation of the data, covariance analysis can be performed.
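The adjustment step described above can be illustrated with a small plain-Python sketch. It assumes one discrete factor with two groups, one covariate, and a common within-group slope; the data and the names `groups`, `ancova`, and `cross` are hypothetical. The F statistic compares a model with the covariate only against a model with the covariate plus group effects, which is one common way to test the adjusted means.

```python
# Hypothetical data: two groups (levels of the discrete factor), each with a
# covariate x and a response y. All numbers are made up for illustration.
groups = {
    "A": {"x": [1.0, 2.0, 3.0], "y": [3.0, 5.2, 6.8]},
    "B": {"x": [3.0, 4.0, 5.0], "y": [10.1, 11.8, 14.1]},
}


def mean(values):
    return sum(values) / len(values)


def cross(u, v, mu, mv):
    """Sum of cross products of deviations from the given means."""
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def ancova(groups):
    all_x = [x for g in groups.values() for x in g["x"]]
    all_y = [y for g in groups.values() for y in g["y"]]
    grand_x, grand_y = mean(all_x), mean(all_y)

    # Pool the within-group sums of squares and cross products.
    sxx_w = sxy_w = syy_w = 0.0
    group_means = {}
    for name, g in groups.items():
        mx, my = mean(g["x"]), mean(g["y"])
        group_means[name] = (mx, my)
        sxx_w += cross(g["x"], g["x"], mx, mx)
        sxy_w += cross(g["x"], g["y"], mx, my)
        syy_w += cross(g["y"], g["y"], my, my)

    slope = sxy_w / sxx_w  # common within-group regression slope
    # Adjusted mean: the group mean of y moved to the grand mean of x.
    adjusted = {name: my - slope * (mx - grand_x)
                for name, (mx, my) in group_means.items()}

    # Residual sums of squares with and without group effects.
    sse_full = syy_w - sxy_w ** 2 / sxx_w
    sxx_t = cross(all_x, all_x, grand_x, grand_x)
    sxy_t = cross(all_x, all_y, grand_x, grand_y)
    syy_t = cross(all_y, all_y, grand_y, grand_y)
    sse_reduced = syy_t - sxy_t ** 2 / sxx_t

    k, n = len(groups), len(all_x)
    f_stat = ((sse_reduced - sse_full) / (k - 1)) / (sse_full / (n - k - 1))
    return slope, adjusted, f_stat


slope, adjusted, f_stat = ancova(groups)
print("common slope:", round(slope, 3))
print("adjusted means:", {k: round(v, 3) for k, v in adjusted.items()})
print("F statistic for adjusted means:", round(f_stat, 2))
```

In this made-up example the raw group means of y differ by 7.0, while the adjusted means differ by about 3.1, because part of the raw difference is explained by the groups having different covariate values.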
Discriminant function analysis
- A statistical method to determine the category to which an individual belongs. The basic principle is: one or several linear discriminant functions and discriminant indicators are determined based on the observation data of two or more known categories of samples, and then the discriminant function is used to determine which category another individual belongs to according to the discriminant indicators.
- Discriminant analysis is used not only for continuous variables but also, with the help of quantification theory, for qualitative data. It helps to determine classification criteria objectively. However, discriminant analysis can only be used when the categories are already determined. When the categories themselves are undetermined, cluster analysis should be used first to establish the categories, and discriminant analysis performed afterwards.
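As an illustration of the basic principle, the sketch below builds one linear discriminant function from two known categories and then assigns new individuals to a category. It uses Fisher's two-class linear discriminant as a concrete choice, which is an assumption of this example rather than something the text prescribes; the training data and the names `fit_discriminant` and `classify` are hypothetical.

```python
import numpy as np

# Hypothetical training samples from two known categories (made-up numbers).
class0 = np.array([[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0]])
class1 = np.array([[6.0, 5.0], [7.0, 6.0], [6.0, 7.0], [5.0, 6.0]])


def fit_discriminant(X0, X1):
    """Fisher's linear discriminant for two classes: weights and cut-off."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    s_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(s_w, m1 - m0)      # direction of the discriminant function
    threshold = w @ (m0 + m1) / 2.0        # midpoint of the projected class means
    return w, threshold


def classify(x, w, threshold):
    """Return 1 if the discriminant score exceeds the cut-off, else 0."""
    return int(w @ np.asarray(x) > threshold)


w, threshold = fit_discriminant(class0, class1)
print("weights:", w, "threshold:", threshold)
print("category of (2.5, 2.5):", classify([2.5, 2.5], w, threshold))
print("category of (5.0, 5.0):", classify([5.0, 5.0], w, threshold))
```

The midpoint cut-off assumes equal prior probabilities and equal misclassification costs for the two categories; other choices would shift the threshold.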
Cluster analysis
- A statistical method for solving classification problems. Clustering the observed objects is called Q-type analysis; clustering the variables is called R-type analysis. The basic principle of clustering is to make the differences within a class as small as possible and the differences between classes as large as possible. Two clustering schemes are most commonly used. One is the systematic (hierarchical) clustering method. For example, to divide n objects into k classes, first treat each of the n objects as a class of its own, giving n classes. Then calculate some kind of "distance" between every pair of classes, find the two closest classes, and merge them into a new class. This process is repeated step by step until k classes remain. The other is the stepwise or dynamic clustering method. When the number of samples is large, the n samples are first roughly divided into k classes, and the division is then gradually modified according to some optimality principle until the classification is reasonable.
- Cluster analysis is based on the quantitative relationships between individuals or variables. It is highly objective, but each clustering method can only reach a local optimum under certain conditions. Whether the final clustering result is valid must be judged by experts. If necessary, several different methods can be compared and the classification result that best meets professional requirements chosen.
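The systematic clustering procedure described above (start with n classes, repeatedly merge the two closest, stop at k) can be sketched in plain Python. The choice of Euclidean distance with single linkage (distance between the nearest members of two classes) is an assumption of this example, since the text only speaks of "some kind of distance"; the points and the function name `agglomerative` are hypothetical.

```python
import math

# Hypothetical observations in the plane (made-up numbers).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]


def agglomerative(points, k):
    """Merge the two closest classes repeatedly until k classes remain."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [[i] for i in range(len(points))]  # each object starts as its own class

    def distance(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


clusters = agglomerative(points, 3)
print(clusters)  # indices of the points in each class
```

Replacing the `distance` helper with complete or average linkage changes which classes count as "closest" and can lead to a different classification, which is one reason results from several methods are worth comparing.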
Principal Component Analysis
- A statistical method that turns many original indicators into a few mutually uncorrelated comprehensive indicators. For example, when samples are observed on p indicators and the p indicators are uncorrelated with each other, the problem reduces to handling p single indicators separately. Most of the time, however, the p indicators are correlated. In that case, principal component analysis can be used to find linear functions of these indicators that are uncorrelated with each other, so that the variation of the original indicators can be explained by the variation of these linear functions. These linear functions are called the principal components of the original indicators.
- Principal component analysis helps to identify the main factors that affect the dependent variable. It can also be combined with other multivariate methods: once the principal components are identified, regression analysis, discriminant analysis, and canonical correlation analysis can be applied to them. Principal component analysis can also serve as the first step of factor analysis; carrying it further leads to factor analysis. The disadvantage is that it only addresses the interdependence within one set of variables. To discuss the relationship between two sets of variables, canonical correlation is needed.
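A minimal NumPy sketch of the idea follows: it finds uncorrelated linear functions of correlated indicators through an eigendecomposition of the covariance matrix. The data are synthetic (two indicators driven by one latent variable plus an unrelated third), and the function name `pca` and its return values are assumptions of this example, not a library API.

```python
import numpy as np

# Synthetic indicators (an assumption for illustration): the first two are
# strongly correlated through a shared latent variable, the third is mostly noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = np.column_stack([
    latent + 0.1 * rng.normal(size=200),
    2.0 * latent + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])


def pca(X, n_components):
    """Principal components from the eigendecomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)                      # center each indicator
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]         # coefficients of the linear functions
    scores = Xc @ loadings                       # values of the principal components
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, loadings, explained


scores, loadings, explained = pca(X, 2)
print("share of variance explained:", np.round(explained, 4))
print("covariance between the two components:",
      float(np.cov(scores, rowvar=False)[0, 1]))
```

The covariance between the component scores is zero up to rounding error, which is the sense in which the new comprehensive indicators are uncorrelated. Standardizing the indicators first (working with the correlation matrix) is a common alternative when they are measured in different units.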
Canonical correlation
- A statistical method that first converts many variables into a few canonical variables and then describes the relationship between two sets of multivariate random variables comprehensively through the canonical correlation coefficients between them. Let x be a p-dimensional random vector and y a q-dimensional random vector. The correlation coefficients between the p components of x and the q components of y (p × q in total) could be calculated one by one, but this is tedious and does not reflect the essence of the relationship. In canonical correlation analysis, the basic procedure is to take one linear function from each of the two sets of variables such that the pair has the highest possible correlation coefficient; this pair is called the first pair of canonical variables. A second pair, a third pair, and so on can be found in the same way. Different pairs are uncorrelated with each other, and the correlation coefficient within each pair of canonical variables is called a canonical correlation coefficient. The number of canonical correlation coefficients obtained does not exceed the number of variables in either of the two original sets.
- Canonical correlation analysis helps to describe comprehensively the canonical correlation between two sets of variables. The condition is that both sets of variables are continuous and that their data follow a multivariate normal distribution.
- Each of the multivariate analysis methods above has its own advantages and limitations. Each method has specific assumptions, conditions, and data requirements, such as normality, linearity, and homoscedasticity. Therefore, when applying multivariate analysis, the theoretical framework should be determined at the research planning stage in order to decide what data to collect, how to collect them, and how to analyze them.
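To make the canonical correlation procedure described earlier concrete, the following NumPy sketch computes the canonical correlation coefficients and the first pair of canonical variables. It uses a whitening-plus-SVD formulation, which is one standard way to solve the problem and an assumption of this example; the data are synthetic (p = 3, q = 2, linked through one shared latent variable) and the function name `cca` is hypothetical.

```python
import numpy as np

# Synthetic data (an assumption for illustration): x has p = 3 components,
# y has q = 2, and the two sets share one latent variable z.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + 0.5 * rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([z + 0.5 * rng.normal(size=n),
                     rng.normal(size=n)])


def cca(X, Y):
    """Canonical correlations and weight vectors via whitening and an SVD."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    sxx = Xc.T @ Xc / (n - 1)
    syy = Yc.T @ Yc / (n - 1)
    sxy = Xc.T @ Yc / (n - 1)
    lx = np.linalg.cholesky(sxx)                 # sxx = lx @ lx.T
    ly = np.linalg.cholesky(syy)                 # syy = ly @ ly.T
    k = np.linalg.solve(lx, sxy)                 # lx^{-1} sxy
    k = np.linalg.solve(ly, k.T).T               # lx^{-1} sxy ly^{-T}
    u, s, vt = np.linalg.svd(k, full_matrices=False)
    a = np.linalg.solve(lx.T, u)                 # weights for the x variables
    b = np.linalg.solve(ly.T, vt.T)              # weights for the y variables
    return s, a, b


s, a, b = cca(X, Y)
u1 = (X - X.mean(axis=0)) @ a[:, 0]              # first canonical variable of x
v1 = (Y - Y.mean(axis=0)) @ b[:, 0]              # first canonical variable of y
print("canonical correlations:", np.round(s, 3))
print("correlation of the first pair:", round(float(np.corrcoef(u1, v1)[0, 1]), 3))
```

The vector `s` has min(p, q) = 2 entries, in line with the statement that the number of canonical correlation coefficients cannot exceed the number of variables in either set.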