What Is the Chi-Square Test?
The chi-square test is a widely used hypothesis test. Its applications in statistical inference on categorical data include the chi-square test for comparing two rates or two constituent ratios, the chi-square test for comparing multiple rates or multiple constituent ratios, and association analysis of categorical data.
The chi-square test measures how far the actually observed values of a statistical sample deviate from the theoretically expected values, and this deviation determines the size of the chi-square value. The larger the chi-square value, the greater the deviation between the two; the smaller the value, the smaller the deviation. If the observed and expected values are exactly equal, the chi-square value is 0, indicating complete agreement with the theoretical values.
Note: The chi-square test is for categorical variables. [1]
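As a minimal sketch of the deviation measure just described, the Python function below sums (observed − expected)²/expected over the categories. It assumes observed and expected counts are given as equal-length lists of numbers; the name `pearson_chi_square` is our own, not from any library.

```python
def pearson_chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        total += (o - e) ** 2 / e
    return total


# Identical observed and expected values give 0; any deviation increases the value.
print(pearson_chi_square([15, 15], [15, 15]))  # 0.0
print(pearson_chi_square([10, 20], [15, 15]))  # about 3.333
```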
(1) State the null hypothesis:
H0: the distribution function of the population X is F(x).
If the population distribution is discrete, the hypothesis is instead
H0: the distribution law of the population X is $P\{X = x_i\} = p_i$, $i = 1, 2, \ldots$
(2) Divide the range of X into k disjoint intervals A1, A2, ..., Ak, for example
$A_1 = (a_0, a_1],\ A_2 = (a_1, a_2],\ \ldots,\ A_k = (a_{k-1}, a_k)$,
where a0 may be −∞ and ak may be +∞. How the intervals are chosen depends on the situation, but each interval should contain at least 5 sample values, and the number of intervals k should be neither too large nor too small.
(3) Let fi be the number of sample values that fall into the i-th interval Ai; fi is called the group frequency (observed value). The group frequencies sum to the sample size: f1 + f2 + ... + fk = n.
(4) When H0 is true, the probability pi that X falls into the i-th interval Ai can be calculated from the hypothesized theoretical distribution, so npi is the theoretical (expected) frequency of sample values in Ai.
(5) When H0 is true, the relative frequency fi/n of the n sample values falling into Ai should be close to pi; when H0 is false, fi/n and pi differ considerably. Based on this idea, Pearson introduced the test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - np_i)^2}{np_i},$$
which, when H0 holds, approximately follows a chi-square distribution with k − 1 degrees of freedom (k − r − 1 if r parameters of F(x) are estimated from the sample).
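The five steps can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the die counts are hypothetical, no parameters are estimated (so df = k − 1), and the helper `chi2_sf` is our own series evaluation of the chi-square upper tail; in practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-square(df).

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2); the survival probability is 1 - P.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def goodness_of_fit(freqs, probs):
    """Pearson statistic sum (f_i - n p_i)^2 / (n p_i), its df = k - 1,
    and the upper-tail p-value (assumes no parameters were estimated)."""
    n = sum(freqs)
    stat = sum((f - n * p) ** 2 / (n * p) for f, p in zip(freqs, probs))
    df = len(freqs) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical data: group frequencies f_i from 60 rolls of a die
# (every f_i is at least 5, as step (2) asks). H0: each face has p_i = 1/6.
freqs = [8, 12, 9, 11, 10, 10]
stat, df, p = goodness_of_fit(freqs, [1 / 6] * 6)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.4f}")
```

With these counts the statistic is 1.0 on 5 degrees of freedom, far below the usual critical values, so H0 would not be rejected.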
Independent-Samples Fourfold (2 × 2) Table (1 Degree of Freedom)
Suppose X and Y are two categorical variables with value sets {x1, x2} and {y1, y2}, and their sample frequency contingency table is:
| | y1 | y2 | Total |
| x1 | a | b | a + b |
| x2 | c | d | c + d |
| Total | a + c | b + d | a + b + c + d |
To test the claim H1: "X is related to Y", an independence test can be used to examine whether the two variables are related and to state more precisely how reliable this judgment is. Specifically, compute the value of the test statistic from the table data:
$$K^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}, \quad n = a + b + c + d.$$
The larger the value of K², the more credible the conclusion that X and Y are related.
When none of the frequencies a, b, c, d is less than 5, refer to the following table to judge the credibility of the conclusion "X is related to Y":
| P(K² ≥ k) | k |
| 0.50 | 0.455 |
| 0.40 | 0.708 |
| 0.25 | 1.323 |
| 0.15 | 2.072 |
| 0.10 | 2.706 |
| 0.05 | 3.841 |
| 0.025 | 5.024 |
| 0.010 | 6.635 |
| 0.005 | 7.879 |
| 0.001 | 10.828 |
For example, suppose the computed value of K² is 6.109. Since 5.024 < 6.109 < 6.635, the table gives 0.010 < P(K² ≥ 6.109) < 0.025, so the conclusion "X is related to Y" holds with a credibility between 1 − 0.025 = 97.5% and 1 − 0.010 = 99%.
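The table lookup above can be written as a short Python helper. The critical values are copied from the table; the function name `credibility` is hypothetical, and it returns the smallest tabulated significance level α whose critical value is exceeded (or `None` if K² is below all of them).

```python
# Critical values k with P(K^2 >= k) = alpha (1 degree of freedom), from the table.
CRITICAL_VALUES = [
    (0.50, 0.455), (0.40, 0.708), (0.25, 1.323), (0.15, 2.072), (0.10, 2.706),
    (0.05, 3.841), (0.025, 5.024), (0.010, 6.635), (0.005, 7.879), (0.001, 10.828),
]


def credibility(k_squared):
    """Smallest tabulated alpha whose critical value k_squared exceeds, so that
    "X is related to Y" holds with credibility at least 1 - alpha.
    Returns None if k_squared is below every tabulated critical value."""
    best = None
    for alpha, k in CRITICAL_VALUES:  # k increases as alpha decreases
        if k_squared > k:
            best = alpha
    return best


alpha = credibility(6.109)
print(f"alpha = {alpha}, credibility at least {1 - alpha:.1%}")
# alpha = 0.025, credibility at least 97.5%
```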
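Example: gender and wearing makeup (observed frequencies, with expected frequencies in parentheses):
| | Male | Female | Total |
| Makeup | 15 (55) | 95 (55) | 110 |
| No makeup | 85 (45) | 5 (45) | 90 |
| Total | 100 | 100 | 200 |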
If gender and makeup were unrelated, the four cells should contain the numbers in parentheses. These expected values are maximum likelihood estimates: for example, 55 = 100 × 110/200, where 110/200 estimates the overall probability of wearing makeup, and multiplying it by the 100 men gives the expected number of men who wear makeup. The expected values differ markedly from the observed values (the numbers outside the parentheses), and this gap between theory and observation indicates that the counts are not a random combination of the two variables.
Applying Pearson's goodness-of-fit formula Σ(A − T)²/T, where A is an observed and T an expected frequency, gives χ² ≈ 129.3 > 10.828.
The association is therefore significant: P(K² ≥ 129.3) < 0.001, so the conclusion that gender and makeup are related holds with a credibility above 99.9%.
Note: for an independent fourfold table, the goodness-of-fit formula simplifies to
$$\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}.$$
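A small Python check that the shortcut formula in the note and the general Σ(A − T)²/T formula agree, using the gender-and-makeup counts (a = 15, b = 95, c = 85, d = 5). Both function names are hypothetical helpers for illustration.

```python
def fourfold_shortcut(a, b, c, d):
    """n(ad - bc)^2 / ((a + b)(c + d)(a + c)(b + d)) for a 2 x 2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def fourfold_general(a, b, c, d):
    """Sum of (A - T)^2 / T, with T = row total * column total / n."""
    n = a + b + c + d
    rows = [(a, b), (c, d)]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # e.g. 110 * 100 / 200 = 55
            total += (observed - expected) ** 2 / expected
    return total


# Gender-and-makeup table: a = 15, b = 95, c = 85, d = 5 (n = 200)
print(round(fourfold_shortcut(15, 95, 85, 5), 1))  # 129.3
print(round(fourfold_general(15, 95, 85, 5), 1))   # 129.3
```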
Summary: chi-square test for independent fourfold-table data
The chi-square test for fourfold-table data is used to compare two rates or two constituent ratios.
1. Shortcut formula:
If the four cell frequencies of the fourfold table are a, b, c, d, the chi-square value is $\chi^2 = n(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]$ (the general goodness-of-fit formula gives the same result).
Degrees of freedom: ν = (number of rows − 1)(number of columns − 1) = 1.
2. Application conditions:
The total sample size n should be at least 40, and the theoretical frequency T of every cell should be at least 5. When n ≥ 40 but some cell has 1 ≤ T < 5, the chi-square value must be corrected for continuity. When n < 40 or some T < 1, the probability can only be computed with the exact probability method (Fisher's exact test).
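The application conditions can be encoded as a rough decision helper, sketched in Python below. It assumes the thresholds stated above (n ≥ 40, T ≥ 5, T ≥ 1); the function names are hypothetical, and the corrected statistic uses the common Yates form n(|ad − bc| − n/2)²/[(a+b)(c+d)(a+c)(b+d)], clamped at zero as a guard of our own choosing. On the rat carcinogenesis data later in this article it returns 5.2868, the corrected chi-square value in the SAS output.

```python
def choose_fourfold_method(a, b, c, d):
    """Apply the rule of thumb for a 2 x 2 table and return
    'pearson', 'corrected', or 'exact' (Fisher's exact probability method)."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    min_t = min(r * col / n for r in row_totals for col in col_totals)
    if n < 40 or min_t < 1:
        return "exact"
    if min_t < 5:
        return "corrected"
    return "pearson"


def yates_corrected(a, b, c, d):
    """Continuity-corrected n(|ad - bc| - n/2)^2 / ((a+b)(c+d)(a+c)(b+d)).
    The max(0, ...) clamp (our choice) keeps the correction from overshooting."""
    n = a + b + c + d
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))


# Rat carcinogenesis data: n = 113, smallest T = 42 * 22 / 113 ≈ 8.18
print(choose_fourfold_method(52, 19, 39, 3))     # pearson
print(round(yates_corrected(52, 19, 39, 3), 4))  # 5.2868
```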
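Row × Column (R × C) Table Data (degrees of freedom df = (R − 1)(C − 1))
The chi-square test for R × C table data is used to compare multiple rates or multiple constituent ratios.
1. Shortcut formula:
For R × C table data, the chi-square value is
$$\chi^2 = n\left(\sum \frac{A^2}{n_R n_C} - 1\right),$$
where the sum runs over all cells, A is the observed frequency of a cell, and n_R and n_C are the totals of the row and column containing that cell.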
2. Application conditions:
Every theoretical frequency T should be at least 5, or the cells with 1 ≤ T < 5 should make up no more than 1/5 of all cells, with no cell having T < 1. If too many cells have T < 1 or 1 ≤ T < 5, the data can be brought within these conditions by merging adjacent rows or columns where this makes sense, deleting rows or columns, or increasing the sample size. Pairwise comparisons among multiple rates can be made by partitioning the R × C table.
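A Python sketch of both points: the shortcut formula (checked against the definition Σ(A − T)²/T) and the application-condition check. All function names are hypothetical; the condition check assumes the rule as stated above (no T < 1, and at most 1/5 of cells with T < 5). The fourfold rat table from the next section serves as the 2 × 2 special case, and the 3 × 2 table is a made-up illustration.

```python
def _totals(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


def rxc_chi_square(table):
    """Shortcut formula: chi2 = n * (sum of A^2 / (n_R * n_C) - 1), df = (R-1)(C-1)."""
    row_totals, col_totals, n = _totals(table)
    s = sum(a * a / (row_totals[i] * col_totals[j])
            for i, row in enumerate(table) for j, a in enumerate(row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return n * (s - 1), df


def rxc_chi_square_definition(table):
    """Same statistic from the definition sum (A - T)^2 / T with T = n_R * n_C / n."""
    row_totals, col_totals, n = _totals(table)
    total = 0.0
    for i, row in enumerate(table):
        for j, a in enumerate(row):
            t = row_totals[i] * col_totals[j] / n
            total += (a - t) ** 2 / t
    return total


def conditions_met(table):
    """No theoretical frequency below 1, and at most 1/5 of cells with T < 5."""
    row_totals, col_totals, n = _totals(table)
    expected = [r * c / n for r in row_totals for c in col_totals]
    if min(expected) < 1:
        return False
    return sum(t < 5 for t in expected) <= len(expected) / 5


rats = [[52, 19], [39, 3]]
stat, df = rxc_chi_square(rats)
print(f"{stat:.4f} (df = {df})")                  # 6.4777 (df = 1)
print(f"{rxc_chi_square_definition(rats):.4f}")   # 6.4777
print(conditions_met(rats))                       # True
print(conditions_met([[2, 20], [3, 20], [10, 50]]))  # False: 2 of 6 cells have T < 5
```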
Contingency Table Data Tests
Each individual in the same group of subjects is classified by two criteria at once; cross-tabulating the results gives a two-way table called a contingency table. [2]
Such data are common in the statistical analysis of categorical data. For example, the following table shows the carcinogenesis rates of two groups of rats exposed to different carcinogens. Do the carcinogenesis rates of the two groups differ?
| Treatment | Cancer | No cancer | Total | Cancer rate (%) |
| Group A | 52 | 19 | 71 | 73.24 |
| Group B | 39 | 3 | 42 | 92.86 |
| Total | 91 | 22 | 113 | 80.53 |
The four cell counts (52, 19, 39, 3) are the basic data of this table, so such data are also called fourfold-table data. The test statistic of the chi-square test is the chi-square value χ² = Σ(A − T)²/T: for each cell, the squared difference between the actual frequency A and the theoretical frequency T is divided by T, and these ratios are summed over all cells. The theoretical frequencies are calculated on the assumption that the two groups have equal cancer rates, both equal to the pooled rate 91/113; for example, the theoretical frequency of cancer in group A is 71 × 91/113 = 57.18. Therefore, the larger the chi-square value, the more the actual frequencies differ from the theoretical frequencies, and the more likely it is that the two groups' cancer rates differ.
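The theoretical frequencies for the rat data can be computed as in the Python sketch below (the function name `expected_frequencies` is hypothetical). Each T is row total × column total / n, which is the same as applying the pooled rate 91/113 to each group; summing (A − T)²/T over the four cells then gives the chi-square value.

```python
def expected_frequencies(table):
    """Theoretical frequency T = row total * column total / n for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: group A, group B; columns: cancer, no cancer
rats = [[52, 19], [39, 3]]
expected = expected_frequencies(rats)
for row in expected:
    print([round(t, 2) for t in row])
# [57.18, 13.82]
# [33.82, 8.18]

chi2 = sum((a - t) ** 2 / t
           for obs_row, exp_row in zip(rats, expected)
           for a, t in zip(obs_row, exp_row))
print(round(chi2, 4))  # 6.4777
```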
Requirements of the chi-square test: large samples are preferred. In general, every cell should have a theoretical frequency of at least 1, and no more than 1/5 of the cells should have a theoretical frequency below 5. If the data do not meet these requirements, a corrected chi-square should be used.
Analysis with SAS (statistical software): the program and its results are as follows.
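data kafang;
   /* row: 1 = group A, 2 = group B; column: 1 = cancer, 2 = no cancer; number: cell count */
   input row column number @@;
   cards;
1 1 52
1 2 19
2 1 39
2 2 3
;
run;
proc freq data=kafang;
   tables row*column / chisq;   /* chi-square tests for the 2 x 2 table */
   weight number;               /* each record stands for "number" rats */
run;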
| Statistic | DF | Value | Prob |
| Chi-Square | 1 | 6.4777 | 0.0109 (significant) |
| Likelihood Ratio Chi-Square | 1 | 7.3101 | 0.0069 |
| Continuity Adj. Chi-Square | 1 | 5.2868 | 0.0215 |
| Mantel-Haenszel Chi-Square | 1 | 6.4203 | 0.0113 |
| Phi Coefficient | | -0.2394 | |
| Contingency Coefficient | | 0.2328 | |
| Cramer's V | | -0.2394 | |
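As a cross-check on the SAS output, the following Python sketch recomputes each statistic from the four cell counts using standard textbook formulas: Pearson Σ(A − T)²/T, likelihood ratio G² = 2ΣA ln(A/T), the continuity-adjusted form with |ad − bc| − n/2, Mantel-Haenszel (n − 1)/n · χ², φ = (ad − bc)/√[(a+b)(c+d)(a+c)(b+d)], and the contingency coefficient √(χ²/(χ² + n)). It is an independent illustration, not SAS's own code, and the function names are hypothetical. For a 2 × 2 table Cramer's V equals φ, so it is not computed separately. The 1-df p-values use the identity P(χ²₁ ≥ x) = erfc(√(x/2)).

```python
import math


def chi2_sf_1df(x):
    """P(X >= x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def fourfold_statistics(a, b, c, d):
    """Statistics for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    rows, cols = (a + b, c + d), (a + c, b + d)
    expected = [rows[i] * cols[j] / n for i in range(2) for j in range(2)]
    product = rows[0] * rows[1] * cols[0] * cols[1]

    pearson = sum((o - t) ** 2 / t for o, t in zip(observed, expected))
    # Likelihood ratio G^2 = 2 * sum A ln(A / T); empty cells contribute 0
    g2 = 2 * sum(o * math.log(o / t) for o, t in zip(observed, expected) if o > 0)
    # Continuity-adjusted (Yates) statistic, clamped at 0
    yates = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2 / product
    mantel_haenszel = (n - 1) / n * pearson
    phi = (a * d - b * c) / math.sqrt(product)
    contingency = math.sqrt(pearson / (pearson + n))
    return {
        "Chi-Square": pearson,
        "Likelihood Ratio Chi-Square": g2,
        "Continuity Adj. Chi-Square": yates,
        "Mantel-Haenszel Chi-Square": mantel_haenszel,
        "Phi Coefficient": phi,
        "Contingency Coefficient": contingency,
    }


stats = fourfold_statistics(52, 19, 39, 3)
for name, value in stats.items():
    if name.endswith("Chi-Square"):
        print(f"{name}: {value:.4f}, p = {chi2_sf_1df(value):.4f}")
    else:
        print(f"{name}: {value:.4f}")
```

The printed values should agree with the SAS table up to rounding.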