What Is the False Discovery Rate?
The false discovery rate (FDR) is a common term in statistics. It is the expected proportion of false rejections (rejections of true null hypotheses) among all rejected hypotheses.
- To date, Benjamini and Hochberg's articles have been cited tens of thousands of times, and theoretical and applied research on FDR continues to mature.
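- The definition of FDR (false discovery rate) is as follows: $\mathrm{FDR} = E(V/R)$, where $V$ is the number of false rejections (true null hypotheses that are rejected), $R$ is the total number of rejected hypotheses, and $V/R$ is taken to be 0 when $R = 0$.
- Here $E(\cdot)$ is the mathematical expectation. Similarly, the false non-discovery rate (FNDR) is defined as $\mathrm{FNDR} = E(T/(m - R))$, where $T$ is the number of false null hypotheses that are not rejected, $m$ is the total number of hypotheses, and the ratio is taken to be 0 when $m - R = 0$.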
- In words, FDR is the expected proportion of false rejections (rejections of true null hypotheses) among all rejected hypotheses. FDR has the following advantages: (1) as a control index for the error rate of hypothesis testing, its control level can be chosen flexibly according to need, whereas the level used in traditional testing under family-wise error rate (FWER) control is relatively fixed, usually at 0.05; (2) its meaning is clear, so it can serve as an evaluation index for the differential variables that are screened out, whereas FWER is mainly used to control Type I errors.
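- The relationship between FDR and FWER: when all null hypotheses are true, every rejection is a false rejection, so FDR equals FWER and controlling FDR is equivalent to controlling FWER.
- When $m_0 < m$ (where $m_0$ is the number of true null hypotheses, that is, variables with no real difference, and $m$ is the total number of hypotheses), FDR is no larger than FWER, so controlling FDR controls FWER only in the weak sense.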
- Control here means determining a significance threshold so that FDR is held at or below a fixed level. As with FWER control, a linear step-up procedure can be used. It has two steps. First, sort all p-values in ascending order, $p_{(1)} \le p_{(2)} \le p_{(3)} \le \dots \le p_{(m)}$. Then step backwards through $i = m, m-1, m-2, \dots, 1$, comparing $p_{(i)}$ with $\frac{i}{m} q$, and take the first $p_{(k)}$ ($k \ge 1$) that satisfies $p_{(k)} \le \frac{k}{m} q$; the hypotheses corresponding to $p_{(1)}, \dots, p_{(k)}$ are rejected. It can be proved that this controls FDR at level $q$ ($0 < q < 1$). The method requires that the hypothesis tests for the individual variables be independent.
- Building on this, Yekutieli and Benjamini gave an improved method in 1999 that uses resampling to calculate the p-values and can control FDR when the variables are correlated, although the estimated FDR is slightly conservative. In the same year, Benjamini and Liu proposed a step-down control method; its process is similar to the basic BH method, except that the criterion applied to $p_{(k)}$ is different. In 2000, Benjamini and Hochberg proposed a two-stage FDR control to reduce the conservativeness of the original method. In 2001, Benjamini and Yekutieli further improved the algorithm so that it controls FDR whether the tests on different variables are independent or correlated; its disadvantage is low test power. In 2005, Benjamini and Hochberg proposed an adaptive linear step-up control (ALSU), which applies the basic procedure above twice at different significance levels. In particular, the FDR estimates it yields under variable correlation are relatively robust.
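
The basic linear step-up procedure described above (sort the p-values, then step backwards to find the largest rank k whose p-value is at most (k/m)·q) can be sketched in Python. This is a minimal illustration assuming independent tests; the function name `benjamini_hochberg` and its parameters are hypothetical names chosen for this example, not part of any library.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis using the linear step-up rule.

    Illustrative sketch only: assumes the tests are independent.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Step 1: indices that sort the p-values ascending, p_(1) <= ... <= p_(m).
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step 2: walk backwards from rank m and stop at the first rank i
    # whose p-value satisfies p_(i) <= (i / m) * q.
    k = 0
    for rank in range(m, 0, -1):
        if p_values[order[rank - 1]] <= rank / m * q:
            k = rank
            break

    # Reject the k hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # The second-smallest p-value (0.03) exceeds its own threshold (0.025),
    # but the largest (0.04) is below 0.05, so all four are rejected.
    print(benjamini_hochberg([0.03, 0.005, 0.04, 0.035], q=0.05))
```

Stepping backwards matters: a p-value that misses its own threshold is still rejected if a larger p-value further down the sorted list meets its threshold, which is what distinguishes a step-up rule from a step-down one.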