What is e-value? (2024)

A standard way to assess the primary homology of a gene sequence is local alignment (e.g., NCBI’s BLAST), in which an input sequence (the “query”) is matched against the sequences in a database, and each matching sequence is a “hit”. The metric most commonly used to judge the significance of an alignment between a query and its hit is the e-value. Given how ubiquitous this parameter is, here is a short guide to what it is and how it’s used.

What is e-value?
E-value (short for expect value) is the number of sequences in the database that would be expected, by chance alone, to align to the query as well as or better than the hit that was found. Put another way, it reflects how often you would find an equal or better match for your query sequence in the database purely by chance. In effect, the e-value estimates how plausible it is that the hit was returned because of random background noise.

What goes into calculating e-value?
E-value = K*m*n*e^(-λ*S)
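K, λ = constants that depend on the scoring matrix; m, n = lengths of the two sequences being compared (in a database search, the length of the query and the total length of the database); S = alignment score, which is calculated from the alignment produced (incorporating matches, mismatches, gaps, etc.).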
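To make the formula concrete, here is a minimal Python sketch that evaluates it directly. The function name expect_value and the numeric inputs (K, λ, the lengths, and the score) are placeholders chosen for illustration, not values taken from any particular BLAST run or scoring matrix.

```python
import math

def expect_value(score, m, n, k, lam):
    """Evaluate E = K * m * n * e^(-lambda * S) for the given inputs."""
    return k * m * n * math.exp(-lam * score)

# Placeholder constants and lengths, assumed purely for illustration.
K = 0.041
LAMBDA = 0.267

e = expect_value(score=200, m=300, n=1_000_000, k=K, lam=LAMBDA)
print(f"E-value: {e:.3g}")
```

Because S sits in the exponent, a small increase in alignment score lowers the e-value far more than a comparable relative change in m or n raises it.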

Therefore, e-value depends mostly on the lengths of the sequences, the size of the database, and the alignment score. This implies that shorter sequences, especially those of low complexity, are less likely to produce significant matches (and are often filtered out). Moreover, e-values from searches against databases of different sizes cannot be compared directly. An e-value of 6e-32 from a search against a small database is less significant than an e-value of 6e-32 from a large database: for a fixed alignment score the e-value grows in proportion to the database size, so reaching the same e-value against a larger database requires a higher-scoring alignment.
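One way to see why e-values from databases of different sizes are not directly comparable is to invert the formula above and ask what alignment score is needed to reach a given e-value. The sketch below does this for two hypothetical database sizes; the helper score_for_expect and all constants are assumptions made for illustration.

```python
import math

def score_for_expect(e_value, m, n, k, lam):
    """Invert E = K*m*n*e^(-lambda*S) to get S = ln(K*m*n / E) / lambda."""
    return math.log(k * m * n / e_value) / lam

# Placeholder constants, assumed for illustration only.
K, LAMBDA = 0.041, 0.267

# Same query length and same target e-value, two hypothetical database sizes.
small = score_for_expect(6e-32, m=300, n=1e6, k=K, lam=LAMBDA)
large = score_for_expect(6e-32, m=300, n=1e9, k=K, lam=LAMBDA)

print(f"score needed, small database: {small:.1f}")
print(f"score needed, large database: {large:.1f}")
print(f"difference: {large - small:.1f}")  # ln(1000) / lambda
```

Under these assumptions, the same e-value corresponds to a higher alignment score in the larger database, which is why the two numbers do not mean the same thing.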

Is an e-value the same thing as a p-value?
No. E-value is a frequency metric, whereas p-value is a probability metric. Though both reflect the significance of the query-hit alignment, e-value is the number of equal or better alignments expected to occur by chance, while p-value is the probability that at least one such alignment occurs by chance. (In statistical terms, the e-value is a multiple testing correction of the p-value.)

NCBI uses e-value as its standard because it provides greater clarity and granularity; “it is easier to understand the difference between, for example, e-value of 5 and 10 than p-values of 0.993 and 0.99995.” Both can be used, but be aware of which you are using and why, because they represent different things.
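The two metrics can be converted into one another. As a minimal sketch, assuming the number of chance hits follows a Poisson distribution (the usual assumption in BLAST statistics), the probability of at least one chance hit is P = 1 - e^(-E). The function name p_from_e is hypothetical.

```python
import math

def p_from_e(e_value):
    """P = 1 - e^(-E): probability of at least one chance hit (Poisson assumption)."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is very small.
    return -math.expm1(-e_value)

for e in (10, 5, 1e-5):
    print(f"E = {e:g}  ->  P = {p_from_e(e):.6g}")
```

E-values of 5 and 10 map to p-values of roughly 0.993 and 0.99995, the figures quoted above, while for very small e-values the two metrics are nearly identical.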