How BLAST E-values are calculated and what they mean (2024)

Every hit that BLAST reports comes with a crucial measure: the E-value, short for Expectation value (also written E value, e-value, or evalue). Here, we’ll walk through:

  • What is an E-value?
  • How is it calculated?
  • How should you interpret it?
  • How can you get the most power and sensitivity from your BLAST analysis?

SequenceServer BLAST result highlighting that E-values are shown in two places in the BLAST result report: in the table of all hits, and as part of the alignment of each hit.

What is an E-value?

The BLAST E-value is:

  • Not a p-value.
  • Not the exact number of times a sequence was found due to chance.

Instead, it is an estimate of the number of alignments with a particular score or better that are expected to occur by chance in a given database search. In other words, it tells you how many hits at least this good you would expect to see even if the query had no true biological relationship to anything in the database.
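To make the difference from a p-value concrete: if we assume that chance hits follow a Poisson distribution with mean E (the usual assumption behind Karlin-Altschul statistics), the probability of seeing at least one chance hit with that score or better is P = 1 − e^(−E). The sketch below uses an illustrative function name that is not part of BLAST. It shows that the two numbers are nearly identical for small E and diverge as E grows.

```python
import math


def evalue_to_pvalue(evalue: float) -> float:
    """Probability of at least one chance hit with this score or better,
    assuming chance hits are Poisson-distributed with mean E."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-evalue)


for e in (1e-50, 0.001, 1.0, 10.0):
    print(f"E = {e:g}  ->  P = {evalue_to_pvalue(e):.4g}")
```

An E-value can exceed 1 (it is an expected count), whereas the corresponding probability never does.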

Interpreting E-values

The E-value describes the number of hits we expect to see by chance when BLASTing a database of a given size. It helps us judge whether a hit stands out from background noise. For example, an E-value of 1 means that we expect to see, by chance alone, one match with a score at least this high. E-values need careful interpretation in light of the specific research question and the datasets used, alongside other factors such as alignment length and sequence identity. However, in general:

  • Lower (i.e., stronger) E-values indicate more “significant” alignments: the similarity is less likely to be a chance event, which suggests that the sequences share a common evolutionary origin.
  • Higher (i.e., weaker) E-values indicate that the alignment could be a random event.

There is no fixed E-value threshold that determines whether an alignment is significant. Always consider the biological context and the datasets used.

In many cases, BLAST analysis is just a first step. In particular, a stronger E-value does not necessarily imply a stronger evolutionary relationship.

Interpreting it like that is a common mistake! To understand relationships across sequences, you should typically also perform multiple sequence alignment followed by phylogenetic reconstruction. Additional evidence also helps (e.g., understanding sequence conservation and domain architecture).

How is the BLAST E-value calculated?

The E-value is calculated from the alignment score (S), the search space size (m × n), and the Karlin-Altschul parameters (K and λ), which are derived from the scoring system and the sequence composition. The formula for the E-value is:

E-value = K × m × n × e^(−λS)

Where:

  • m is the length of the query sequence.
  • n is the length of the database (i.e., the sum of all the lengths of all the sequences in the database).
  • K and λ are the Karlin-Altschul parameters. They depend on the scoring system (substitution matrix and gap penalties) and on the background residue frequencies, and can be estimated from large sets of random sequence alignments. The λ parameter normalizes the alignment score, while the K parameter scales the search space size (m × n).
  • S is the alignment score. It is calculated based on the selected scoring matrix and the given sequence alignment. The score reflects the sum of substitution and gap scores for the aligned residues.
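As a minimal sketch, the formula can be evaluated directly. The function name and all numeric values below (K, λ, and the query and database lengths) are illustrative assumptions, not values taken from a real search. BLAST prints the Lambda and K it actually used in the statistics section of its report.

```python
import math


def blast_evalue(score: float, m: int, n: int, K: float, lam: float) -> float:
    """E = K * m * n * exp(-lambda * S) for a raw alignment score S."""
    return K * m * n * math.exp(-lam * score)


# Illustrative values only (assumed, not read from a real BLAST report).
K = 0.041
LAM = 0.267
query_len = 300             # m: length of the query
database_len = 50_000_000   # n: summed length of all database sequences

for raw_score in (50, 100, 150):
    e = blast_evalue(raw_score, query_len, database_len, K, LAM)
    print(f"S = {raw_score:>3}  ->  E = {e:.3g}")
```

Because S sits in the exponent, each additional score unit multiplies the E-value by the same factor, e^(−λ), so modest gains in score translate into large drops in E-value.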

The E-value thus depends on the database size. A larger database offers more opportunities for an alignment with a given score to arise by chance, so the E-value for the same amount of similarity ends up being weaker (higher).
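To get a feel for the size of this effect, we can use the fact that E is proportional to n. Growing the database by some factor multiplies the E-value by that factor, and the raw score would have to rise by ln(factor) / λ to compensate. The sketch below assumes an illustrative λ and two hypothetical database sizes. None of these numbers come from a real search.

```python
import math

LAM = 0.267  # assumed, illustrative value of lambda


def extra_score_needed(n_small: float, n_large: float, lam: float = LAM) -> float:
    """Raw-score increase that keeps E unchanged when the database grows
    from n_small to n_large residues (from E = K * m * n * exp(-lam * S))."""
    return math.log(n_large / n_small) / lam


# Hypothetical database sizes, in residues.
single_species = 1.0e7
multi_species = 1.0e11

fold = multi_species / single_species
print(f"E-value for the same alignment is {fold:g} times higher")
print(f"Score increase needed to offset it: "
      f"{extra_score_needed(single_species, multi_species):.1f}")
```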

So how should I tweak my BLAST analysis to get the most power?

  1. Use the appropriate database. If you’re looking for a particular gene in humans, BLAST only against the human genome, not against a database that is orders of magnitude larger. Searching the larger database would make it less likely for you to get strong E-values, even if the gene is present in the human genome. The BLAST analysis would also take much longer.
  2. Use the appropriate BLAST algorithm for your biological question and evolutionary distance. Consider that nucleotides diverge faster than protein sequences. So:
    • if you’re comparing highly similar sequences (e.g., to help identify intron-exon boundaries, or allelic differences), use BLASTN.
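    • if you’re identifying orthologs across species, use BLASTP. Before concluding that a gene is absent from a species, also use TBLASTN to search the genome sequence itself with your protein query, because the gene may simply be missing from the annotation.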
  3. Use an appropriate scoring matrix. BLOSUM62 is the default for protein searches. For longer evolutionary timescales, a matrix designed for more divergent sequences, such as PAM250, is more appropriate.
  4. Investigate different E-value thresholds to see the impact on the resulting hits.
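For the last point, one simple approach is to run BLAST once with a permissive E-value cutoff, save the tabular output (-outfmt 6), and then count how many hits survive at stricter thresholds. The sketch below assumes the default 12-column layout of that format. The helper names and the example rows are made up for illustration.

```python
import io

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
EVALUE_INDEX = COLUMNS.index("evalue")


def read_evalues(handle):
    """Return the E-value of every hit line in a tabular BLAST report."""
    evalues = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        evalues.append(float(line.split("\t")[EVALUE_INDEX]))
    return evalues


def hits_per_threshold(evalues, thresholds):
    """Count the hits whose E-value is at or below each threshold."""
    return {t: sum(e <= t for e in evalues) for t in thresholds}


# Made-up rows standing in for a real results file.
rows = [
    ("query1", "subjA", "98.5", "200", "3", "0", "1", "200", "11", "210", "1e-100", "380"),
    ("query1", "subjB", "45.2", "180", "90", "4", "5", "184", "20", "199", "2e-20", "95"),
    ("query1", "subjC", "31.0", "120", "75", "6", "30", "149", "3", "122", "0.003", "38"),
    ("query1", "subjD", "28.0", "60", "40", "2", "100", "159", "50", "109", "4.1", "27"),
]
example = "\n".join("\t".join(r) for r in rows)

counts = hits_per_threshold(read_evalues(io.StringIO(example)),
                            [1e-50, 1e-10, 1e-3, 10])
for threshold, n_hits in counts.items():
    print(f"E <= {threshold:g}: {n_hits} hit(s)")
```

To use it on real results, replace the io.StringIO(example) handle with an open file containing your own tabular output.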

Aren’t these kinds of adjustments “E-value hacking”?

No. If done appropriately, it’s just using the right tool for the job. In fact, we need to consider all of the above to make sure the E-value is useful for our biological questions.


Article information

Author: Eusebia Nader
