Frequently Asked Questions — BLASTHelp documentation (2024)

Frequently Asked Questions

You are seeing the result of automatic filtering of your query for low-complexity sequence. This filter prevents matches that are probably artifacts. The filter substitutes any low-complexity sequence with lowercase grey characters in the results, which allows you to see the sequence that was filtered.

You can turn off the filter before submitting your search; see the checkbox in the “Algorithm parameters” section. However, turning off the filter could lead to a failed search due to excessive CPU usage.

Use the Primer-BLAST tool to search with a pair of primers. You can enter the forward and reverse primers in the primer input boxes on the form. Select the appropriate database and a taxonomic group (organism) in the ‘Primer Pair Specificity Checking Parameters’ section of the form and click the ‘Get Primers’ button. The results will show you which sequences in the database match both primers and the lengths of potential products. For other short sequences you can use nucleotide BLAST in the usual way. Simply paste or type your sequences in the query box, select the appropriate database, and click the BLAST button. The BLAST parameters will automatically adjust to find matches to short sequences.

To search only sequences for an organism or taxonomic group, use the “Organism” text box. Begin to enter a common name (e.g., rat, bacteria), a genus or species name, or an NCBI taxonomy id (e.g., 9606); then select a name from the list.

You can also exclude taxonomic groups with the “exclude” checkbox to the right of the “Organism” box.

Additional taxonomic groups can be included or excluded with the “Add organism” button.

You can search for taxa in the Taxonomy Browser.

In the “Choose Search Set” section of a search form, locate the Exclude line and check the checkboxes to its right to exclude those sequences from your search.

For protein databases, you can use the following options:

  • limit to a group of organisms through the Organism option

  • exclude Models (XM/XP), Non-redundant RefSeq proteins (WP), and Uncultured/environmental sample sequences through provided checkboxes

For non-WGS/TSA nucleotide databases, you can use the following options:

  • limit to a group of organisms through the Organism option

  • exclude Models (XM/XP) and Uncultured/environmental sample sequences through provided checkboxes

  • limit by Entrez Query through custom search terms. See this video (Wayne’s YT video) for full details. Get help with writing Entrez queries in the NCBI Handbook.

Once you are satisfied with the parameters for a particular search, you can bookmark that page for future use. The “Bookmark” button is near the top right of the search page. If you are logged into your NCBI account, you can save the search settings using the “Save Search” link at the top left of a search result page. To access your previously saved search strategies, click the “Saved Strategies” link in the upper right of any BLAST page.

The NCBI cannot provide compute resources for large-scale batch BLAST searches from individual users on the web service. For batch BLAST searches you can set up standalone BLAST to run against local databases, or use it with the remote option to run against databases at NCBI.

  1. Standalone BLAST programs installed on a local computer or on a cloud service. The BLAST programs are command line programs that run BLAST searches against local, downloaded copies of the NCBI BLAST databases, or against custom databases formatted for BLAST. The programs can handle a single large file with multiple FASTA query sequences, or you can create a script to send multiple files one at a time. The executables are available for a wide variety of platforms, including Linux, Windows, and Mac OSX. You can install these locally or on a cloud provider. In a cloud setting you may want to use the ElasticBLAST package, which can automatically allocate cloud resources according to the scale of the BLAST search.

    You can download the standalone package from https://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/. The user manual is available from https://www.ncbi.nlm.nih.gov/books/NBK279690/.

  2. Network BLAST client. The stand-alone executables can send searches to the BLAST server using the -remote flag. See the BLAST manual for details. This client uses NCBI compute resources and is considered a batch search. Searches will be run at lower priority than interactive searches from the NCBI BLAST web pages. Searches run at off-peak hours may have better throughput. Projects involving many searches should be run with stand-alone BLAST against locally installed databases or through an instance at a cloud provider.

On the BLAST search pages, at the bottom of the “Enter Query Sequence” section, is a checkbox titled “Align two or more sequences”. When you check this box, the search form changes to include a new section, “Enter Subject Sequence”. By entering sequences in the Subject field and then clicking the BLAST button, you compare the Query sequence(s) to the sequences you enter. The subject sequences essentially become a custom database.

The Expect value (E) is a parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size. It decreases exponentially as the Score (S) of the match increases. Essentially, the E value describes the random background noise. For example, an E value of 1 assigned to an alignment means that in a database of the same size one expects to see 1 match with a similar score, or higher, simply by chance.

The lower the E-value the more “significant” the match is. However, keep in mind that virtually identical short alignments have relatively high E values. This is because the calculation of the E value takes into account the length of the query sequence. These high E values make sense because shorter sequences have a higher probability of occurring in the database purely by chance. For more details please see the calculations in the BLAST Course.

The Expect value can also be used as a convenient way to create a significance threshold for reporting results. You can change the Expect value threshold on most BLAST search pages. When the Expect value is increased from the default value of 0.05, a larger list with more low-scoring hits can be reported.
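The relationship between score, search space, and E value described above can be sketched numerically. The function below is a minimal illustration, assuming the standard Karlin-Altschul relation E = m × n × 2^(−S′), where S′ is the bit score, m the query length, and n the total database length. The function name and the example numbers are hypothetical, and the edge-effect corrections that BLAST applies to the lengths are ignored, so the values will not match BLAST output exactly.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E value from a bit score and the size of the search space.

    Assumes E = m * n * 2**(-S'), without BLAST's effective-length corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Each additional bit of score halves E; a larger search space raises it.
for score in (20, 30, 40, 50):
    print(score, expect_value(score, query_len=100, db_len=1_000_000))
```

Because a short, even perfectly identical, alignment can only reach a modest bit score, its E value stays comparatively high under this relation.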

Regions with low-complexity sequence have an unusual composition that can create problems in sequence similarity searching. For amino acid queries this compositional bias is determined by the SEG program (Wootton and Federhen, 1996). For nucleotide queries it is determined by the DustMasker program (Morgulis, et al., 2006).

Low-complexity sequence can often be recognized by visual inspection. For example, the protein sequence PPCDPPPPPKDKKKKDDGPP has low complexity and so does the nucleotide sequence AAATAAAAAAAATAAAAAAT. Filters are used to remove low-complexity sequence because it can cause artefactual hits.

In BLAST searches performed without a filter, high scoring hits may be reported only because of the presence of a low-complexity region. Most often, it is inappropriate to consider this type of match as the result of shared homology. Rather, it is as if the low-complexity region is “sticky” and is pulling out many sequences that are not truly related.
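To make the idea of “low complexity” concrete, the sketch below scores a sequence by the Shannon entropy of its residue composition. This is not the SEG or DustMasker algorithm; it is a simpler stand-in chosen for illustration, and the function name is hypothetical. Lower values mean a more biased composition.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per residue) of a sequence's composition."""
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    total = len(seq)
    # Sum of p * log2(1/p) over every residue type that occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# The two low-complexity examples from the text, plus a balanced control.
for s in ("PPCDPPPPPKDKKKKDDGPP", "AAATAAAAAAAATAAAAAAT", "ACGTACGTACGTACGTACGT"):
    print(s, round(shannon_entropy(s), 3))
```

A real filter works on sliding windows rather than on the whole sequence, so treat this only as a way to see why the example sequences stand out.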

On the “blastn” (nucleotide-nucleotide) page there is an option to filter “Species-specific” repeats for a number of common organisms. This may be especially important if your query matches the same or a related organism many times. To enable this, go to the “Algorithm parameters” section (at the bottom of the page), check “Species-specific repeats”, and choose the proper organism.

The following non-standard parameters will need to be changed to mimic webBLAST:

For blastn

-evalue 0.05 -max_target_seqs 100

For blastp

-evalue 0.05 -max_target_seqs 100 -seg yes

For a full list of the default parameters in a standalone BLAST+ search please visit our BLAST+ manual.
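As a convenience, the parameter sets listed above can be assembled into a full command line. The sketch below only builds the argument list; it does not run BLAST. The helper name and the file names are hypothetical, while `-query`, `-db`, and `-out` are standard BLAST+ options.

```python
import shlex

# Options taken from the lists above for mimicking web BLAST.
WEB_LIKE_OPTIONS = {
    "blastn": ["-evalue", "0.05", "-max_target_seqs", "100"],
    "blastp": ["-evalue", "0.05", "-max_target_seqs", "100", "-seg", "yes"],
}


def web_like_command(program: str, query_path: str, db: str, out_path: str) -> list[str]:
    """Return an argument list for a standalone search with web-like settings."""
    if program not in WEB_LIKE_OPTIONS:
        raise ValueError(f"no web-like settings listed for {program!r}")
    return [program, "-query", query_path, "-db", db, "-out", out_path,
            *WEB_LIKE_OPTIONS[program]]


# Print the command as it could be typed in a shell.
print(shlex.join(web_like_command("blastp", "query.fasta", "nr", "hits.txt")))
```

The list can be passed to `subprocess.run` on a machine where the BLAST+ executables and the named database are installed.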

ClusteredNR is a database of clusters of similar proteins generated from the standard protein nr database with MMseqs2. Searching against ClusteredNR is faster, provides greater taxonomic reach, and gives results that are easier to interpret than those from the traditional nr database. Each cluster contains proteins that are more than 90% identical to each other and within 90% of the length of the longest member. We select a single well-annotated protein that indicates the function of the proteins in the cluster as the lead or representative protein. The title of the representative protein is the title that shows in the BLAST results. Each cluster may contain sequences for multiple organisms (species). On the BLAST results, clusters are identified by the name of the organism for the title protein as well as the most recent common ancestor taxon for all organisms in the cluster. This makes it clear when the cluster includes multiple species. You can expand a cluster on your BLAST results to view and download a report or the sequences of all member proteins, and you can also perform a BLAST alignment of all the members of the cluster.

If you have submitted a sequence to GenBank and cannot find it in the “nt” database, or cannot find its protein translation in the “nr” database, there are two likely reasons.

Make sure your sequence accessions were released by NCBI into the databases if they have been published. You can do this through the submission portal or by contacting [email protected].

The most common reason specific accession numbers cannot be found in BLAST searches is that the databases are non-redundant and your sequence is identical to one or more other sequences. The “nt” and “nr” databases are non-redundant, meaning that identical sequences are combined into a single entry with a single representative as the title for the entry. In web BLAST, if you go to the alignments between your query and the database match, you will see a hyperlink under the title of the subject sequence indicating up to 5 additional identical sequences. To see all these sequences you can click the link “See all Identical Proteins(IPG)”.

Any database available on the webBLAST input form is available for use with the BLAST+ “-remote” option. To get the correct path to each database, select the database you want from the drop-down list on webBLAST then click the “Bookmark” button in the upper-right corner of the screen. On the next page, examine the URL and find the section “DATABASE=<path>” where “<path>” is the name of the database you should use in your BLAST+ command. For example, the Betacoronavirus Genbank database path is “DATABASE=genomic/Viruses/Betacoronavirus” so the correct path to the same database with remote BLAST+ would be “-db genomic/Viruses/Betacoronavirus”.
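Reading the database path out of a bookmarked URL can be automated. The sketch below is one way to do it with the Python standard library; the example URL is an assumed shape for illustration (only the “DATABASE=” parameter is described in the text), and the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def remote_db_from_bookmark(url: str) -> str:
    """Extract the value of the DATABASE parameter from a bookmarked search URL."""
    params = parse_qs(urlparse(url).query)
    values = params.get("DATABASE")
    if not values:
        raise ValueError("no DATABASE parameter found in URL")
    return values[0]


def remote_blast_args(program: str, query_path: str, bookmark_url: str) -> list[str]:
    """Build a BLAST+ argument list that searches the same database with -remote."""
    db = remote_db_from_bookmark(bookmark_url)
    return [program, "-query", query_path, "-db", db, "-remote"]


# Hypothetical bookmark URL; only the DATABASE parameter matters here.
url = ("https://blast.ncbi.nlm.nih.gov/Blast.cgi"
       "?PROGRAM=blastn&DATABASE=genomic/Viruses/Betacoronavirus")
print(remote_blast_args("blastn", "query.fasta", url))
```

`parse_qs` also decodes percent-encoded characters, so a path written with `%2F` in the URL comes back with plain slashes.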

The “No significant similarity found” message means that your query did not match any sequences in the database with the current search parameters. Using the default settings for most BLAST searches, this generally means that your query is not closely related to sequences in the database. It does not rule out small regions of similarity between your query and the database. To match these regions you may try switching from MegaBLAST to blastn in the case of nucleotides, or lowering the word size and increasing the expect value for blastp. However, keep in mind that the more you change these parameters, the more you decrease the specificity of your match.

In general this message means that the program cannot recognize the query sequences in the “Enter Query Sequence” field. BLAST accepts sequences in FASTA format, either with a definition line preceded by a “>” symbol, or as raw sequence. BLAST can also accept sequence data that has been cut and pasted from GenBank or GenPept format, which has position numbers at the beginning or end of each line. You may also enter an NCBI accession or GI number.

BLAST cannot recognize gene names or symbols, protein names, E.C. numbers, or any non-sequence data.

Finally, if your query contains a lot of low-complexity sequence and the filtering option for “Low complexity regions” is selected, it is possible for too much of the query sequence to be filtered out. You can deselect the filter under “Advanced parameters”.
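If you prepare queries in a script, pasted sequence text can be tidied before submission. The sketch below keeps FASTA definition lines and strips position numbers and spaces from sequence lines, as found in GenBank or GenPept flat-file sequence blocks. It is a hypothetical helper, not BLAST's own parser, and it should be applied only to sequence text, since it would also strip the digits from an accession or GI number.

```python
import re


def normalize_query(text: str) -> str:
    """Strip position numbers and whitespace from pasted sequence lines.

    Lines starting with '>' are treated as FASTA definition lines and kept as-is.
    """
    cleaned = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            cleaned.append(line)
        else:
            # Remove digits and any internal spacing from sequence lines.
            cleaned.append(re.sub(r"[\d\s]", "", line))
    return "\n".join(cleaned)


pasted = """
        1 atgcttgacc ggtaacgtta
       21 ccggatat
"""
print(normalize_query(pasted))
```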

FAQs

How to interpret blastn results? ›

BLAST outputs a standard list of the most important metrics. These are the Score, E-value, Identity, Gaps and Strand. Score: BLAST calculates the alignment score based on the number of matches, mismatches, and gaps. The higher the score, the more sequence similarity between the query and subject.

What is the max score in BLAST? ›

Max Score: the highest alignment score, calculated from the sum of the rewards for matched nucleotides and the penalties for mismatches and gaps.

What does an e value of 0 mean in BLAST? ›

The E-value is the expectation value: the number of alignments with a score ≥ S that one can expect to find by chance in a database of size N. Hence, the E-value depends on the database size and the query length. The closer the E-value is to 0, the better the alignment.

What is the expect threshold in BLAST? ›

The Expect threshold ("E") is a BLAST parameter that reflects the number of matches expected to be found by chance. If the E-value of a match is greater than the Expect threshold, the match will not be reported. The default E threshold is 10 in standalone BLAST+; as noted above, the web search pages use a default of 0.05.

What is a good BLAST score? ›

BLAST results are sorted by E-value by default (best hit on the first line). The smaller the E-value, the better the match. BLAST hits with an E-value smaller than 1e-50 include database matches of very high quality. BLAST hits with an E-value smaller than 0.01 can still be considered good hits for homology matches.

What is a good percentage identity in BLAST? ›

If a large number of reference sequences in your results fall into the 98-100% identity range, all with the same species name you believed your specimen to be, then you would likely identify your specimen as that species and would not need to review that sequence much further.

How do you get more than 100 results in BLAST? ›

  1. Go to the NCBI BLAST page.
  2. At the bottom of the page, expand the “Algorithm parameters” section and change “Max target sequences” from the default value (100) to the value you need (for example 1000 or 5000).
  3. Select the value.
  4. Run the search; up to the number of hits you selected will be reported.

What is the difference between bit score and e-value in BLAST? ›

Bit scores are normalized, which means that the bit scores from different alignments can be compared, even if different scoring matrices have been used. The E-value gives an indication of the statistical significance of a given pairwise alignment and reflects the size of the database and the scoring system used.

What is the p value in BLAST? ›

The E-value describes how many hits you can expect to see by chance when searching a database of a certain size, whereas the P-value describes the probability that the alignment you are observing is due to chance. In general, the lower the E- or P-value is, the more likely it is that an alignment is significant.

What is a bad e value? ›

10e-10 < E-value < 1: could be a true homologue, but it is a gray area. E-value > 1: proteins are most likely not related. E-value > 10: hits are most likely junk unless the query sequence is very short.

What is a hit in BLAST? ›

In the terminology used by BLAST, the sequences you search with are the query sequences. A sequence search will (hopefully) identify sequences that are similar (or even identical) to the queries. The identified sequences are often called the hit sequences (or just hits).

How is the BLAST score calculated? ›

Per NCBI's definition page, the raw score of BLAST is the score of an alignment, calculated as the sum of substitution and gap scores.
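The “sum of substitution and gap scores” can be illustrated for a nucleotide alignment. The sketch below is a simplified model, assuming a fixed match reward and mismatch penalty and an affine gap cost in which a gap of length k costs gap_open + k × gap_extend. The function name and the default values are illustrative assumptions; consult the BLAST+ manual for the scoring parameters a given program actually uses.

```python
def raw_score(query_aln: str, subject_aln: str, reward: int = 2, penalty: int = -3,
              gap_open: int = 5, gap_extend: int = 2) -> int:
    """Raw score of a gapped pairwise alignment; '-' marks a gap position."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    score = 0
    in_gap = False
    for q, s in zip(query_aln, subject_aln):
        if q == "-" or s == "-":
            if not in_gap:
                score -= gap_open  # charged once per run of gap columns
            score -= gap_extend    # charged for every gap column
            in_gap = True
        else:
            score += reward if q == s else penalty
            in_gap = False
    return score


print(raw_score("ACGT", "ACGT"))      # four matches
print(raw_score("AC--GT", "ACTTGT"))  # four matches and one gap of length 2
```

Protein searches replace the fixed reward and penalty with a substitution matrix lookup, but the summation works the same way.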

What does "query cover" mean in BLAST? ›

The column labeled “Query Cover” indicates what percentage of the query sequence is covered by the alignments to each subject sequence. 100% coverage indicates that the aligned regions span the entire length of the query.
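Query cover is commonly understood as the percentage of query positions that fall inside at least one aligned segment. The sketch below computes it under that assumption from a list of query coordinate ranges (1-based, inclusive); the function name and input format are hypothetical, and overlapping segments are merged so that no position is counted twice.

```python
def query_cover(hsp_ranges: list[tuple[int, int]], query_len: int) -> float:
    """Percent of the query covered by the union of aligned query ranges."""
    if query_len <= 0:
        raise ValueError("query_len must be positive")
    covered = 0
    cur_start = cur_end = None
    # Sort ranges, normalising reversed coordinates, then merge overlaps.
    for start, end in sorted((min(a, b), max(a, b)) for a, b in hsp_ranges):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_len


# Two overlapping segments covering positions 1-100 of a 200-residue query.
print(query_cover([(1, 50), (40, 100)], 200))
```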

What is the difference between max score and total score in BLAST? ›

BLAST metrics include: Max Score: Highest bit score calculated from matches and mismatches found in local alignments. The higher the max score, the better the alignment. Total Score: Sum of alignment scores for all of the sequence segments or local alignments.

How is BLAST calculated? ›

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between protein or nucleotide sequences. The program compares nucleotide or protein sequences to sequences in a database and calculates the statistical significance of the matches.

How do you interpret bit score BLAST? ›

The bit score gives an indication of how good the alignment is; the higher the score, the better the alignment. In general terms, this score is calculated from a formula that takes into account the alignment of similar or identical residues, as well as any gaps introduced to align the sequences.

How to interpret genetic test results? ›

  1. If you receive a positive test result, this means that you have a genetic condition that increases your risk of developing cancer or heart disease. ...
  2. If you receive a negative result, it means you do not have any known mutations in the genes evaluated by the Precision Health Screening test.

How do you read sequencing results? ›

The bases are read in order from left to right and top to bottom (on a chromatogram having more than one row of information). This order corresponds to the 5' end of the sequenced DNA to the 3' end. Such evenly-spaced, clear peaks make base calling straightforward and unambiguous.

How to interpret GeneMANIA results? ›

How do I interpret GeneMANIA's results?
  1. A list of genes with associated scores, including your input genes and predicted related genes.
  2. A network that shows the relationships between genes in the list. ...
  3. A list of networks weighted by their ability to connect related genes.

Article information

Author: Delena Feil
