Blast

Latest revision as of 13:57, 23 October 2018

Introduction

BLAST is the workhorse of bioinformatics. This page gives some tips on usage and (mostly) on how to speed it up.


BLAST

Read this first for how to run BLAST: https://www.ncbi.nlm.nih.gov/books/NBK279680/

The BLAST+ user manual is also available as a PDF: http://nebc.nerc.ac.uk/bioinformatics/documentation/blast+/user_manual.pdf


# Put this in your script or in your .bash_profile
export BLASTDB=/shelf/public/blastntnr/blastDatabases

# Add the latest BLAST tools to your PATH
export PATH=/shelf/apps/ncbi-blast-2.7.1+/bin/:$PATH

# Now you can run against nr, nt, human_genomic, uniprot_sprot.fasta:
blastp -db nr -query amino_acid.fasta -out tests.txt

The taxonomy database is also in this directory: /shelf/public/blastntnr/blastDatabases. So if you want the extended tabular output with taxonomy fields (for example staxids or sscinames), it should just work, provided you request those fields with -outfmt.

FULL EXAMPLE:

export BLASTDB=/shelf/public/blastntnr/blastDatabases

export PATH=/shelf/apps/ncbi-blast-2.7.1+/bin/:$PATH

blastp -db nr -query MY_AA.fasta -evalue 1e-5 -seg no -num_threads 8 -outfmt "6 std salltitles staxids sscinames scomnames sskingdoms" -out MY_AA_vs_nr.fa_NR_18oct2018.tab
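
The output of the full example has no header line, so it can help to label the columns before inspecting it. The sketch below assumes the 17 columns requested above ("std" expands to the 12 standard tabular fields, followed by the five extra ones). The function names add_header, count_kingdoms and sample_rows are hypothetical helpers, and the sample rows are made-up values used only to show the shape of the data.

```shell
#!/usr/bin/env bash
# Column names for: -outfmt "6 std salltitles staxids sscinames scomnames sskingdoms"
BLAST_COLS="qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore salltitles staxids sscinames scomnames sskingdoms"

# Print a tab-separated header line, then pass the table on stdin through unchanged.
add_header() {
    printf '%s\n' "$BLAST_COLS" | tr ' ' '\t'
    cat
}

# Count hits per super kingdom (column 17) in a header-less table on stdin.
count_kingdoms() {
    awk -F'\t' '{ n[$17]++ } END { for (k in n) print k "\t" n[k] }' | sort
}

# Made-up rows (hypothetical IDs and scores), for illustration only.
sample_rows() {
    printf 'q1\tsubjA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380\ttitle A\t9606\tHomo sapiens\thuman\tEukaryota\n'
    printf 'q2\tsubjB\t45.0\t150\t80\t2\t10\t159\t1\t150\t2e-20\t95.1\ttitle B\t562\tEscherichia coli\tE. coli\tBacteria\n'
    printf 'q3\tsubjC\t77.0\t120\t27\t1\t3\t122\t8\t127\t3e-50\t170\ttitle C\t10090\tMus musculus\thouse mouse\tEukaryota\n'
}

sample_rows | count_kingdoms
```

On a real result file, something like `add_header < MY_AA_vs_nr.fa_NR_18oct2018.tab > labelled.tab` would produce a labelled copy, and `count_kingdoms < MY_AA_vs_nr.fa_NR_18oct2018.tab` a quick taxonomic summary.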


Output formats

-m 9

This option is for the old (legacy) BLAST and mpiBLAST. It produces tabular output with the following columns:

  1. Query id
  2. Subject id
  3. % identity
  4. alignment length
  5. mismatches
  6. gap openings
  7. q. start
  8. q. end
  9. s. start
  10. s. end
  11. e-value
  12. bit score
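
The twelve columns above can be filtered with awk once their positions are known. The following is a minimal sketch, assuming tab-separated input in which comment lines start with '#'. The function names filter_hits, best_hits and sample_m9, the thresholds, and the sample rows are all hypothetical and only illustrate the column layout.

```shell
#!/usr/bin/env bash
# Keep hits with e-value (column 11) <= $1 and % identity (column 3) >= $2.
# Comment lines starting with '#' are skipped.
filter_hits() {
    awk -F'\t' -v e="$1" -v p="$2" \
        '!/^#/ && ($11 + 0) <= (e + 0) && ($3 + 0) >= (p + 0)'
}

# Keep only the highest bit score (column 12) hit for each query id (column 1).
best_hits() {
    awk -F'\t' '!/^#/ {
            if (!($1 in best) || ($12 + 0) > score[$1]) { best[$1] = $0; score[$1] = $12 + 0 }
        }
        END { for (q in best) print best[q] }' | sort
}

# Made-up table (hypothetical IDs and scores), for illustration only.
sample_m9() {
    printf '%s\n' '# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score'
    printf 'q1\tsubjA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380\n'
    printf 'q1\tsubjB\t60.0\t180\t70\t2\t5\t184\t1\t180\t1e-30\t120\n'
    printf 'q2\tsubjC\t30.0\t90\t60\t3\t1\t90\t10\t99\t0.5\t25.0\n'
}

sample_m9 | filter_hits 1e-5 50
sample_m9 | best_hits
```

With the sample above, the filter keeps the two q1 hits, and best_hits keeps one line per query (subjA for q1, subjC for q2).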

Benchmark Exercises

Transcriptome Panda Blood

  • 92,600 transcripts (contigs) in a 2.4-million-line FASTA file.
  • blastx speed against an nr database split into 62 fragments: 5% of the contigs (about 4,500) completed in 33 hours.
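
To put the benchmark figures in perspective, the total run time can be roughly extrapolated. This is only a back-of-the-envelope sketch: it assumes throughput stays constant over the whole run, which may not hold in practice, and the function name estimate_total_hours is hypothetical.

```shell
#!/usr/bin/env bash
# Linear extrapolation: estimate_total_hours DONE TOTAL HOURS_SO_FAR
# Assumes the remaining sequences are processed at the same average rate.
estimate_total_hours() {
    awk -v d="$1" -v t="$2" -v h="$3" 'BEGIN {
        est = t / d * h
        printf "%.0f hours (about %.1f days)\n", est, est / 24
    }'
}

# 4500 of 92600 contigs finished in 33 hours (figures from the benchmark above)
estimate_total_hours 4500 92600 33
# prints: 679 hours (about 28.3 days)
```

At that rate the full transcriptome would take about four weeks, which is why the speed-up matters.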