Blast

Revision as of 13:57, 23 October 2018 by PeterThorpe (added blast databases with tax db.)

Introduction

BLAST is the workhorse of bioinformatics. This page gives some tips on usage and, mostly, on how to speed it up.


BLAST

Read this for how to run BLAST: https://www.ncbi.nlm.nih.gov/books/NBK279680/

BLAST+ user manual (PDF): http://nebc.nerc.ac.uk/bioinformatics/documentation/blast+/user_manual.pdf


  1. Put this in your script or in your .bash_profile:

export BLASTDB=/shelf/public/blastntnr/blastDatabases

  2. To put the BLAST+ 2.7.1 tools on your PATH:

export PATH=/shelf/apps/ncbi-blast-2.7.1+/bin/:$PATH

  3. Now you can run against nr, nt, human_genomic or uniprot_sprot.fasta:

blastp -db nr -query amino_acid.fasta -out tests.txt
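Step 1 suggests putting the exports in your .bash_profile. A minimal sketch, assuming bash and that the shared paths above are still current, that appends each export only once so the setup can be re-run safely (add_line_once is a hypothetical helper name, not part of BLAST):

```shell
#!/usr/bin/env bash
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain it as a whole line, so re-running never duplicates entries.
add_line_once() {
    local file="$1" line="$2"
    touch "$file" || return 1
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

profile="$HOME/.bash_profile"
# Single quotes keep $PATH literal, so it is expanded at each login
# rather than frozen to its value at the moment this script runs.
add_line_once "$profile" 'export BLASTDB=/shelf/public/blastntnr/blastDatabases'
add_line_once "$profile" 'export PATH=/shelf/apps/ncbi-blast-2.7.1+/bin/:$PATH'
```

Run it once, then log in again (or source ~/.bash_profile) so the new BLASTDB and PATH take effect.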

The taxonomy database is also in /shelf/public/blastntnr/blastDatabases, so if you want the extended tabular output with taxonomy columns (for example sscinames, scomnames and sskingdoms), it should just work, as long as you ask for those columns in -outfmt.

FULL EXAMPLE:

export BLASTDB=/shelf/public/blastntnr/blastDatabases

export PATH=/shelf/apps/ncbi-blast-2.7.1+/bin/:$PATH

blastp -db nr -query MY_AA.fasta -evalue 1e-5 -seg no -num_threads 8 -outfmt "6 std salltitles staxids sscinames scomnames sskingdoms" -out MY_AA_vs_nr.fa_NR_18oct2018.tab
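With that format string the output table has 17 tab-separated columns: the 12 std columns (column 12 is the bit score), then salltitles, staxids, sscinames, scomnames and sskingdoms (column 17). As a sketch, assuming tab-separated input and taking "best hit" to mean the highest bit score per query, this awk helper (best_hit_kingdoms is a hypothetical name) counts how many queries have their best hit in each kingdom, which can serve as a quick contamination check:

```shell
#!/usr/bin/env bash
# Hypothetical helper for the 17-column table produced by
# -outfmt "6 std salltitles staxids sscinames scomnames sskingdoms".
# Keeps the highest-bit-score hit per query (column 12), then counts
# queries per best-hit kingdom (column 17), most frequent first.
best_hit_kingdoms() {
    awk -F'\t' '
        # First hit seen for this query, or a better bit score than stored
        !($1 in best) || $12 + 0 > best[$1] + 0 { best[$1] = $12; kingdom[$1] = $17 }
        END {
            for (q in kingdom) count[kingdom[q]]++
            for (k in count) print k "\t" count[k]
        }
    ' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```

Usage: best_hit_kingdoms MY_AA_vs_nr.fa_NR_18oct2018.tab. Queries with no hits do not appear in tabular output, so they are not counted.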


Output formats

-m 9

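This is the tabular output with comment lines from the legacy BLAST (blastall) and mpiBLAST. In BLAST+, the same 12 columns are the std set of -outfmt 6 (tabular) or -outfmt 7 (tabular with comment lines). The columns are: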

  1. Query id
  2. Subject id
  3. % identity
  4. alignment length
  5. mismatches
  6. gap openings
  7. q. start
  8. q. end
  9. s. start
  10. s. end
  11. e-value
  12. bit score
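A sketch for pulling the best hit per query out of this table, assuming tab-separated input with comment lines starting with #, and an e-value cutoff of your choosing (best_hits and its 1e-5 default are illustrative, not part of BLAST). Hits above the cutoff are dropped, the highest bit score wins, and queries are printed in the order they first appear. Because these are also the first 12 columns of the BLAST+ std tabular output, it should work on that too:

```shell
#!/usr/bin/env bash
# Hypothetical helper: best_hits FILE [MAX_EVALUE]
# Column 11 = e-value, column 12 = bit score, column 1 = query id.
best_hits() {
    local file="$1" max_evalue="${2:-1e-5}"
    awk -F'\t' -v max="$max_evalue" '
        /^#/ { next }                       # skip -m 9 comment lines
        $11 + 0 > max + 0 { next }          # e-value above the cutoff
        !($1 in best) { order[++n] = $1 }   # remember first-seen query order
        !($1 in best) || $12 + 0 > best[$1] + 0 { best[$1] = $12; row[$1] = $0 }
        END { for (i = 1; i <= n; i++) print row[order[i]] }
    ' "$file"
}
```

For example, best_hits results.tab 1e-10, where results.tab is a placeholder file name.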

Benchmark Exercises

Transcriptome Panda Blood

  • 92600 transcripts (contigs) in a 2.4-million-line FASTA file.
  • blastx speed against the nr database split into 62 fragments: 5% of the contigs (about 4500) done in 33 hours.
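To put these numbers in context, a straight-line extrapolation helps: if 5% of the contigs took 33 hours, all of them would take about 33 / 0.05 = 660 hours, roughly 27.5 days, assuming the remaining contigs take about as long on average as the first ones. A tiny awk-based helper (estimate_total_hours is a hypothetical name) doing the same sum:

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate_total_hours HOURS_SO_FAR FRACTION_DONE
# Straight-line extrapolation: total ~ HOURS_SO_FAR / FRACTION_DONE.
# Assumes a roughly constant per-contig cost, which BLAST does not guarantee.
estimate_total_hours() {
    awk -v t="$1" -v f="$2" 'BEGIN {
        h = t / f
        printf "%.1f hours (%.1f days)\n", h, h / 24
    }'
}

# Panda blood transcriptome: 5% of the contigs took 33 hours with blastx.
estimate_total_hours 33 0.05   # prints: 660.0 hours (27.5 days)
```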