Blast
Introduction
BLAST is the workhorse of bioinformatics. This page gives some tips on usage and (mostly) on how to speed it up.
BLAST
Read this first to learn how to run BLAST: https://www.ncbi.nlm.nih.gov/books/NBK279680/
http://nebc.nerc.ac.uk/bioinformatics/documentation/blast+/user_manual.pdf
- Put this in your script or in your .bash_profile
export BLASTDB=/shelf/public/blastntnr/blastDatabases
- To add the latest BLAST tools to your PATH:
export PATH=/shelf/apps/ncbi-blast-2.7.1+/bin/:$PATH
- Now you can run against nr, nt, human_genomic or uniprot_sprot.fasta:
> blastp -db nr -query amino_acid.fasta -out tests.txt
The taxonomy database is also in this directory: /shelf/public/blastntnr/blastDatabases. So the taxonomy fields of the extended output format (for example staxids, sscinames, scomnames, sskingdoms) should just work, provided you request them with -outfmt.
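Before relying on the taxonomy fields, it can help to confirm that the taxonomy files really are in the database directory. The sketch below assumes the standard NCBI taxdb file names (taxdb.btd and taxdb.bti) and that BLASTDB holds a single directory; has_taxdb is a hypothetical helper name, not a BLAST tool.

```shell
#!/usr/bin/env bash
# has_taxdb DIR: succeed if DIR contains both NCBI taxonomy files.
# (hypothetical helper, not part of BLAST; file names assumed to be
#  the standard taxdb.btd / taxdb.bti pair)
has_taxdb() {
    local dir="$1"
    [ -f "$dir/taxdb.btd" ] && [ -f "$dir/taxdb.bti" ]
}

# Fall back to the path used on this page if BLASTDB is not set.
db_dir="${BLASTDB:-/shelf/public/blastntnr/blastDatabases}"

if has_taxdb "$db_dir"; then
    echo "taxdb found in $db_dir"
else
    echo "taxdb not found in $db_dir: name fields such as sscinames will probably come back as N/A"
fi
```

If the check fails, the first thing to look at is whether BLASTDB was exported in the same shell or job script that runs blastp.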
FULL EXAMPLE:
export BLASTDB=/shelf/public/blastntnr/blastDatabases
export PATH=/shelf/apps/ncbi-blast-2.7.1+/bin/:$PATH
blastp -db nr -query MY_AA.fasta -evalue 1e-5 -seg no -num_threads 8 -outfmt "6 std salltitles staxids sscinames scomnames sskingdoms" -out MY_AA_vs_nr.fa_NR_18oct2018.tab
Output formats
-m 9
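This is the tabular output (with comment lines) of the old (legacy) blast and mpiblast. In BLAST+, -outfmt 7 is the equivalent, and the "std" columns of -outfmt 6 are the same twelve. The columns are: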
- Query id
- Subject id
- % identity
- alignment length
- mismatches
- gap openings
- q. start
- q. end
- s. start
- s. end
- e-value
- bit score
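Because the columns always come in this order, the table is easy to filter with awk. The following is a minimal sketch, assuming tab-separated output with % identity in column 3 and e-value in column 11 as listed above; filter_hits, sample_hits and the thresholds are illustrative names and values, and the sample lines are made up purely to show the layout.

```shell
#!/usr/bin/env bash
# filter_hits MAX_EVALUE MIN_IDENTITY
# Reads 12-column tabular BLAST output on stdin and keeps hits with
# e-value <= MAX_EVALUE and % identity >= MIN_IDENTITY.
# Comment lines (starting with '#', as written by -m 9) are skipped.
filter_hits() {
    local max_evalue="$1" min_identity="$2"
    awk -F'\t' -v e="$max_evalue" -v id="$min_identity" \
        '!/^#/ && ($11 + 0) <= (e + 0) && ($3 + 0) >= (id + 0)'
}

# Made-up example hits (not real BLAST results).
sample_hits() {
    printf '# comment line as written by -m 9\n'
    printf 'contig1\tsubjA\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-50\t350\n'
    printf 'contig1\tsubjB\t45.00\t80\t44\t2\t10\t89\t1\t80\t0.5\t30.1\n'
    printf 'contig2\tsubjC\t91.20\t150\t13\t0\t1\t150\t1\t150\t2e-30\t210\n'
}

# Keep hits with e-value <= 1e-5 and at least 90% identity.
sample_hits | filter_hits 1e-5 90
```

On a real result file the call would look like: filter_hits 1e-5 90 < tests.txt > tests.filtered.txt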
Benchmark Exercises
Transcriptome Panda Blood
- 92600 transcripts (contigs) in a 2.4-million-line FASTA file.
- blastx speed against an nr database split into 62 fragments: 5% of the transcripts (about 4500 contigs) done in 33 hours.
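To put the benchmark in perspective, the figures above can be extrapolated to the whole transcriptome. This is a rough sketch that assumes throughput stays constant for the rest of the run, which may not hold in practice; estimate_total_hours is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# estimate_total_hours HOURS_SO_FAR PERCENT_DONE
# Linear extrapolation: total = hours_so_far * 100 / percent_done.
# (hypothetical helper; assumes constant throughput)
estimate_total_hours() {
    awk -v h="$1" -v p="$2" 'BEGIN { printf "%.0f\n", h * 100 / p }'
}

total=$(estimate_total_hours 33 5)
days=$(awk -v h="$total" 'BEGIN { printf "%.1f", h / 24 }')
echo "estimated total: $total hours (about $days days)"
# prints: estimated total: 660 hours (about 27.5 days)
```

At that rate a single run over all 92600 contigs would take roughly four weeks, which is why the rest of this page concentrates on speeding things up.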