Difference between revisions of "Blast"
PeterThorpe (added blast databases with tax db.)
Latest revision as of 13:57, 23 October 2018
Introduction
BLAST is the workhorse of bioinformatics. This page gives some tips on usage and (mostly) on how to speed it up.
BLAST
Read this first for how to run BLAST: https://www.ncbi.nlm.nih.gov/books/NBK279680/
The BLAST+ user manual (PDF): http://nebc.nerc.ac.uk/bioinformatics/documentation/blast+/user_manual.pdf
- Put this in your script or in your .bash_profile:
export BLASTDB=/shelf/public/blastntnr/blastDatabases
- To add the latest BLAST tools to your PATH:
export PATH=/shelf/apps/ncbi-blast-2.7.1+/bin/:$PATH
- Now you can run against nr, nt, human_genomic, or uniprot_sprot.fasta:
blastp -db nr -query amino_acid.fasta -out tests.txt
The taxonomy database is also in this directory: /shelf/public/blastntnr/blastDatabases. So if you ask for the taxonomy fields in the extended BLAST output format (for example staxids, sscinames, scomnames, sskingdoms), they should just work.
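As a quick sanity check before asking for taxonomy fields, the sketch below tests whether a database directory contains the two NCBI taxonomy files, taxdb.btd and taxdb.bti. The function name has_taxdb is hypothetical, and the sketch assumes BLASTDB holds a single directory rather than a colon-separated list.

```shell
#!/usr/bin/env bash
# Return success if the given directory holds the NCBI taxonomy files.
# With no argument, fall back to $BLASTDB, then to the path used on this page.
# Assumption: BLASTDB names one directory, not a colon-separated list.
has_taxdb() {
    local dir="${1:-${BLASTDB:-/shelf/public/blastntnr/blastDatabases}}"
    [ -f "$dir/taxdb.btd" ] && [ -f "$dir/taxdb.bti" ]
}

# Usage: report whether taxonomy lookups are likely to work.
if has_taxdb; then
    echo "taxdb found"
else
    echo "taxdb not found: taxonomy name fields may come back empty or as N/A"
fi
```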
FULL EXAMPLE:
export BLASTDB=/shelf/public/blastntnr/blastDatabases
export PATH=/shelf/apps/ncbi-blast-2.7.1+/bin/:$PATH
blastp -db nr -query MY_AA.fasta -evalue 1e-5 -seg no -num_threads 8 -outfmt "6 std salltitles staxids sscinames scomnames sskingdoms" -out MY_AA_vs_nr.fa_NR_18oct2018.tab
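The full example runs one query file in a single job. One possible way to spread the work, sketched here as an assumption rather than as this site's recommended method, is to split the query FASTA into chunks and run one blastp per chunk. The function name split_fasta and the chunk file naming are hypothetical.

```shell
#!/usr/bin/env bash
# Split a FASTA file into N chunk files, assigning records round-robin.
# Output files are named <prefix>_1.fasta ... <prefix>_N.fasta.
split_fasta() {
    local infile="$1" nchunks="$2" prefix="$3"
    awk -v n="$nchunks" -v p="$prefix" '
        # A header line starts a new record: pick the next chunk file.
        /^>/ { out = p "_" (count % n + 1) ".fasta"; count++ }
        # Write header and sequence lines to the current chunk file.
        out  { print > out }
    ' "$infile"
}

# Example use (commented out because it needs BLAST and the databases):
# split_fasta MY_AA.fasta 4 chunk
# for f in chunk_*.fasta; do
#     blastp -db nr -query "$f" -evalue 1e-5 -num_threads 2 \
#         -outfmt 6 -out "${f%.fasta}.tab" &
# done
# wait
# cat chunk_*.tab > MY_AA_vs_nr.tab
```

Round-robin assignment keeps the chunks roughly equal in record count, although not necessarily in total sequence length.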
Output formats
-m 9
This applies to the old (legacy) blast and mpiblast, where -m 9 gives tabular output with comment lines. The columns are:
- Query id
- Subject id
- % identity
- alignment length
- mismatches
- gap openings
- q. start
- q. end
- s. start
- s. end
- e-value
- bit score
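The twelve columns above are also the default ("std") columns of BLAST+ -outfmt 6, so the same post-processing works for both. As a minimal sketch, the function below (hypothetical name filter_hits) keeps only hits above a percent-identity threshold and below an e-value threshold, skipping the comment lines that -m 9 adds.

```shell
#!/usr/bin/env bash
# Filter 12-column tabular BLAST output read from standard input.
# Column 3 is % identity, column 11 is the e-value.
# Arguments: minimum % identity, maximum e-value.
filter_hits() {
    local min_identity="$1" max_evalue="$2"
    awk -F'\t' -v id="$min_identity" -v ev="$max_evalue" '
        /^#/ { next }                          # skip comment lines
        $3 + 0 >= id + 0 && $11 + 0 <= ev + 0  # print rows passing both cuts
    '
}

# Example use (the file names are placeholders):
# filter_hits 90 1e-5 < results.tab > results.filtered.tab
```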
Benchmark Exercises
Transcriptome Panda Blood
- 92600 transcripts (contigs) in a 2.4 million line FASTA file.
- blastx speed on an nr database split into 62 fragments: 5% of the contigs (about 4500) processed in 33 hours.
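To put the benchmark in perspective, the figures can be extrapolated to the whole job. This is a rough sketch that assumes the rate stays constant across the run, which may not hold in practice. The function name estimate_total_hours is hypothetical.

```shell
#!/usr/bin/env bash
# Extrapolate total runtime from a partial benchmark, assuming a constant rate.
# Arguments: percent of the job completed, hours elapsed so far.
estimate_total_hours() {
    awk -v pct="$1" -v hrs="$2" 'BEGIN { printf "%.0f\n", hrs * 100 / pct }'
}

estimate_total_hours 5 33   # prints 660
```

At that rate the full transcriptome would take about 660 hours, or roughly 27.5 days, which is why speeding BLAST up matters here.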