Introduction

BLAST is the workhorse of bioinformatics. This page gives some tips on usage and (mostly) on how to speed it up.

BLAST

Read this for how to run BLAST: https://www.ncbi.nlm.nih.gov/books/NBK279680/

http://nebc.nerc.ac.uk/bioinformatics/documentation/blast+/user_manual.pdf

Put this in your script or in your .bash_profile:

export BLASTDB=/shelf/public/blastntnr/blastDatabases

To add the latest BLAST tools to your path:

export PATH=/shelf/apps/ncbi-blast-2.7.1+/bin/:$PATH
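A quick way to confirm that both exports took effect is a small check like the one below. This is only a sketch: the function name check_blast_setup is made up for illustration, and it assumes nothing beyond the BLASTDB variable and the blastp binary mentioned on this page.

```shell
#!/usr/bin/env bash
# Sanity check for the BLASTDB and PATH exports (function name is illustrative).
check_blast_setup() {
    # BLASTDB must be set ...
    if [ -z "${BLASTDB:-}" ]; then
        echo "BLASTDB is not set" >&2
        return 1
    fi
    # ... and must point at an existing directory.
    if [ ! -d "$BLASTDB" ]; then
        echo "BLASTDB directory not found: $BLASTDB" >&2
        return 1
    fi
    # blastp must be resolvable through PATH.
    if ! command -v blastp >/dev/null 2>&1; then
        echo "blastp not found on PATH" >&2
        return 1
    fi
    echo "BLASTDB=$BLASTDB"
    echo "blastp=$(command -v blastp)"
}
```

Typical use would be `check_blast_setup && blastp -version`, which also prints the version of the binary that was found.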

Now you can run against nr, nt, human_genomic, or uniprot_sprot.fasta:

> blastp -db nr -query amino_acid.fasta -out tests.txt

The taxonomy database is also in /shelf/public/blastntnr/blastDatabases, so the taxonomy fields of the extended output format (for example staxids, sscinames, scomnames, sskingdoms) should just work, provided you request them with -outfmt.

FULL EXAMPLE:

export BLASTDB=/shelf/public/blastntnr/blastDatabases

export PATH=/shelf/apps/ncbi-blast-2.7.1+/bin/:$PATH

blastp -db nr -query MY_AA.fasta -evalue 1e-5 -seg no -num_threads 8 -outfmt "6 std salltitles staxids sscinames scomnames sskingdoms" -out MY_AA_vs_nr.fa_NR_18oct2018.tab
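The table written by the command above has 17 tab-separated columns: the 12 "std" columns followed by salltitles, staxids, sscinames, scomnames and sskingdoms, in the order given to -outfmt. As a rough sketch of how the taxonomy columns can be used, the function below counts hits per super kingdom (column 17). The function name hits_per_kingdom is made up for illustration, and values are counted exactly as they appear in the file.

```shell
#!/usr/bin/env bash
# Count hits per super kingdom in a 17-column "-outfmt 6" table.
# Column 17 is sskingdoms when the format string is
# "6 std salltitles staxids sscinames scomnames sskingdoms".
hits_per_kingdom() {
    # $1 = tab-separated BLAST output file
    awk -F'\t' '
        NF >= 17 { count[$17]++ }    # tally each value seen in column 17
        END { for (k in count) printf "%s\t%d\n", k, count[k] }
    ' "$1" | sort
}
```

For example, `hits_per_kingdom MY_AA_vs_nr.fa_NR_18oct2018.tab` prints one line per kingdom with its hit count.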

Output formats

-m 9

This is the tabular output format of the old blast and mpiblast. The columns are:

Query id
Subject id
% identity
alignment length
mismatches
gap openings
q. start
q. end
s. start
s. end
e-value
bit score
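Given the twelve columns listed above, hits can be filtered with awk on e-value (column 11) and % identity (column 3). The following is a minimal sketch; the function name filter_hits and the thresholds in the usage line are made up for illustration. It skips lines starting with #, since -m 9 output carries comment lines.

```shell
#!/usr/bin/env bash
# Keep only hits at or below an e-value cutoff and at or above an identity cutoff.
filter_hits() {
    # $1 = tabular BLAST file, $2 = maximum e-value, $3 = minimum % identity
    awk -F'\t' -v max_e="$2" -v min_id="$3" '
        /^#/ { next }    # skip comment lines
        # "+ 0" forces numeric comparison, so values like 1e-50 compare as numbers
        ($11 + 0) <= (max_e + 0) && ($3 + 0) >= (min_id + 0)
    ' "$1"
}
```

For example, `filter_hits results.tab 1e-5 50` prints only the hits with an e-value of at most 1e-5 and at least 50% identity.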

Benchmark Exercises

Transcriptome Panda Blood

92600 transcripts (contigs) in a 2.4 million line FASTA file.
blastx speed against the nr database split into 62 fragments: 5% of the contigs (about 4500) done in 33 hours.
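To put the benchmark in perspective, the figures above can be extrapolated to the full run. The sketch below assumes throughput stays constant over the whole file, which is only an approximation; the function name estimate_total_time is made up for illustration.

```shell
#!/usr/bin/env bash
# Extrapolate total run time from the fraction finished so far,
# assuming constant throughput (an assumption, not a measurement).
estimate_total_time() {
    # $1 = hours elapsed, $2 = percent of contigs finished
    awk -v h="$1" -v p="$2" 'BEGIN {
        total = h * 100 / p
        printf "%.0f hours (%.1f days)\n", total, total / 24
    }'
}

estimate_total_time 33 5    # prints: 660 hours (27.5 days)
```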
