Introduction

CD-HIT is primarily a clustering program. It takes FASTA sequence files as input, typically files intended to serve as databases against which query sequences will be searched.

A major concern with such FASTA files is how much redundancy they contain. Depending on the experiment or analysis being run, the database may be more detailed than needed, and it helps to cluster similar sequences so that each cluster is represented by a single sequence. CD-HIT is used for this.

Common use-cases

Clustering the Antibiotic Resistance Gene database

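cd-hit-est -i argannot-nt_doc.fasta -o argannot_cdhit90 -d 0 > argannot_cdhit90.stdout

Here -i names the input FASTA file and -o the output name: cd-hit-est writes the representative sequences to argannot_cdhit90 and the cluster membership to argannot_cdhit90.clstr. No -c threshold is given, so the default sequence identity cutoff of 0.9 applies (presumably the "90" in the output name). With -d 0, each sequence name in the .clstr file is taken from its FASTA header up to the first space instead of being truncated to the default 20 characters. The redirect saves the progress log that cd-hit-est prints to standard output.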
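To check how much redundancy the clustering removed, the output files can be inspected with standard tools. The sketch below assumes the usual cd-hit-est outputs: a FASTA file named by -o holding one representative per cluster, and a companion .clstr file in which each cluster begins with a line such as ">Cluster 0" followed by one line per member. The function names summarize_cdhit and cluster_sizes are hypothetical helpers, not part of CD-HIT.

```shell
# Hypothetical helper (not part of CD-HIT): compare the input FASTA
# with the outputs of a cd-hit-est run that used the given -o name.
summarize_cdhit() {
  local input_fasta="$1" prefix="$2"
  local n_input n_reps n_clusters
  # Every FASTA record starts with a '>' header line.
  n_input=$(grep -c '^>' "$input_fasta")
  # The file named by -o holds one representative sequence per cluster.
  n_reps=$(grep -c '^>' "$prefix")
  # The .clstr file opens each cluster with a line like ">Cluster 0".
  n_clusters=$(grep -c '^>Cluster' "$prefix.clstr")
  echo "input sequences: $n_input"
  echo "clusters: $n_clusters"
  echo "representatives: $n_reps"
}

# Hypothetical helper: print "cluster_id<TAB>member_count" for each
# cluster in <prefix>.clstr, largest clusters first.
cluster_sizes() {
  awk '/^>Cluster/ { id = $2; next }   # remember the current cluster id
       { size[id]++ }                  # every other line is one member
       END { for (c in size) print c "\t" size[c] }' "$1.clstr" |
    sort -k2,2nr
}
```

For the run above, summarize_cdhit argannot-nt_doc.fasta argannot_cdhit90 reports the input and cluster counts, and cluster_sizes argannot_cdhit90 | head lists the largest clusters.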

