|
|
Line 1: |
Line 1: |
− | =Introduction=
| + | No Name [/home/nutria], b1, wiki |
− | | + | |Introduction |
− | Assessment of short read quality
| + | |FastQC |
− | | + | = . |fastqc's help file |
− | =MultiQC =
| + | |MultiQC |
− | | + | |multiqc's help file |
− | MultiQC is a relatively new tool that aggregates the output of FastQC into one report.
| |
− | | |
− | * Available on the command line without any module loading, as it is a Python package (easily installed via pip)
| |
− | | |
− | Go into the directory that contains the FastQC output and run:
| |
− | multiqc . | |
− | | |
− | The dot stands for the current directory and is obligatory.
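The same invocation can be scripted. The following is a minimal sketch, assuming `multiqc` is on the PATH; the function names `multiqc_command` and `run_multiqc` are hypothetical helpers introduced here for illustration, not part of MultiQC.

```python
import shutil
import subprocess


def multiqc_command(analysis_dir="."):
    """Return the argument list for a plain MultiQC run on analysis_dir."""
    # The directory argument is required; "." means the current directory.
    return ["multiqc", analysis_dir]


def run_multiqc(analysis_dir="."):
    """Run MultiQC if it is found on the PATH; return True if it was run."""
    if shutil.which("multiqc") is None:
        # MultiQC is not installed or not visible in this environment.
        return False
    # check=True raises CalledProcessError if MultiQC exits with an error.
    subprocess.run(multiqc_command(analysis_dir), check=True)
    return True
```

Calling `run_multiqc()` from the directory that holds the FastQC output is then equivalent to typing `multiqc .` by hand.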
| |
− | | |
− | Under its general statistics we get the following headings:
| |
− | | |
− | * Sample Name
| |
− | * % Dups
| |
− | * % GC
| |
− | * Length
| |
− | * M Seqs, millions of sequences
| |
− | | |
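To screen these columns for many samples at once, the table can be read programmatically. This is only a sketch: it assumes the general statistics have been saved as tab-separated text with exactly the headings listed above (the data files MultiQC writes itself may name the columns differently), and the sample rows and the 50% threshold are made up for illustration.

```python
import csv
import io

# Hypothetical table using the headings from the report; the values are invented.
EXAMPLE = (
    "Sample Name\t% Dups\t% GC\tLength\tM Seqs\n"
    "sampleA\t12.5\t48\t100\t10.2\n"
    "sampleB\t63.0\t51\t100\t8.7\n"
)


def read_general_stats(tsv_text):
    """Parse tab-separated general statistics into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def high_duplication(rows, threshold=50.0, column="% Dups"):
    """Return the names of samples whose duplication exceeds threshold."""
    # The threshold is an arbitrary illustrative cut-off, not a MultiQC default.
    return [row["Sample Name"] for row in rows if float(row[column]) > threshold]


rows = read_general_stats(EXAMPLE)
flagged = high_duplication(rows)
```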
− | = multiqc's help file =
| |
− | | |
− | Usage: multiqc [OPTIONS] <analysis directory>
| |
− |
| |
− | MultiQC aggregates results from bioinformatics analyses across many
| |
− | samples into a single report.
| |
− |
| |
− | It searches a given directory for analysis logs and compiles a HTML
| |
− | report. It's a general use tool, perfect for summarising the output from
| |
− | numerous bioinformatics tools.
| |
− |
| |
− | To run, supply with one or more directory to scan for analysis results.
| |
− | To run here, use 'multiqc .'
| |
− |
| |
− | See http://multiqc.info for more details.
| |
− |
| |
− | Author: Phil Ewels (http://phil.ewels.co.uk)
| |
− |
| |
− | Options:
| |
− | -f, --force Overwrite any existing reports
| |
− | -d, --dirs Prepend directory to sample names
| |
− | -s, --fullnames Do not clean the sample names (leave as full
| |
− | file name)
| |
− | -i, --title TEXT Report title. Printed as page header, used
| |
− | for filename if not otherwise specified.
| |
− | -n, --filename TEXT Report filename. Use 'stdout' to print to
| |
− | standard out.
| |
− | -o, --outdir TEXT Create report in the specified output
| |
− | directory.
| |
− | -t, --template [default|default_dev|geo|simple]
| |
− | Report template to use.
| |
− | -x, --ignore TEXT Ignore analysis files (glob expression)
| |
− | -e, --exclude [module name] Do not use this module. Can specify multiple
| |
− | times.
| |
− | -m, --module [module name] Use only this module. Can specify multiple
| |
− | times.
| |
− | --data-dir / --no-data-dir Specify whether the parsed data directory
| |
− | should be created.
| |
− | -k, --data-format [tsv|yaml|json]
| |
− | Output parsed data in a different format
| |
− | -z, --zip-data-dir Compress the data directory.
| |
− | --flat Use only flat plots (static images)
| |
− | --interactive Use only interactive plots (HighCharts
| |
− | Javascript)
| |
− | -c, --config PATH Specific config file to load, after those in
| |
− | MultiQC dir / home dir / working dir.
| |
− | -v, --verbose Increase output verbosity.
| |
− | -q, --quiet Only show log warnings
| |
− | --version Show the version and exit.
| |
− | -h, --help Show this message and exit.
| |
− | | |
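A few of the options above can be combined into one call. The sketch below only assembles the argument list and uses nothing beyond the flags shown in the help text (`--force`, `--title`, `--filename`, `--outdir`, `--exclude`). The function name `multiqc_options_command` and its parameters are hypothetical, and other MultiQC versions may list different options.

```python
def multiqc_options_command(analysis_dirs, outdir=None, title=None,
                            filename=None, force=False, exclude=()):
    """Build a MultiQC argument list from a few options in the help text."""
    cmd = ["multiqc"]
    if force:
        # Overwrite any existing reports.
        cmd.append("--force")
    if title is not None:
        # Printed as the page header of the report.
        cmd += ["--title", title]
    if filename is not None:
        cmd += ["--filename", filename]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    for module in exclude:
        # --exclude may be given several times, once per module.
        cmd += ["--exclude", module]
    # One or more directories to scan come last.
    cmd += list(analysis_dirs)
    return cmd
```

The resulting list can be passed to `subprocess.run` on a machine where MultiQC is installed.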
− | | |
− | = fastqc's help file =
| |
− | | |
− | SYNOPSIS
| |
− |
| |
− | fastqc seqfile1 seqfile2 .. seqfileN
| |
− |
| |
− | fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] [-c contaminant file] seqfile1 .. seqfileN
| |
− |
| |
− | DESCRIPTION
| |
− |
| |
− | FastQC reads a set of sequence files and produces from each one a quality
| |
− | control report consisting of a number of different modules, each one of
| |
− | which will help to identify a different potential type of problem in your
| |
− | data.
| |
− |
| |
− | If no files to process are specified on the command line then the program
| |
− | will start as an interactive graphical application. If files are provided
| |
− | on the command line then the program will run with no user interaction
| |
− | required. In this mode it is suitable for inclusion into a standardised
| |
− | analysis pipeline.
| |
− |
| |
− | The options for the program are as follows:
| |
− |
| |
− | -h --help Print this help file and exit
| |
− |
| |
− | -v --version Print the version of the program and exit
| |
− |
| |
− | -o --outdir Create all output files in the specified output directory.
| |
− | Please note that this directory must exist as the program
| |
− | will not create it. If this option is not set then the
| |
− | output file for each sequence file is created in the same
| |
− | directory as the sequence file which was processed.
| |
− |
| |
− | --casava Files come from raw casava output. Files in the same sample
| |
− | group (differing only by the group number) will be analysed
| |
− | as a set rather than individually. Sequences with the filter
| |
− | flag set in the header will be excluded from the analysis.
| |
− | Files must have the same names given to them by casava
| |
− | (including being gzipped and ending with .gz) otherwise they
| |
− | won't be grouped together correctly.
| |
− |
| |
− | --nofilter If running with --casava then don't remove read flagged by
| |
− | casava as poor quality when performing the QC analysis.
| |
− |
| |
− | --extract If set then the zipped output file will be uncompressed in
| |
− | the same directory after it has been created. By default
| |
− | this option will be set if fastqc is run in non-interactive
| |
− | mode.
| |
− |
| |
− | -j --java Provides the full path to the java binary you want to use to
| |
− | launch fastqc. If not supplied then java is assumed to be in
| |
− | your path.
| |
− |
| |
− | --noextract Do not uncompress the output file after creating it. You
| |
− | should set this option if you do not wish to uncompress
| |
− | the output when running in non-interactive mode.
| |
− |
| |
− | --nogroup Disable grouping of bases for reads >50bp. All reports will
| |
− | show data for every base in the read. WARNING: Using this
| |
− | option will cause fastqc to crash and burn if you use it on
| |
− | really long reads, and your plots may end up a ridiculous size.
| |
− | You have been warned!
| |
− |
| |
− | -f --format Bypasses the normal sequence file format detection and
| |
− | forces the program to use the specified format. Valid
| |
− | formats are bam,sam,bam_mapped,sam_mapped and fastq
| |
− |
| |
− | -t --threads Specifies the number of files which can be processed
| |
− | simultaneously. Each thread will be allocated 250MB of
| |
− | memory so you shouldn't run more threads than your
| |
− | available memory will cope with, and not more than
| |
− | 6 threads on a 32 bit machine
| |
− |
| |
− | -c Specifies a non-default file which contains the list of
| |
− | --contaminants contaminants to screen overrepresented sequences against.
| |
− | The file must contain sets of named contaminants in the
| |
− | form name[tab]sequence. Lines prefixed with a hash will
| |
− | be ignored.
| |
− |
| |
− | -a Specifies a non-default file which contains the list of
| |
− | --adapters adapter sequences which will be explicitly searched against
| |
− | the library. The file must contain sets of named adapters
| |
− | in the form name[tab]sequence. Lines prefixed with a hash
| |
− | will be ignored.
| |
− |
| |
− | -l Specifies a non-default file which contains a set of criteria
| |
− | --limits which will be used to determine the warn/error limits for the
| |
− | various modules. This file can also be used to selectively
| |
− | remove some modules from the output all together. The format
| |
− | needs to mirror the default limits.txt file found in the
| |
− | Configuration folder.
| |
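Two points in the help text are easy to trip over in a pipeline: the output directory must already exist, and each thread is allocated 250MB of memory. The sketch below takes both into account when assembling a non-interactive FastQC call. The helper names `max_threads` and `fastqc_command` are hypothetical, and the amount of available memory has to be supplied by the caller.

```python
import os

MB_PER_THREAD = 250  # per-thread allocation stated in the help text


def max_threads(available_mb, is_32bit=False):
    """Largest thread count that fits in available_mb (at least 1)."""
    threads = max(1, available_mb // MB_PER_THREAD)
    if is_32bit:
        # The help text advises no more than 6 threads on a 32 bit machine.
        threads = min(threads, 6)
    return threads


def fastqc_command(seqfiles, outdir=None, threads=1, extract=True):
    """Build a FastQC argument list, creating outdir if it is missing."""
    cmd = ["fastqc"]
    if outdir is not None:
        # FastQC will not create the output directory itself.
        os.makedirs(outdir, exist_ok=True)
        cmd += ["--outdir", outdir]
    cmd += ["--threads", str(threads)]
    if not extract:
        # Leave the zipped output file uncompressed-free, i.e. do not unzip it.
        cmd.append("--noextract")
    cmd += list(seqfiles)
    return cmd
```

For example, with 2000MB available, `max_threads(2000)` gives 8 threads on a 64 bit machine and 6 on a 32 bit one.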