| Line 1: |
Line 1: |
| − | =Introduction=
| + | No Name [/home/nutria], b1, wiki |
| − | | + | |Introduction |
| − | Assessment of short read quality
| + | |FastQC |
| − | | + | = . |fastqc's help file |
| − | =MultiQC =
| + | |MultiQC |
| − | | + | |multiqc's help file |
| − | A relatively new tool that aggregates the FastQC output of many samples into a single report.
| |
| − | | |
| − | * available on the command line without any module loading, as it is a Python module (easily installed via pip)
| |
| − | | |
| − | Change into the directory that contains the FastQC output and run:
| |
| − | multiqc . | |
| − | | |
| − | The dot stands for the current directory and is required.
| |
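As a rough sketch of the step above, the same invocation can be assembled from Python. The helper name `build_multiqc_command` is hypothetical; the flags `-f`, `-o` and `-n` are the ones listed in multiqc's help file on this page. The helper only builds the argument list and does not execute anything.

```python
import shlex


def build_multiqc_command(directory=".", force=False, outdir=None, filename=None):
    """Assemble a multiqc argument list (hypothetical helper, nothing is executed)."""
    cmd = ["multiqc"]
    if force:
        cmd.append("-f")          # overwrite any existing reports
    if outdir is not None:
        cmd += ["-o", outdir]     # create the report in this directory
    if filename is not None:
        cmd += ["-n", filename]   # report filename
    cmd.append(directory)         # directory to scan; "." is the current one
    return cmd


# Show the command line that would be run from inside the FastQC output directory.
print(shlex.join(build_multiqc_command(".")))
```

To actually run it, the returned list could be passed to `subprocess.run(cmd, check=True)`, assuming `multiqc` is on the PATH.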
| − | | |
| − | The general statistics section of the report has the following headings:
| |
| − | | |
| − | * Sample Name
| |
| − | * % Dups
| |
| − | * % GC
| |
| − | * Length
| |
| − | * M Seqs, millions of sequences
| |
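To illustrate how these headings might be used downstream, here is a minimal sketch that reads a tab-separated table and flags samples with a high duplication rate. The column names in the data files MultiQC actually writes may differ from the report headings, so the header line, the sample rows and the 50% threshold below are all assumptions made up for illustration.

```python
import csv
import io

# Invented example data; the report headings are used as column names (assumption).
EXAMPLE_TSV = (
    "Sample Name\t% Dups\t% GC\tLength\tM Seqs\n"
    "sampleA_R1\t12.5\t41\t150\t20.1\n"
    "sampleB_R1\t63.0\t48\t150\t18.7\n"
)


def high_duplication_samples(tsv_text, threshold=50.0):
    """Return the names of samples whose '% Dups' value exceeds the threshold."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["Sample Name"] for row in reader if float(row["% Dups"]) > threshold]


print(high_duplication_samples(EXAMPLE_TSV))
```

For a real report, the header names would first need to be checked against the file on disk and the lookups adjusted accordingly.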
| − | | |
| − | = multiqc's help file =
| |
| − | | |
| − | Usage: multiqc [OPTIONS] <analysis directory>
| |
| − |
| |
| − | MultiQC aggregates results from bioinformatics analyses across many
| |
| − | samples into a single report.
| |
| − |
| |
| − | It searches a given directory for analysis logs and compiles a HTML
| |
| − | report. It's a general use tool, perfect for summarising the output from
| |
| − | numerous bioinformatics tools.
| |
| − |
| |
| − | To run, supply with one or more directory to scan for analysis results.
| |
| − | To run here, use 'multiqc .'
| |
| − |
| |
| − | See http://multiqc.info for more details.
| |
| − |
| |
| − | Author: Phil Ewels (http://phil.ewels.co.uk)
| |
| − |
| |
| − | Options:
| |
| − | -f, --force Overwrite any existing reports
| |
| − | -d, --dirs Prepend directory to sample names
| |
| − | -s, --fullnames Do not clean the sample names (leave as full
| |
| − | file name)
| |
| − | -i, --title TEXT Report title. Printed as page header, used
| |
| − | for filename if not otherwise specified.
| |
| − | -n, --filename TEXT Report filename. Use 'stdout' to print to
| |
| − | standard out.
| |
| − | -o, --outdir TEXT Create report in the specified output
| |
| − | directory.
| |
| − | -t, --template [default|default_dev|geo|simple]
| |
| − | Report template to use.
| |
| − | -x, --ignore TEXT Ignore analysis files (glob expression)
| |
| − | -e, --exclude [module name] Do not use this module. Can specify multiple
| |
| − | times.
| |
| − | -m, --module [module name] Use only this module. Can specify multiple
| |
| − | times.
| |
| − | --data-dir / --no-data-dir Specify whether the parsed data directory
| |
| − | should be created.
| |
| − | -k, --data-format [tsv|yaml|json]
| |
| − | Output parsed data in a different format
| |
| − | -z, --zip-data-dir Compress the data directory.
| |
| − | --flat Use only flat plots (static images)
| |
| − | --interactive Use only interactive plots (HighCharts
| |
| − | Javascript)
| |
| − | -c, --config PATH Specific config file to load, after those in
| |
| − | MultiQC dir / home dir / working dir.
| |
| − | -v, --verbose Increase output verbosity.
| |
| − | -q, --quiet Only show log warnings
| |
| − | --version Show the version and exit.
| |
| − | -h, --help Show this message and exit.
| |
| − | | |
| − | | |
| − | = fastqc's help file =
| |
| − | | |
| − | SYNOPSIS
| |
| − |
| |
| − | fastqc seqfile1 seqfile2 .. seqfileN
| |
| − |
| |
| − | fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] [-c contaminant file] seqfile1 .. seqfileN
| |
| − |
| |
| − | DESCRIPTION
| |
| − |
| |
| − | FastQC reads a set of sequence files and produces from each one a quality
| |
| − | control report consisting of a number of different modules, each one of
| |
| − | which will help to identify a different potential type of problem in your
| |
| − | data.
| |
| − |
| |
| − | If no files to process are specified on the command line then the program
| |
| − | will start as an interactive graphical application. If files are provided
| |
| − | on the command line then the program will run with no user interaction
| |
| − | required. In this mode it is suitable for inclusion into a standardised
| |
| − | analysis pipeline.
| |
| − |
| |
| − | The options for the program as as follows:
| |
| − |
| |
| − | -h --help Print this help file and exit
| |
| − |
| |
| − | -v --version Print the version of the program and exit
| |
| − |
| |
| − | -o --outdir Create all output files in the specified output directory.
| |
| − | Please note that this directory must exist as the program
| |
| − | will not create it. If this option is not set then the
| |
| − | output file for each sequence file is created in the same
| |
| − | directory as the sequence file which was processed.
| |
| − |
| |
| − | --casava Files come from raw casava output. Files in the same sample
| |
| − | group (differing only by the group number) will be analysed
| |
| − | as a set rather than individually. Sequences with the filter
| |
| − | flag set in the header will be excluded from the analysis.
| |
| − | Files must have the same names given to them by casava
| |
| − | (including being gzipped and ending with .gz) otherwise they
| |
| − | won't be grouped together correctly.
| |
| − |
| |
| − | --nofilter If running with --casava then don't remove read flagged by
| |
| − | casava as poor quality when performing the QC analysis.
| |
| − |
| |
| − | --extract If set then the zipped output file will be uncompressed in
| |
| − | the same directory after it has been created. By default
| |
| − | this option will be set if fastqc is run in non-interactive
| |
| − | mode.
| |
| − |
| |
| − | -j --java Provides the full path to the java binary you want to use to
| |
| − | launch fastqc. If not supplied then java is assumed to be in
| |
| − | your path.
| |
| − |
| |
| − | --noextract Do not uncompress the output file after creating it. You
| |
| − | should set this option if you do not wish to uncompress
| |
| − | the output when running in non-interactive mode.
| |
| − |
| |
| − | --nogroup Disable grouping of bases for reads >50bp. All reports will
| |
| − | show data for every base in the read. WARNING: Using this
| |
| − | option will cause fastqc to crash and burn if you use it on
| |
| − | really long reads, and your plots may end up a ridiculous size.
| |
| − | You have been warned!
| |
| − |
| |
| − | -f --format Bypasses the normal sequence file format detection and
| |
| − | forces the program to use the specified format. Valid
| |
| − | formats are bam,sam,bam_mapped,sam_mapped and fastq
| |
| − |
| |
| − | -t --threads Specifies the number of files which can be processed
| |
| − | simultaneously. Each thread will be allocated 250MB of
| |
| − | memory so you shouldn't run more threads than your
| |
| − | available memory will cope with, and not more than
| |
| − | 6 threads on a 32 bit machine
| |
| − |
| |
| − | -c Specifies a non-default file which contains the list of
| |
| − | --contaminants contaminants to screen overrepresented sequences against.
| |
| − | The file must contain sets of named contaminants in the
| |
| − | form name[tab]sequence. Lines prefixed with a hash will
| |
| − | be ignored.
| |
| − |
| |
| − | -a Specifies a non-default file which contains the list of
| |
| − | --adapters adapter sequences which will be explicity searched against
| |
| − | the library. The file must contain sets of named adapters
| |
| − | in the form name[tab]sequence. Lines prefixed with a hash
| |
| − | will be ignored.
| |
| − |
| |
| − | -l Specifies a non-default file which contains a set of criteria
| |
| − | --limits which will be used to determine the warn/error limits for the
| |
| − | various modules. This file can also be used to selectively
| |
| − | remove some modules from the output all together. The format
| |
| − | needs to mirror the default limits.txt file found in the
| |
| − | Configuration folder.
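The help text above describes contaminant and adapter files as lines of the form name[tab]sequence, with lines prefixed by a hash ignored. A minimal sketch of reading such a file follows. It is not FastQC's own parser; the function name and the example entries are made up, and tolerating more than one tab between the fields is an assumption.

```python
def parse_contaminants(text):
    """Parse name[tab]sequence lines, skipping blank lines and lines starting with '#'."""
    entries = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Split at the first tab; any extra tabs end up in front of the sequence.
        name, _, sequence = line.partition("\t")
        entries[name.strip()] = sequence.strip()
    return entries


# Made-up entries for illustration only.
EXAMPLE = (
    "# comment line, ignored\n"
    "Example Adapter\tACGTACGTACGT\n"
    "\n"
    "Another Oligo\t\tTTGGCCAA\n"
)
print(parse_contaminants(EXAMPLE))
```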
| |
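Two points in the help text are easy to trip over: the `-o` directory must already exist, and each thread is allocated 250MB of memory. The sketch below creates the output directory before building the command and derives a thread count from a memory budget. Both helper names and the file name `reads_1.fastq.gz` are hypothetical, and nothing is executed.

```python
import os
import tempfile

MB_PER_THREAD = 250  # figure quoted in the --threads help text


def build_fastqc_command(seqfiles, outdir, threads=1):
    """Create outdir if needed and return a fastqc argument list (hypothetical helper)."""
    os.makedirs(outdir, exist_ok=True)  # fastqc will not create the directory itself
    return ["fastqc", "-o", outdir, "-t", str(threads), *seqfiles]


def max_threads_for_memory(available_mb):
    """Upper bound on threads for a memory budget in MB, never less than 1."""
    return max(1, available_mb // MB_PER_THREAD)


# Demonstrate with a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "fastqc_out")
    cmd = build_fastqc_command(["reads_1.fastq.gz"], out, threads=max_threads_for_memory(1000))
    print(cmd)
```

The help text also caps 32 bit machines at 6 threads, which this sketch does not check.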