Difference between revisions of "One-liners"

Revision as of 08:53, 30 March 2017

Introduction

On the command line, a single command often grows quite complicated and can take a while to build up.

In bioinformatics, one is often forced into this situation as many of the tools have quite complicated options and parameter settings.

One-liners are also the usual way of writing simple for loops, which can turn out to be quite powerful.

Examples

Renaming multiple files

There are various ways of doing this. Here we have a complicated situation: we have made a file listing, copied it, and manicured the names in the copy, and now want to rename each file in the original listing to its corresponding manicured version. In the example, lst0 holds the original names and lst the manicured ones. Here follows one such example:


while read -r i; do j=${i#*_}; R=$(printf '%s_\\w\\+_R%s' "${i%%_*}" "${j%.*}"); mv "$i" "$(grep "$R" lst)"; done < lst0
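Before renaming anything, it can help to print the mv commands and inspect them first. The following is a minimal sketch of such a dry run, not a definitive implementation: the function name preview_renames is hypothetical, the two arguments stand for lst0 and lst, and it assumes GNU grep (for \w and \+) and file names without whitespace.

```shell
# Hypothetical dry-run helper: prints the mv commands instead of running them.
# $1 = list of original names, $2 = list of manicured names.
preview_renames() {
    while read -r i; do
        j=${i#*_}   # drop everything up to and including the first underscore
        # Build the pattern prefix_<word>_R<suffix>, as in the one-liner.
        R=$(printf '%s_\\w\\+_R%s' "${i%%_*}" "${j%.*}")
        # Count candidate new names; grep -c exits non-zero when the count is 0.
        n=$(grep -c -- "$R" "$2" || true)
        if [ "$n" -eq 1 ]; then
            echo "mv $i $(grep -- "$R" "$2")"
        else
            echo "skip $i: $n matches" >&2   # ambiguous or missing target
        fi
    done < "$1"
}
```

Running preview_renames lst0 lst prints one mv line per file with exactly one matching new name, and reports every other file on standard error.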


You have a genome coverage file in bedgraph format (the final, fourth column is the coverage for a particular section) and would like to find the max value:

awk '{if($4>mxc) mxc=$4} END {print mxc}' v30chronly_s.cov

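Given the same genome coverage file, what is the average coverage per base? Each section's coverage is weighted by its length (end minus start) and the total is divided by the endpoint of the last section:

awk '{tot=tot+$4*($3-$2); la=$3} END {print tot/la}' v30chronly_s.cov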

Note how in the above, for la (last), we want the third column of the last line, which is the endpoint. Since we don't know in advance which line is the last, the third column of every line gets assigned to this variable. This will not do if there is more than one chromosome.
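
For a file with several chromosomes, one possible approach is to keep a separate total and endpoint for each chromosome name in the first column. This is a minimal sketch under stated assumptions: the function name per_chrom_mean_cov is hypothetical, each section's coverage is weighted by its length, and the last section of each chromosome is assumed to reach the end of that chromosome.

```shell
# Hypothetical helper: mean coverage per base for each chromosome in a
# bedgraph file with columns chrom, start, end, coverage.
per_chrom_mean_cov() {
    awk '{
        tot[$1] += $4 * ($3 - $2)        # coverage weighted by section length
        if ($3 > end[$1]) end[$1] = $3   # furthest endpoint seen for this chromosome
    } END {
        # Output order of "for (c in tot)" is unspecified; pipe through sort if needed.
        for (c in tot) print c, tot[c] / end[c]
    }' "$1"
}
```

For example, per_chrom_mean_cov v30chronly_s.cov | sort prints one line per chromosome with its name and mean coverage.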