Directory Organization Exercise

Revision as of 20:45, 6 October 2016

Aims

Directory organisation has become more necessary because of the many intermediate and output files that genomics pipelines produce. Which results file corresponds to which sample? To which replicate?

This exercise bundles 16 projects, each packed in turn into its own zip file, giving 16 zip files.

Data taken from the excellent book "Computational Genomics" by Nello Cristianini and Matthew Hahn (ref. http://www.computational-genomics.net).

Commands

There are several ways to undertake this task, but this one aims to make use of the TAB-COMPLETION key and the shell HISTORY.

This exercise uses "shell" for loops, whose parts are separated by semicolons. The semicolons delimit the individual commands and do not need to be followed by a space character.
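As a minimal sketch of the point about semicolons, the two loops below should be equivalent. The items a, b and c are placeholders chosen for illustration, not files from the exercise.

```shell
# One-line form: semicolons separate the parts of the loop.
# Note that no space is needed after the last semicolon.
one_line=$(for i in a b c; do echo "item $i";done)

# Multi-line form: newlines play the role of the semicolons.
multi_line=$(
for i in a b c
do
    echo "item $i"
done
)

# Both variables now hold the same three lines.
echo "$one_line"
```

The one-line form is convenient at the prompt because it can be recalled and edited from the history as a single entry.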

Streamlined solution

This is the streamlined version, not the one followed during the course, as it does not require the "heavy lifting" tools of "find" and "vim".

  • cd to ensure we are in our home directory
  • cp $TCH/allprojs.tar . to copy the bundle with all the projects to our home directory.
  • tar -tf allprojs.tar to look inside the tar ("tape archive") file, which contains all the zip files.
  • tar -xf allprojs.tar to extract them all
  • ls to make sure they have been extracted.
  • rm allprojs.tar, by which we delete the tar file, because we have extracted all its contents.
  • for i in $(ls *.zip); do mkdir ${i%.*}; done by which we create the directories into which we plan to mv the zip files
  • for i in $(ls *.zip); do mv $i ${i%.*}; done by which we move the zip files into their corresponding directories.
  • tree this simple command verifies that the files have moved into the directories they correspond to.
  • for i in $(ls -d */); do cd $i; unzip ${i%/*}.zip; rm ${i%/*}.zip; cd ..;done, by which we enter each directory in turn, extract the zip file inside, remove it, and then move back into the parent directory
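The loops above rely on the ${i%.*} and ${i%/*} parameter expansions, which strip the shortest matching suffix from a variable. The following sketch shows what they produce and then repeats the mkdir and mv steps in a single loop. It is an illustration under assumptions, not the method used in class: the file names proj01.zip and proj02.zip are hypothetical stand-ins, the work happens in a scratch directory, and the loop uses the *.zip glob directly with quoted expansions instead of $(ls *.zip).

```shell
# Work in a scratch directory so nothing in the home directory is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Stand-ins for the real archives: empty files with hypothetical names.
touch proj01.zip proj02.zip

# ${i%.*} removes the shortest trailing match of ".*", i.e. the extension.
i=proj01.zip
stem=${i%.*}     # proj01

# ${d%/*} removes the shortest trailing match of "/*", here the final
# slash that "ls -d */" leaves on each directory name.
d=proj01/
dir=${d%/*}      # proj01

# The mkdir and mv steps in one loop. Globbing directly and quoting the
# expansions keeps file names with spaces in one piece.
for i in *.zip; do
    mkdir -- "${i%.*}"
    mv -- "$i" "${i%.*}/"
done
```

For file names without spaces, such as those in this exercise, the $(ls *.zip) form behaves the same way; the quoted glob form simply fails less often on unusual names.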

This is the end of the "streamlined" version of this answer to the exercise.

Clunky solution (useful when not all the projects are of interest)

This is the method used during class. It was adopted by accident, but actually it does reflect quite a common situation, described here:

Sometimes we have a bunch of projects (read: replicates of an experiment) that we want to group together, but which are not all of equal interest, perhaps because some are contaminated.

In this case (and this is the method we ended up using during the class) we want to build up a file listing of the zip files and edit this list so that it includes only the zip files of interest.

The first part is exactly the same as the streamlined version. Let's repeat the last relevant command:

  • tree by which we recognise that all the zip files are inside a directory which matches their filename
  • find -iname "*.zip" > f.l by which we direct the output of the find command to a file called f.l, which serves as our file listing.
  • rm allprojs.tar, by which we delete the tar file, because we have extracted all its contents.
  • for i in $(ls *.zip); do mkdir ${i%.*}; done by which we create the directories into which we plan to mv the zip files
  • for i in $(ls *.zip); do mv $i ${i%.*}; done by which we move the zip files into their corresponding directories.
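The text does not show how the edited listing is used afterwards, so the following is only one possible continuation, sketched under assumptions. The directory and file names are hypothetical, grep -v stands in for editing f.l by hand in vim, and the name keep.l is invented for the edited copy. The find call is given an explicit "." start path, which GNU find assumes by default but some other implementations require, and its output is sorted only to make the order predictable.

```shell
# Scratch layout mimicking the state shown by tree: one zip per directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir proj01 proj02 proj03
touch proj01/proj01.zip proj02/proj02.zip proj03/proj03.zip

# Build the file listing, as in the text.
find . -iname "*.zip" | sort > f.l

# Stand-in for hand-editing the listing: drop proj02 as if it were
# contaminated, keeping only the zip files of interest.
grep -v proj02 f.l > keep.l

# Walk the edited listing line by line and act on each entry.
while IFS= read -r z; do
    d=$(dirname "$z")
    f=$(basename "$z")
    echo "would extract $f inside $d"
    # With real archives, the extraction step could be:
    # (cd "$d" && unzip "$f")
done < keep.l
```

Reading the listing with while read, instead of looping over $(cat keep.l), keeps each line intact even if a path contains spaces.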