Rf: n

2017-04-19T16:34:29Z

New page

= Aims =

* Directory organisation has become more necessary, due to the many intermediate files that genomics pipelines produce, on top of their many output files.
* Which result-files correspond to which samples? And to which replicate? It is very easy to lose track, especially when you return to the files after a month or two.
* Therefore we include a somewhat artificial directory and file organisation exercise.
* <code>allprojs.tar</code> is a file that contains 16 projects in zip format; we must lay them and their files out in an orderly fashion.
* Data taken from the excellent book "Computational Genomics" by Nello Cristianini and Matthew Hahn (ref. <code>http://www.computational-genomics.net</code>).

= Getting into position =

* To ensure we are in our home directory:
cd
* To ensure we have the appropriate file:
ls -l hdi2u_files/allprojs.tar
* Let's look inside the tar-file or tarball, as it's sometimes called
cd hdi2u_files
tar tf allprojs.tar
* We can go ahead and extract them all now
tar xf allprojs.tar
* Type <code>ls</code> to make sure they have been extracted OK.
* We delete the tar file, because we have extracted all its contents.
rm allprojs.tar
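
The list, extract and delete sequence above can be rehearsed safely on a throwaway tarball. The following is a minimal sketch under stated assumptions: <code>demo.tar</code>, <code>one_demo.zip</code> and <code>two_demo.zip</code> are hypothetical names (the zip files are empty placeholders), and everything happens in a temporary directory rather than in <code>hdi2u_files</code>.

```shell
# Work in a scratch directory so nothing in the home directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a tiny stand-in tarball (hypothetical names, empty placeholder files).
touch one_demo.zip two_demo.zip
tar cf demo.tar one_demo.zip two_demo.zip
rm one_demo.zip two_demo.zip

# List the contents first, so we know what extraction will create.
listing=$(tar tf demo.tar)
echo "$listing"

# Extract, and delete the tarball only if extraction succeeded.
tar xf demo.tar && rm demo.tar
```

Chaining with <code>&&</code> means the tarball survives if extraction fails for any reason, which is a slightly more cautious variant of the separate <code>rm</code> step.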

= Putting files into a loop =

* Although there are not so many zip files, there are enough to discourage us from handling them manually.
* So we use a loop, which on the command line takes the form of the <code>for i in THIS; do THAT; done</code> idiom.
* Let's first make a directory for each one; we will test with the <code>echo</code> command first.
for i in *.zip; do echo "mkdir ${i%.*}"; done
* Note how it is sufficient to use the wildcard: <code>ls</code> is not required.
* Note also that <code>${i%.*}</code> allows us to create directory names without the <code>.zip</code> extension.
* But, hang on, there's an extra zip in our directory which is not part of our project. We can be more exact:
for i in *_demo.zip; do echo "mkdir ${i%.*}"; done
* OK, now we can go ahead:
for i in *_demo.zip; do mkdir "${i%.*}"; done
* Now we can move the zip files into their respective directories with:
for i in *_demo.zip; do mv "$i" "${i%.*}"; done
* We can check with <code>ls</code> that everything is OK, but there is a command called <code>tree</code> which gives (slightly) nicer output.
tree | less
* This simple command confirms that each file has moved into the directory it corresponds to.
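
The steps in this section can be tried end to end in a scratch directory. This is only a minimal sketch: the file names (<code>alpha_demo.zip</code>, <code>beta_demo.zip</code>, <code>unrelated.zip</code>, <code>sample.tar.gz</code>) are made-up placeholders, not the contents of <code>allprojs.tar</code>, and the zip files are empty stand-ins.

```shell
# Work in a scratch directory so the real files are not touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins: two project zips and one unrelated zip.
touch alpha_demo.zip beta_demo.zip unrelated.zip

# ${f%.*} removes the shortest trailing match of ".*" (the last extension);
# ${f%%.*} removes the longest match (everything from the first dot).
f=sample.tar.gz
short=${f%.*}     # sample.tar
long=${f%%.*}     # sample

# Make one directory per project zip and move the zip into it.
for i in *_demo.zip; do
    dir=${i%.*}
    mkdir -- "$dir"       # quoting guards against spaces in names
    mv -- "$i" "$dir"/
done

# Show the resulting layout; unrelated.zip stays where it was.
ls -R
```

Doing <code>mkdir</code> and <code>mv</code> in the same pass is a design choice made here for brevity; running them as two separate loops, as in the steps above, makes it easier to inspect the result in between.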

= Operating on directories =

* We now want to extract the zip files, and can use a for-loop again, but it must act on directories, not files.
* This will list all directories, but we're only interested in a few of them:
ls -d */
* While there are a few ways to handle this, we will use the concept of pre-editing a listing to select what we want.
* So we store the directory listing in a file, delete unwanted directories and operate on the content of the listing.
ls -d */ > dirlist.txt
vim !$
* We will find ourselves in vim and will use <code>dd</code> to delete the lines we don't want. Then type <code>ZZ</code> to save and get out.
* Note that this manual check slows things down, but helps confidence in what we are doing.
* Now we can build our for-loop with the confidence that we are going to operate on the right directories.
for i in $(cat dirlist.txt); do cd "$i"; unzip "${i%/*}.zip" && rm "${i%/*}.zip"; cd ..; done
* Note how we enter each directory in turn and extract the zip file inside, then remove it (only if the extraction succeeded), and then move back into the parent directory.
* We can have a look with
tree | less
* And we can zip the whole structure up again with
zip -r dirstruc.zip $(cat dirlist.txt)
* And delete the created directory structure itself
for i in $(cat dirlist.txt); do rm -rf "$i"; done
* Though this was an artificial exercise, it's common to find zip and tar files littering your home directory, unless you create directories for them.
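
The idea of driving a loop from an edited listing can also be sketched non-interactively. This is a hedged illustration, not the method used above: the directory names (<code>alpha_demo</code>, <code>beta_demo</code>, <code>scratch</code>) are hypothetical, <code>grep -v</code> stands in for the manual <code>vim</code> edit, and <code>touch visited.flag</code> is a placeholder for the <code>unzip</code> and <code>rm</code> steps.

```shell
# Work in a scratch directory so the real project directories are not touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical layout: two project directories and one we want to skip.
mkdir alpha_demo beta_demo scratch
ls -d */ > dirlist.txt

# Stand-in for the manual vim edit: drop the unwanted line from the listing.
grep -v '^scratch/$' dirlist.txt > dirlist.tmp && mv dirlist.tmp dirlist.txt

# Read the listing one line at a time. The parentheses run each step in a
# subshell, so a failed cd cannot leave us in the wrong directory and no
# "cd .." is needed afterwards.
while IFS= read -r d; do
    ( cd "$d" && touch visited.flag ) || echo "skipped $d" >&2
done < dirlist.txt
```

Reading with <code>while IFS= read -r</code> keeps each line of the listing intact, whereas <code>$(cat dirlist.txt)</code> splits on any whitespace; for directory names without spaces the two behave the same.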

Hdi2u dirorg exercise - Revision history
