<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://stab.st-andrews.ac.uk/wiki/index.php?action=history&amp;feed=atom&amp;title=Hdi2u_dirorg_exercise</id>
		<title>Hdi2u dirorg exercise - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://stab.st-andrews.ac.uk/wiki/index.php?action=history&amp;feed=atom&amp;title=Hdi2u_dirorg_exercise"/>
		<link rel="alternate" type="text/html" href="http://stab.st-andrews.ac.uk/wiki/index.php?title=Hdi2u_dirorg_exercise&amp;action=history"/>
		<updated>2026-05-18T12:37:04Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.30.0</generator>

	<entry>
		<id>http://stab.st-andrews.ac.uk/wiki/index.php?title=Hdi2u_dirorg_exercise&amp;diff=1384&amp;oldid=prev</id>
		<title>Rf: n</title>
		<link rel="alternate" type="text/html" href="http://stab.st-andrews.ac.uk/wiki/index.php?title=Hdi2u_dirorg_exercise&amp;diff=1384&amp;oldid=prev"/>
				<updated>2017-04-19T16:34:29Z</updated>
		
		<summary type="html">&lt;p&gt;n&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;= Aims =&lt;br /&gt;
&lt;br /&gt;
* Directory organisation has become more necessary because genomics pipelines produce many intermediate files as well as many output files.&lt;br /&gt;
* Which result-files correspond to which samples? And to which replicate? It is very easy to lose track, especially when you return to the files after a month or two.&lt;br /&gt;
* Therefore we include a somewhat artificial directory and file organisation exercise.&lt;br /&gt;
* &amp;lt;code&amp;gt;allprojs.tar&amp;lt;/code&amp;gt; is a file that contains 16 projects in zip format; we must lay them and their files out in an orderly fashion.&lt;br /&gt;
* The data are taken from the excellent book &amp;quot;Computational Genomics&amp;quot; by Nello Cristianini and Matthew Hahn (ref. &amp;lt;code&amp;gt;http://www.computational-genomics.net&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
= Getting into position =&lt;br /&gt;
&lt;br /&gt;
* To ensure we are in our home directory&lt;br /&gt;
 cd&lt;br /&gt;
* To ensure we have the appropriate file&lt;br /&gt;
 ls -l hdi2u_files/allprojs.tar&lt;br /&gt;
* Let&amp;#039;s look inside the tar-file or tarball, as it&amp;#039;s sometimes called&lt;br /&gt;
 cd hdi2u_files&lt;br /&gt;
 tar tf allprojs.tar&lt;br /&gt;
* We can go ahead and extract now&lt;br /&gt;
 tar xf allprojs.tar&lt;br /&gt;
* Type &amp;lt;code&amp;gt;ls&amp;lt;/code&amp;gt; to make sure they have been extracted OK. &lt;br /&gt;
* We delete the tar file, because we have extracted all its contents.&lt;br /&gt;
 rm allprojs.tar&lt;br /&gt;
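* The list, extract, check and delete sequence above can be rehearsed safely in a scratch directory. The following is only a minimal sketch: &amp;lt;code&amp;gt;sample.tar&amp;lt;/code&amp;gt; and the two empty member files are hypothetical stand-ins for &amp;lt;code&amp;gt;allprojs.tar&amp;lt;/code&amp;gt; and its contents, and the scratch directory comes from &amp;lt;code&amp;gt;mktemp -d&amp;lt;/code&amp;gt;.&lt;br /&gt;

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for allprojs.tar: two empty files bundled together.
touch one_demo.zip two_demo.zip
tar cf sample.tar one_demo.zip two_demo.zip
rm one_demo.zip two_demo.zip

tar tf sample.tar    # list the members without extracting anything
tar xf sample.tar    # extract the members into the current directory

# Delete the tarball only once both extracted members are confirmed present.
if [ -f one_demo.zip ]; then
    if [ -f two_demo.zip ]; then
        rm sample.tar
    fi
fi
```

* Checking for the extracted files before running &amp;lt;code&amp;gt;rm&amp;lt;/code&amp;gt; plays the same role as the manual &amp;lt;code&amp;gt;ls&amp;lt;/code&amp;gt; check, just automated.&lt;br /&gt;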
&lt;br /&gt;
= Putting files into a loop =&lt;br /&gt;
&lt;br /&gt;
* Although there are not that many zip files, there are enough to discourage us from handling them manually.&lt;br /&gt;
* So we use a loop, which on the command line is provided by the &amp;lt;code&amp;gt;for i in THIS; do THAT; done&amp;lt;/code&amp;gt; idiom.&lt;br /&gt;
* Let&amp;#039;s first make a directory for each one; we will do a dry run first with the &amp;lt;code&amp;gt;echo&amp;lt;/code&amp;gt; command.&lt;br /&gt;
 for i in *.zip; do echo &amp;quot;mkdir ${i%.*}&amp;quot;; done&lt;br /&gt;
* Note how it is sufficient to use the wildcard: &amp;lt;code&amp;gt;ls&amp;lt;/code&amp;gt; is not required.&lt;br /&gt;
* Note also that &amp;lt;code&amp;gt;${i%.*}&amp;lt;/code&amp;gt; allows us to create directory names without the &amp;lt;code&amp;gt;.zip&amp;lt;/code&amp;gt; extension.&lt;br /&gt;
* But, hang on, there&amp;#039;s an extra zip in our directory which is not part of our project. We can be more exact:&lt;br /&gt;
 for i in *_demo.zip; do echo &amp;quot;mkdir ${i%.*}&amp;quot;; done&lt;br /&gt;
* OK, now we can go ahead:&lt;br /&gt;
 for i in *_demo.zip; do mkdir ${i%.*}; done&lt;br /&gt;
* Now we can insert the zip files into their respective directories with:&lt;br /&gt;
 for i in *_demo.zip; do mv $i ${i%.*}; done&lt;br /&gt;
* We can check with ls that everything is OK, but there is a command called tree which gives (slightly) nicer output.&lt;br /&gt;
 tree | less&lt;br /&gt;
* This simple command confirms that each file has moved into its corresponding directory.&lt;br /&gt;
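* The whole make-a-directory-then-move pattern can be tried out in a scratch directory first. This is a minimal sketch under stated assumptions: the file names &amp;lt;code&amp;gt;alpha_demo.zip&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;beta_demo.zip&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;unrelated.zip&amp;lt;/code&amp;gt; are hypothetical empty placeholders, not the real project archives, and the variables are quoted as a precaution against names containing spaces.&lt;br /&gt;

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for the project archives, plus one unrelated zip.
touch alpha_demo.zip beta_demo.zip unrelated.zip

# The *_demo.zip wildcard skips unrelated.zip.
for i in *_demo.zip; do
    d="${i%.*}"         # remove the shortest trailing ".suffix": alpha_demo.zip becomes alpha_demo
    mkdir -- "$d"       # one directory per archive
    mv -- "$i" "$d"/    # move the archive into its own directory
done

ls -R    # recursive listing, a fallback if tree is not installed
```

* The &amp;lt;code&amp;gt;%&amp;lt;/code&amp;gt; operator removes the shortest match of the pattern from the end of the value, which is why only the final extension is stripped.&lt;br /&gt;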
&lt;br /&gt;
= Operating on directories =&lt;br /&gt;
&lt;br /&gt;
* We now want to extract the zip files, and can use a for-loop again, but it must act on directories, not files.&lt;br /&gt;
* This will list all directories, but we&amp;#039;re only interested in a few of them:&lt;br /&gt;
 ls -d */&lt;br /&gt;
* While there are a few ways to handle this, we will use the concept of pre-editing a listing to select what we want.&lt;br /&gt;
* So we store the directory listing in a file, delete unwanted directories and operate on the content of the listing.&lt;br /&gt;
 ls -d */ &amp;gt; dirlist.txt&lt;br /&gt;
 vim !$&lt;br /&gt;
* We will find ourselves in vim and will use &amp;lt;code&amp;gt;dd&amp;lt;/code&amp;gt; to delete the lines we don&amp;#039;t want. Then type &amp;lt;code&amp;gt;ZZ&amp;lt;/code&amp;gt; to save and get out.&lt;br /&gt;
* Note that this manual check slows things down, but helps confidence in what we are doing.&lt;br /&gt;
* Now we can build our for-loop with the confidence that we are going to operate on the right directories.&lt;br /&gt;
 for i in $(cat dirlist.txt); do cd $i; unzip ${i%/*}.zip; rm ${i%/*}.zip; cd ..;done&lt;br /&gt;
* Note how we enter each directory in turn, extract the zip file inside, remove it, and then move back into the parent directory.&lt;br /&gt;
* We can have a look with &lt;br /&gt;
 tree | less&lt;br /&gt;
* And we can zip the whole structure up again with&lt;br /&gt;
 zip -r dirstruc.zip $(cat dirlist.txt)&lt;br /&gt;
* And delete the created directory structure itself&lt;br /&gt;
 for i in $(cat dirlist.txt); do rm -rf $i; done&lt;br /&gt;
* Though this was an artificial exercise, it&amp;#039;s common to find zip and tar files littering your home directory unless you create directories for them.&lt;/div&gt;</summary>
		<author><name>Rf</name></author>	</entry>

	</feed>