Hdi2u commandbased exercises

Revision as of 13:34, 19 April 2017 by Rf

The Command-line (shell)

The real power of Linux/Unix systems is the command line.

  • Many programs and facilities are available through graphical options on Linux, but all programs and facilities can be accessed by the command line, also known as the shell.
  • Graphical interfaces are good for reduced data, when analysing processed data.
  • Web services and curses-mode screens are halfway between the command line and a graphical interface.
  • However, for "heavy lifting" the command line is much more convenient.
  • Obvious examples include when you need to work with large numbers of files or want to automate processes.
  • It's common to talk about "fear of the command line"; our aim is to reduce this.

Anatomy of a Command

<command> <options/parameters> <arguments>
  • <command>: what do I want to do?
  • <options/parameters>: how do I want to do it?
  • <arguments>: on what do I want to do it?
  • The first word you supply on the command line is interpreted by the system as a command, an operation.
  • Items that appear after that on the same line are separated by spaces.
  • Most commands have options available that will alter the way the command functions.
  • After the options come the arguments; often these are input files.
  • With some commands you don't need to supply any options or arguments, because you are relying on the default settings.
  • To know the default settings, you must read the documentation.
  • If a command runs successfully, it will often not report anything back to you.
  • You can of course tell by the nature of the output files it produced.
  • If a command is unsuccessful, it will report an error. Most of the time these are informative, even if a bit cryptic.
  • For example, if you forgot to specify the input file, you should be able to work that out from the message.
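
As a minimal sketch of this anatomy, the lines below use wc as the command, -l as the option and a file as the argument. The scratch directory and the file names (sample.txt, empty.txt, no_such_file.txt) are invented for this illustration and are not part of the course files.

```shell
# A scratch directory and file, invented for this illustration only.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/sample.txt"

# <command> <option> <argument>: count (wc) the lines (-l) of sample.txt
wc -l "$workdir/sample.txt"

# A successful command may print nothing at all; its exit status is 0.
touch "$workdir/empty.txt"
ok_status=$?

# An unsuccessful command prints an error and returns a non-zero status.
bad_status=0
wc -l "$workdir/no_such_file.txt" || bad_status=$?

echo "touch returned $ok_status, the failed wc returned $bad_status"
```

The special variable $? holds the exit status of the last command, which is a handy way to check for success when a command stays silent.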

Navigating the command line

  • Bioinformatics tools can have a great many options.
  • They can also be combined with many other tools, leading to very long command lines.
  • You do not want to get stuck using only the arrow keys all the time.

Exercise

  • Type the following:
that was then, this is now

Try navigating this line using the following keybindings:

  • ctrl + a: go to beginning of line
  • ctrl + e: go to end of line
  • ctrl + w: delete the current word backwards (the word behind the cursor if you are on a space)
  • ctrl + /: undo changes
  • ctrl + x, <BACKSPACE>: delete from the cursor back to the beginning of the line
  • ctrl + r: search backwards through the command history as you type
  • alt + b: move backwards word-wise
  • alt + f: move forwards word-wise
  • alt + d: delete the current word forwards (the next word if you are on a space)
  • ctrl + k: delete to end of line


Listing files and directories

  • ls is the most common command of all. It lists files and folders in your current location.
  • By default it requires no arguments and will list in alphabetical order
  • Although it has options, using it with wildcards (especially the asterisk) can help control what it lists.

Some practice

  • List all the files in the directory hdi2u_files that start with the letters tes
ls tes*
  • List all the files in your directory that start with tes, and end in 1.embl, 2.embl or 3.embl
ls tes*[123].embl
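
The two patterns can be tried safely on throwaway files. This is only a sketch: the file names below are made up for the demonstration and are not the ones in hdi2u_files.

```shell
# Work in a scratch directory with invented file names.
demo=$(mktemp -d)
cd "$demo"
touch test1.embl test2.embl test3.embl test4.embl tesla.txt other.embl

# * matches any run of characters: everything starting with "tes".
ls tes*
all_tes=$(ls tes* | wc -l | tr -d ' ')

# [123] matches exactly one character out of 1, 2 or 3.
ls tes*[123].embl
some_embl=$(ls tes*[123].embl | wc -l | tr -d ' ')

echo "tes* matched $all_tes files, tes*[123].embl matched $some_embl"
```

Note that it is the shell, not ls, that expands the pattern: ls simply receives the list of matching names as its arguments.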

Questions

  • Prefix the asterisk wildcard with a dot, what do you get?
  • Try the -l option, what type of listing are you getting?
  • What does it tell you about the file whatsinaname.fasta?
  • Specify the directory to ls, did something unexpected happen?
  • ls has what should be called a companion command: pwd. Try it.
  • If ls is the most common command, what might the most common typo be?

Answers

  • You get the hidden files.
  • The -l option gives the long listing.
  • This file is empty; it has no content.
  • When given a directory as an argument, ls lists the contents of that directory rather than the directory itself.
  • pwd reports the present working directory; it tells us where all these files are.
  • sl is the most common typo, so it became a trick program. Try it and see.

Exercise: Focus only on directories

  • Try postfixing the asterisk with a / and also give the -d option
ls -d */
  • Once we find a file, and know its size and date, the poorly named file command will give us more details. Try
file *
  • It's a lot of information, can it be explained?
  • Another useful companion command to ls is wc -l, word count with the line option. Try:
ls -l tes*[123].embl
wc -l tes*[123].embl
  • What extra information are you getting?
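
The commands from this exercise can also be rehearsed on a tiny invented layout. In this sketch the directory names (seqs, results) and files (record.embl, empty.fasta) are assumptions made up for the example; the file command is only run if it is installed.

```shell
# Scratch area with two directories and two files, all invented.
demo=$(mktemp -d)
cd "$demo"
mkdir seqs results
printf 'ID   X\nDE   demo record\n//\n' > record.embl
: > empty.fasta                 # create an empty file

# -d with */ lists the directories themselves, not their contents.
dirs_only=$(ls -d */)
echo "$dirs_only"

# Long listing: permissions, owner, size, date and name.
ls -l

# file guesses the type of each item (skipped if file is not installed).
command -v file > /dev/null && file *

# wc -l adds the number of lines in each file, which ls -l cannot show.
wc -l record.embl empty.fasta
record_lines=$(wc -l < record.embl | tr -d ' ')
```

Comparing the ls -l and wc -l output side by side shows that size in bytes and number of lines are two different measures of the same file.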

Learnings

  • We've learnt about the most popular command ls
  • Two other companion commands were useful: pwd and file
  • A critical aspect is that you do not get a single answer from ls, you get a list of multiple items. This will set the tone for much command-line work.

Learning about Linux commands

  • This is a continuous and very important activity
  • Linux has a large and comprehensive documentation system called man
  • Linux manual pages are referred to as man pages.
  • To open the man page for a particular command, you just need to type man followed by the name of the command you are interested in.
  • To browse through a man page, use the up, down, PgUp and PgDn keys.
  • To close (quit) the man page simply hit the q key on your keyboard.
  • If you do not know the specific name of a command to use for a particular job, you can search using man -k <roughidea>, that is, man -k followed by the type of thing you are trying to do. An example of this is in exercise 1-3, part c).

Exercise

  • Look up the manual information for the echo and mkdir commands by typing the following in a terminal:
man echo
man mkdir
  • What can you say about these commands?

Now try:

man ls
  • Did you think you knew this command already?
  • What does the -h option do? What about the -a option? What would running ls -lrt do?
  • Press the q key when you want to quit reading the man page.
  • Try running ls using some of the options mentioned above.
  • Look up some programs with man pages with the keywords "list directory"
man -k "list directory"

Basic Linux tips for filenames

  • Linux does not deal well with spaces in filenames
  • Expect problems when transferring files from Windows.
  • Everything is case sensitive
  • In genomics it's common to use underscores and add useful (meta) information to filenames.
  • However this can make the filenames quite long.
  • To reference filenames with spaces in them, you need to enclose the entire filename in quotation marks so that Linux understands that the space is part of one single name.
  • Alternatively, you can "escape" the space using a backslash. For example, if I have a file called my document Linux will see this as two words, "my" and "document".

But you could write either of the following to make it understand you mean a single file:

"my document"
my\ document
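
A short sketch of what the shell actually does in each case, using a scratch directory so nothing of yours is touched (the file name my document is the one from the text above):

```shell
demo=$(mktemp -d)
cd "$demo"

# Quoted: one file whose name contains a space.
touch "my document"

# Unquoted: the shell sees two separate arguments and makes two files,
# one called "my" and one called "document".
touch my document

# Escaped: the backslash protects the space, so this writes to the
# single file "my document" created first.
echo "some text" > my\ document

ls -l
file_count=$(ls | wc -l | tr -d ' ')
content=$(cat "my document")
echo "$file_count files; 'my document' contains: $content"
```

Tab completion will insert the backslash for you, which is usually the easiest way to deal with such names.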

Linux shortcut symbols

  • We've seen * is a special symbol, and how useful it can be.
  • There are more however:
  • ? matches a single character
ls multiseqs??.blastx
  • . the directory you are currently in, useful for launching directory-held programs
  • .. the directory one level above the one you are currently in, aka. the parent directory
  • ~ shorthand for your home directory, where all your data is kept.
  • $VAR the dollar sign indicates a variable substitution. You set it with
VAR="that was then, this is now"
  • These variables are often used to provide a so-called "environment" for programs to run in.
  • ; separates two commands on the same line
  • > directs output of one command into a file
  • | often called the pipe operator: directs output of one command into another command.
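
Several of these symbols can be seen working together in a few lines. This is a sketch in a scratch directory; the names sub and phrase.txt are invented for the example.

```shell
demo=$(mktemp -d)
cd "$demo"

# $VAR: set a variable, then substitute its value with the dollar sign.
VAR="that was then, this is now"
echo "$VAR"

# ; runs two commands one after the other on the same line.
mkdir sub ; cd sub

# .. is the parent directory, . is the directory you are in.
cd .. ; ls -d .

# > sends the output of a command into a file (replacing its contents).
echo "$VAR" > phrase.txt

# | sends the output of one command into another command.
word_count=$(cat phrase.txt | wc -w | tr -d ' ')
echo "phrase.txt holds $word_count words"

# ~ is replaced by the path of your home directory.
echo ~
```

Putting double quotes around "$VAR" keeps the phrase together as one argument even though it contains spaces.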

Changing directories

  • Directories are the same as folders. In Linux "directory" is the more common term.
  • Directories within directories are called subdirectories.
  • The command used to change directories is cd
  • Think of your directory structure, (i.e. this set of nested file folders you are in), as a tree structure
  • The simplest directory change you can do is move into a directory directly above or below the one you are in.

Exercises

  • To change directory to the one above the one you are in, use the shortcut symbol learnt above
cd ..
  • To return to the last directory you were working in before this one:
cd -

To change to a directory one below the one you are in, just use the cd command followed by the subdirectory name:

cd subdir_name
  • If you need to change directory without worrying where you are now, you could explicitly state the full or absolute path:
cd /usr/local/bin
  • If you wish to return to your home directory at any time, just type cd by itself.
cd
  • Type
cd utr
  • Type it again. Why doesn't this work?
  • Change directory into the /usr/bin directory by typing
cd /usr/bin
  • List the files in this directory. This is the main directory of runnable programs on the system.
  • How can you get back to your home directory from here?
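
The different forms of cd can be practised on a small invented tree. The directory names project and data below are assumptions for this sketch only; pwd is used after each step to show where you have landed.

```shell
# Build a small scratch tree: <top>/project/data
top=$(mktemp -d)
mkdir -p "$top/project/data"

cd "$top/project/data"     # absolute path: works from anywhere
pwd

cd ..                      # one level up, into project
pwd

cd data                    # relative path: one level down again
pwd

cd "$top"                  # jump somewhere else entirely
cd - > /dev/null           # return to the previous directory (data)
back_in=$(basename "$PWD")
echo "cd - brought us back to: $back_in"

# cd with no argument at all would take you to your home directory.
```

Relative paths depend on where you currently are, which is why repeating the same cd subdir_name command a second time normally fails.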

Tab completion

  • Fear of the command line often means fear of typing too much. Everybody fears this, so there are tools.
  • Tab completion is probably the most important tool, and relies on your pressing the TAB key
  • It tries to complete the filename or program name you have started typing, saving you typing time and reducing spelling errors.
cd kir
  • followed by TAB. If there is only one directory with a name starting with the letters "kir", the rest of the name will be completed for you.
  • if there are several options, you need to supply more "hint-letters" and press TAB multiple times.

Exercises

  • Type ls testseq and use tab completion.
  • This will show you a list of files that start with testseq.
  • You now have the option of completing the filename yourself, or "tabbing" through the filenames available.
  • It limits itself to files in your current directory.
  • What happens if you type TAB immediately after ls?
  • How can you find out all commands available on the system?

Answer

  • The first word of a command line is usually a command, so TAB looks for commands, not files.
  • By giving no hints at all to TAB, it will look for all possible commands.

Command history

  • This is another very handy tool for saving typing
  • Previous commands you have used are stored in your "history".
  • You use the up and down arrow keys to travel through all the commands you have used previously.
  • The history command itself will return a numbered list of previously run commands (how many depends on your shell and its settings).
  • Going back sequentially can be a bit tedious; ctrl+r will accept hints from you and try to find the past command that most resembles your hints.

Exercise

  • Try
history -3
ctrl+r kir
  • Are you seeing what you are expecting?

Keybindings for using the history file

These commands are run blind. They refer to the command you last ran, which most of the time is visible in the line above.

  • :<RET>: save command in history, do not execute.
  • !$<RET>: the final argument of the last command
  • !!<RET>: the entire last command
  • !:1-$<RET>: everything except the first word of the last command
  • ^then^now<RET>: replace the first occurrence of "then" in the last command with "now"
  • !!:gs/then/now<RET>: replace ALL occurrences of "then" in the last command with "now"

Reading text files

  • These are useful when you want to look at the contents of a file, but not edit it.
  • Among the most common of these commands are cat, more, and less.
  • cat simply prints out one or several files.
  • It is useful for small files, but also for concatenating (which is where it got its name).
  • more and less are pagers.
  • You can feed output to them via the pipe operator.
  • They have keybindings similar to the editor vi: /, ?, gg and G.
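
As a small sketch of cat in both of its roles, the lines below print one file and then concatenate two files into a third. The file names (part1.txt, part2.txt, combined.txt) are invented for this example. The pager line is left as a comment because pagers are interactive.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'first file, line 1\nfirst file, line 2\n' > part1.txt
printf 'second file, line 1\n' > part2.txt

# Print a single file to the screen.
cat part1.txt

# Concatenate: both files, in the order given, into a new file.
cat part1.txt part2.txt > combined.txt
combined_lines=$(wc -l < combined.txt | tr -d ' ')
echo "combined.txt has $combined_lines lines"

# Feeding output to a pager through a pipe (interactive, so not run here):
# ls -l /usr/bin | less
```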

Exercises

  • Move into the hdi2u_files directory.
  • Read the file hsy14768.embl using the commands cat, more and less.
  • Don’t forget that tab completion can save you typing effort.
cat hsy14768.embl
more hsy14768.embl
less hsy14768.embl
  • In more, use the spacebar to scroll down and press q to quit.
  • In less, use the spacebar to scroll down, b to go up a page, and the up and down arrow keys to move through the file line by line. Press q to quit.
  • Press the / key and search for the letters sequen in the file.
  • Press the ? key and search for the letters gene in the file.
  • Press the n key to search for other instances of gene in the file.

Remember the man pages

There are many command line options available for each of the above commands, as well as functionality we do not cover here. To read more about them, consult the manual pages:

man cat
man less

An important note on line endings – CR and LF

  • Besides spaces in filenames, there is another major pitfall when transferring files over to Linux.
  • In Linux, the end of a line is marked by a single newline character, LF (line feed), written \n.
  • Windows uses CR (carriage return, \r) followed by LF; this is called the DOS format.
  • Old Macs used CR alone.
  • The tools dos2unix and mac2unix convert these formats to the Linux one.
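
Line endings are invisible in most viewers, so it helps to look at them directly. The sketch below builds a small DOS-format file and strips the carriage returns with tr. Note that tr is used here as a stand-in that is available everywhere, not as a replacement for dos2unix, which may or may not be installed on your system; the file names dos.txt and unix.txt are invented.

```shell
demo=$(mktemp -d)
cd "$demo"

# Two lines, each ending in CR LF as Windows would write them.
printf 'line one\r\nline two\r\n' > dos.txt

# od -c shows every character, including the otherwise invisible \r and \n.
od -c dos.txt

# Delete every carriage return, leaving plain LF line endings.
tr -d '\r' < dos.txt > unix.txt
od -c unix.txt

dos_bytes=$(wc -c < dos.txt | tr -d ' ')
unix_bytes=$(wc -c < unix.txt | tr -d ' ')
echo "dos.txt: $dos_bytes bytes, unix.txt: $unix_bytes bytes"
```

The converted file is two bytes shorter, one removed carriage return per line.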

Text editing

  • There are very many text editors, but one of the most powerful and dependable is vi.
  • It has a steep but short learning curve, which we hope to conquer in this section.
  • "vi" is the old version and is available on all Unix/Linux systems by default.
  • Vim is the modern version; it uses ncurses to take over the whole screen.
  • It's free and has a graphical version called gvim, and a Windows version too.

Using vim

  • Type vim to get in, and :q to get out (:q! if you want to discard changes).
  • ZZ saves to the current filename and quits; use ":sav fname" to save under another name.
  • It opens in "normal" mode which is similar to less, in that direct editing is not expected.
  • This is changed by pressing i, which enters insert mode. To get back to normal mode, press the ESC key.
  • In normal mode, u undoes the last change.
  • : while in normal mode allows commands to be entered.
  • Visual: enabled by "v" (character-wise), "V" (line-wise) or ctrl+v (visual block); use the movement keys to extend the selection.
  • Commands can be given a range: ":%" operates on the whole document, ":'a,'b" operates between two marks, and ":42,45" between two line numbers.
  • Search via "/"; use ":set hlsearch" to see all the hits.

Getting by in only normal mode

  • Movement keys: "w" jumps to the start of the next word; "e" jumps to the end of a word; "fc" jumps to the next c on the line
  • "0" for start of line, "A" for end of line and into insert mode
  • "x" delete current character, "xp" switch positions of current and next character
  • "yyp" copy current line and paste it underneath
  • "dd" delete line, "3dd" delete this and the following two lines. "dgg" delete to start of file, "dG" delete to end of file
  • "dw" delete from the cursor to the start of the next word
  • Command: activated by ":", opens a command line at the bottom of the screen, rich command language
  • Visual: activated by "v", "V" (line-wise) or ctrl+v (visual block); a following ":" opens the command line at the bottom, applied to the selection

Advanced but really useful commands

  • ":colorscheme desert" changes to the desert colour scheme; "morning", "delek" and many others are available
  • :%s/\(sn\)oo\(ze\)/\1ee\2/gc changes all instances of snooze to sneeze, asking for confirmation each time
  • :%s/snooze/sneeze/gc also works
  • :g/sneeze/d deletes all lines with sneeze
  • :v/sneeze/d deletes all lines without sneeze
  • ":42y[RET]p" yanks (copies) line 42 and pastes it below the current line
  • "d214G" delete to line 214
  • "y214G" yank (copy) to line 214

Exercises

  • Use vim to open multiseqs_1.blastx
vim multiseqs_1.blastx
  • Type :set list, what extra are you seeing?
  • Type :set nu what extra are you seeing?
  • Create a file-listing by:
ls * > my.list
  • Open my.list in vim and use :v/multiple/d to delete everything that doesn't say "multiple"

Copying files and directories

  • The basic command used to copy files using the command line is cp.
  • At a minimum, you must specify two arguments: the name of the file to be copied, and where you wish to copy the file to.
  • The main things to know about using the cp command are:
  • if you provide the name of an existing directory as the second argument, the file named in the first argument will be copied into that directory.
  • otherwise, it will be assumed that the second argument is another name for the first file, a clone so to speak.
  • if you provide more than two arguments to cp, the final argument needs to be the name of a directory
  • This command is not harmless: if you choose a new name that happens to already be a file, that file will be overwritten.

Exercises

cp unknown.fasta my_new_file.fasta - clones unknown.fasta with the new name my_new_file.fasta
cp unknown.fasta my_new_directory - probably not what you wanted! It just makes another file.
mkdir an_actual_directory
cp unknown.fasta an_actual_directory - copy unknown.fasta into an_actual_directory you just made
cp *.embl an_actual_directory - copy all the .embl files into the new directory in one go
  • To copy whole directories, with all the subfiles and subdirectories, use the -R option (meaning recursive).
cp -R an_actual_directory foo
cp -R ../blastdb .
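
The same sequence can be run end to end in a scratch directory to see what each form of cp produces. This is a sketch: the .embl files a.embl and b.embl and the file contents are invented stand-ins for the course files.

```shell
demo=$(mktemp -d)
cd "$demo"
echo ">seq1" > unknown.fasta
echo "ID   demo" > a.embl
echo "ID   demo" > b.embl

# Second argument is not an existing directory: make a copy under a new name.
cp unknown.fasta my_new_file.fasta

# Second argument is an existing directory: copy the file into it.
mkdir an_actual_directory
cp unknown.fasta an_actual_directory

# More than two arguments: the last one must be a directory.
cp *.embl an_actual_directory

# -R copies a directory and everything beneath it.
cp -R an_actual_directory foo

ls -R
copied=$(ls foo | wc -l | tr -d ' ')
echo "foo now holds $copied files"
```

Because cp overwrites silently, the -i option (ask before overwriting) is worth considering when copying onto names that may already exist.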

Linking to files

  • Copying big files can exhaust hard disk space quickly.
  • You can instead create a link to it with the ln -s command.
ln -s current.file linktocurrent.file

Exercises

  • Try creating a link to multiple.fasta
  • Run ls -l on it
  • Run ls -lH on it. what do you think is happening?
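
A minimal sketch of creating and inspecting a symbolic link, using the file names from the ln -s example above in a scratch directory. readlink, used here to print where a link points, is not mentioned in the text and is simply one convenient way to check.

```shell
demo=$(mktemp -d)
cd "$demo"
printf '>seq1\nACGT\n' > current.file

# Create the link: first the existing file, then the name of the link.
ln -s current.file linktocurrent.file

# The long listing shows the link and the file it points to.
ls -l linktocurrent.file

# readlink prints the target of the link.
link_target=$(readlink linktocurrent.file)

# Reading through the link gives the contents of the original file.
via_link=$(cat linktocurrent.file)
echo "link points to $link_target"
```

The link itself takes up almost no space; deleting it leaves the original file untouched, while deleting the original leaves a dangling link.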

Removing files and directories

  • The key difference between deleting something from the command line and using the graphical file browser is that in the first case the file vanishes immediately, but in the second it will be stored for a while in the Rubbish Bin and can be retrieved.
  • So these can be very destructive commands, and they should be used carefully and not in a rush.
  • To remove a file or files, use the rm command followed by the name of the file(s) you wish to delete.
rm file1
rm file2 file3 file4
rm foo/*
  • Removing directories can be done with rmdir, but this is a conservative command as it will refuse to delete the directory if it contains any files.
rmdir thisdir
  • A much more powerful command is
rm -r fulldir
  • This will wipe out the directory, empty or not.
  • With this command, you need to be 100% confident that you will never make a mistake
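
The difference between rmdir and rm -r is easiest to see on disposable material. Everything in this sketch happens inside a fresh scratch directory; the names emptydir, fulldir and file1 to file3 are invented for the demonstration.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir emptydir fulldir
touch file1 fulldir/file2 fulldir/file3

# rm removes a file immediately; there is no Rubbish Bin.
rm file1

# rmdir succeeds on an empty directory...
rmdir emptydir

# ...but refuses, with an error, if the directory still has files in it.
rmdir_status=0
rmdir fulldir || rmdir_status=$?
echo "rmdir on a full directory returned status $rmdir_status"

# rm -r removes the directory and everything inside it.
rm -r fulldir
ls -la
```

If you want a safety net, rm -i asks for confirmation before each removal.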

Exercises

  • Move into the testdir directory.
  • Delete mythirdfile.txt using the command line
  • Delete myfourthfile.txt using the graphical file browser. Is the file now sitting in the Rubbish Bin?
  • Back on the command line, move back into your Home directory.
  • Then delete myfirstfile.txt from testdir without moving back to the testdir directory.

Piping output between applications

  • |, the pipe operator, is powerful: it sends the output of one command straight into another
  • It is not always easy to find on the keyboard

Exercise

  • "Pipe" the output of "ls" into wc -l to see how many files you have in your current directory.
ls | wc -l

Grep

  • grep stands for "global regular expression print"
  • You use this command to search for text patterns in a file, for example, Linux's mini-dictionary.
grep "adge" /usr/share/dict/words
  • regular expressions are different and more powerful than wildcard characters
  • made of special symbols which designate type of characters.
  • grep requires a regular expression pattern as a parameter, and prints all the lines in a file containing that pattern.
  • grep is especially useful in combination with pipes as you can filter the results of other commands.
  • For example, perhaps you want to see only the information in an EMBL file relating to the description of the sequence, that is, the DE line?

Exercise

  • While in the hdi2u_files directory, type the command:
grep "DE" hsy14768.embl

What is this command doing?

  • Try the commands:
grep "^DE" hsy14768.embl
grep -x "DE.*" hsy14768.embl
  • What are the ^ symbol and the -x parameter in these commands doing?
  • You will need to check the manpage for grep to be sure.
  • Move to your home directory and type ls -lR
  • Use the above command with a pipe and a grep command to search for files created or modified today.
  • List the files in the hdi2u_files directory and use the grep command to look for those containing the
cat *seqs.fasta | grep "^>" | wc -l
  • Each sequence in a fasta file starts with a header line that begins with a >.
  • The > is put in quotes so that the shell does not interpret it as a request for redirection of output to a file, rather than as a character to look for.
  • As before, the ^ symbol means "match only at the beginning of the line".

The output of this grep search is sent to the wc command, with the -l indicating that you want to know the number of lines, i.e. the number of headers and by implication the number of sequences. So a synopsis of the command above is: read through all files with names ending seqs.fasta and look for all the header lines in the combined output, then count up those lines that matched and return the number to screen. We cover sequence formats later on in part 2 of the tutorial.
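
To check that the pipeline counts what you expect, it can be run on two tiny fasta files whose contents you know in advance. The files and sequences below are invented for this sketch; only their names follow the seqs.fasta pattern used above.

```shell
demo=$(mktemp -d)
cd "$demo"

# Two small fasta files: two sequences in the first, one in the second.
printf '>seq1 first\nACGT\n>seq2 second\nGGCC\n' > multiseqs.fasta
printf '>seq3 third\nTTAA\n' > testseqs.fasta

# Show the header lines themselves.
cat *seqs.fasta | grep "^>"

# Count them: cat joins the files, grep keeps the headers, wc -l counts.
header_count=$(cat *seqs.fasta | grep "^>" | wc -l | tr -d ' ')
echo "total sequences: $header_count"

# grep -c gives the same kind of count, but separately for each file.
grep -c "^>" *seqs.fasta
```

The grep -c variant is a convenient alternative when you want to know how the total is distributed over the files.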

Environment Variables

  • We have seen that the way commands run can be modified by the options passed on the command line.
  • Environment variables are another way to influence how commands run.
  • One of the most important variables is PATH. Try
echo $PATH
  • This critical environment variable shows all the locations where commands that you can use are found.
  • If you want more commands, you need to add their locations to this environment variable
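
As a sketch of adding a location to PATH, the lines below create a tiny script in a scratch directory and make it runnable by name. The directory and the command name mytool are hypothetical, invented purely for this example, and the change to PATH only lasts for the current shell session.

```shell
# A scratch directory holding one small executable script.
tooldir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from mytool"\n' > "$tooldir/mytool"
chmod +x "$tooldir/mytool"

# PATH is a colon-separated list; show it one location per line.
echo "$PATH" | tr ':' '\n'

# Append the new location and export it so child programs see it too.
PATH="$PATH:$tooldir"
export PATH

# The script can now be run by name, from any directory.
tool_output=$(mytool)
echo "$tool_output"

# command -v reports where the shell found the command.
command -v mytool
```

To make such a change permanent it would normally go into a shell startup file, which is beyond what is covered here.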

Processes

  • Sometimes a command or program you run in the terminal goes on too long, or is obviously doing something you did not plan.
  • If there is no obvious way (such as a menu option or button) to stop the program running, try using Ctrl-c
  • A command can include the output of another command by generating a process via the `` or $() operators.

Questions

  • Try these three commands:
ls
echo `ls`
echo $(ls)
  • The results are the same but the method is different. Does it matter?
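
One practical difference between the two forms shows up when the captured output is stored or nested. The following sketch uses three invented files in a scratch directory; it illustrates command substitution in general and is not the only way to answer the question.

```shell
demo=$(mktemp -d)
cd "$demo"
touch a.txt b.txt c.txt

# Both forms capture the output of ls into a variable.
listing=`ls`
same_listing=$(ls)

# The captured output can be the result of a whole pipeline.
file_count=$(ls | wc -l | tr -d ' ')

# $() nests cleanly: the inner pwd runs first, then basename.
echo "There are $file_count files in $(basename "$(pwd)")"
```

Nesting backticks requires escaping the inner pair, which is why the $() form is generally easier to read in longer commands.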

Accessing a running program or working with others interactively

  • If you just run a job and then close down the terminal you ran it from, normally the job will be terminated.
  • It would be nice to be able to leave a long job running and be able to log out and then log back in again to see how it is progressing.
  • Though there are some simple ways to do this, working with the screen program is the real answer.
  • It is a poorly named and somewhat invisible program, but it can be very useful.
  • It serves to offer multiple command lines instead of just one.
  • It is similar in this way to tabs in a web browser.

Exercises

Type:

screen

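Then (the keybindings below use ctrl+l as the screen prefix key; on a default installation the prefix is ctrl+a):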

  • ctrl+l, n: cycle through screen windows
  • ctrl+l, :hardcopy RET: create a file with a copy of the current window's contents.
  • ctrl+l, d: detach screen session
  • type screen -r to recover a detached session
  • type exit to get out of your screen sessions