Hdi2u commandbased exercises
Contents
- 1 The Command-line (shell)
- 2 Anatomy of a Command
- 3 Navigating the command line
- 4 Exercise
- 5 Listing files and directories
- 6 Learning about Linux commands
- 7 Basic Linux tips for filenames
- 8 Linux shortcut symbols
- 9 Changing directories
- 10 Tab completion
- 11 Command history
- 12 Keybindings for using the history file
- 13 Reading text files
- 14 Text editing
- 15 Copying files and directories
- 16 Linking to files
- 17 Removing files and directories
- 18 Piping output between applications
- 19 Grep
- 20 Environment Variables
- 21 Processes
- 22 Accessing a running program or working with others interactively
The Command-line (shell)
The real power of Linux/Unix systems is the command line.
- Many programs and facilities are available through graphical options on Linux, but all programs and facilities can be accessed by the command line, also known as the shell.
- Graphical interfaces are good for reduced data, when analysing processed data.
- Web services and curses-mode screens are halfway between the command line and a graphical interface.
- However, for "heavy lifting" the command line is much more convenient.
- Obvious examples include when you need to work with large numbers of files or want to automate processes.
- It's common to talk about "fear of the command line"; our aim is to reduce this.
Anatomy of a Command
<command> <options/parameters> <arguments>
- <command> what do I want to do?
- <options/parameters> how do I want to do it?
- <arguments>, on what do I want to do it?
- The first word you supply on the command line is interpreted by the system as a command, an operation.
- Items that appear after that on the same line are separated by spaces.
- Most commands have options available that alter the way the command functions.
- After the options come what are called arguments; often these are input files.
- With some commands you don't need to issue any parameters or arguments. This is because you are using the default settings.
- To know the default settings, you must read the documentation.
- If a command runs successfully, it will often not report anything back to you.
- You can of course tell by the nature of the output files it produced.
- If a command is unsuccessful, it will report an error. Most of the time, these are informative, even if a bit cryptic.
- However, if you forgot to specify the input file, you should be able to interpret that.
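As a rough illustration of the three parts of a command, the sketch below runs wc (the command) with -l (an option) on a small file (the argument). The scratch directory and the file names notes.txt, copy.txt and no_such_file.txt are made up for this example.

```shell
# Work in a throwaway directory so nothing real is touched
scratch=$(mktemp -d)
cd "$scratch"
printf 'one\ntwo\nthree\n' > notes.txt

# <command> <option> <argument>: count the lines in notes.txt
wc -l notes.txt

# A successful command is often silent: cp prints nothing here
cp notes.txt copy.txt

# An unsuccessful command reports an error and returns a non-zero status
if ! wc -l no_such_file.txt; then
    echo "wc could not find its input file"
fi
```

The error message printed by wc names the missing file, which is usually enough to work out what went wrong.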
Navigating the command line
- Bioinformatics tools can have very many options.
- They can also be combined with many others, leading to very long command lines.
- You do not want to get stuck using only the arrow keys all the time.
Exercise
- Type the following:
that was then, this is now
Try navigating this line using the following keybindings:
- ctrl + a: go to beginning of line
- ctrl + e: go to end of line
- ctrl + w: delete the current word backwards, or the word behind the cursor if in a space
- ctrl + /: undo changes
- ctrl + x, <BACKSPACE>: in bash, delete from the cursor back to the beginning of the line
- ctrl + r, ?: search backwards through the command history for the text you type next, here ?
- alt + b: move backwards word-wise
- alt + f: move forwards word-wise
- alt + d: delete the current word forwards, or the next word if in a space
- ctrl + k: delete to end of line
Listing files and directories
- ls is the most common command of all. It lists the files and folders in your current location.
- By default it requires no arguments and lists in alphabetical order.
- Although it has options, using it with wildcards (especially the asterisk) can help control its output.
Some practice
- List all the files in the directory hdi2u_files that start with the letters tes
ls tes*
- List all the files in your directory that start with tes, and end in 1.embl, 2.embl or 3.embl
ls tes*[123].embl
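To see how the two patterns differ without relying on the course files, here is a small sketch in a scratch directory. The file names are hypothetical, loosely modelled on the exercise files, and touch is used only to create empty files.

```shell
scratch=$(mktemp -d)
cd "$scratch"
# Hypothetical, empty files to match against
touch test1.embl test2.embl test3.embl test4.embl tesla.txt other.fasta

# Everything whose name starts with "tes"
ls tes*

# Only names starting with "tes" and ending in 1.embl, 2.embl or 3.embl
matches=$(ls tes*[123].embl)
echo "$matches"
```

Note that test4.embl and tesla.txt match the first pattern but not the second.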
Questions
- Prefix the asterisk wildcard with a dot. What do you get?
- Try the -l option. What type of listing are you getting?
- What does it tell you about the file whatsinaname.fasta?
- Specify the directory to ls. Did something unexpected happen?
- ls has what should be called a companion command: pwd. Try it.
- If ls is the most common command, what might the most common typo be?
Answers
- You get the hidden files.
- The -l option gives the long listing.
- This file is empty; it has no content.
- When given a directory, ls lists the contents of that directory rather than the directory itself.
- pwd gives the Present Working Directory; it tells us where all these files are.
- sl is the most common typo, so it became a trick program. Try it and see.
Exercise: Focus only on directories
- Try postfixing the asterisk with a / and also give the -d option:
ls -d */
- Once we find a file, and know its size and date, the poorly named file command will give us more details. Try:
file *
- It's a lot of information. Can it be explained?
- Another useful companion command to ls is wc -l, word count with the line option. Try:
ls -l tes*[123].embl
wc -l tes*[123].embl
- What extra information are you getting?
Learnings
- We've learnt about the most popular command, ls.
- Two other companion commands were useful: pwd and file.
- A critical aspect is that you do not get a single answer from ls; you get a list of multiple items. This will set the tone for much command-line work.
Learning about Linux commands
- This is a continuous and very important activity.
- Linux has a large and comprehensive documentation system called man.
- Linux manual pages are referred to as man pages.
- To open the man page for a particular command, you just need to type man followed by the name of the command you are interested in.
- To browse through a man page, use the up, down, PgUp and PgDn keys.
- To close (quit) the man page, simply hit the q key on your keyboard.
- If you do not know the specific name of a command to use for a particular job, you can search using man -k followed by a rough idea of the type of thing you are trying to do:
man -k <roughidea>
An example of this is in exercise 1-3, part c).
Exercise
- Look up the manual information for the echo and mkdir commands by typing the following in a terminal:
man echo
man mkdir
- What can you say about these commands?
Now try:
man ls
- Did you think you knew this command already?
- What does the -h option do? What about the -a option? What would running ls -lrt do?
- Press the q key when you want to quit reading the man page.
- Try running ls using some of the options mentioned above.
- Look up programs whose man pages match the keywords "list directory":
man -k "list directory"
Basic Linux tips for filenames
- Linux does not deal well with spaces in filenames
- Expect problems when transferring files from Windows.
- Everything is case sensitive
- In genomics it's common to use underscores and add useful (meta) information.
- However, this can make the filenames quite long.
- To reference filenames with spaces in them, you need to enclose the entire filename in quotation marks so that Linux understands that the space is part of one single name.
- Alternatively, you can "escape" the space using a backslash. For example, if I have a file called my document, Linux will see this as two words, "my" and "document".
But you could write either of the following to make it understand you mean a single file:
"my document"
my\ document
Linux shortcut symbols
- We've seen that * is a special symbol, and how useful it can be.
- There are more, however:
- ? matches a single character
ls multiseqs??.blastx
- . the directory you are currently in, useful for launching directory-held programs
- .. the directory one level above the one you are currently in, a.k.a. the parent directory
- ~ shorthand for your home directory, where all your data is kept
- $VAR the dollar sign indicates a variable substitution. You set the variable with
VAR="that was then, this is now"
- These variables are often used to provide a so-called "environment" for programs to run in.
- ; separates two commands on the same line
- > directs the output of a command into a file
- | often called the pipe operator: directs the output of one command into another command
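The symbols above can be tried together in a scratch directory. This is only a sketch: the multiseqs file names and listing.txt are invented for the example, and touch merely creates empty files.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch multiseqs01.blastx multiseqs02.blastx multiseqs123.blastx

# ? matches exactly one character, so multiseqs123.blastx is skipped
ls multiseqs??.blastx

# $VAR substitutes the value of a variable
VAR="that was then, this is now"
echo "$VAR"

# ; runs two commands one after the other; > writes output to a file
ls > listing.txt; echo "listing saved"

# | feeds the output of one command into another
blastx_count=$(ls multiseqs??.blastx | wc -l)
echo "two-digit blastx files: $blastx_count"

# . is the current directory, .. its parent, ~ your home directory
ls . > /dev/null
ls .. > /dev/null
echo ~
```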
Changing directories
- Directories are the same as folders. In Linux, "directory" is the more common term.
- Directories within directories are called subdirectories.
- The command used to change directories is cd.
- Think of your directory structure (i.e. this set of nested file folders you are in) as a tree structure.
- The simplest directory change you can do is to move into a directory directly above or below the one you are in.
Exercises
- To change directory to the one above the one you are in, use the shortcut symbol learnt above:
cd ..
- To return to the last directory you were working in before this one:
cd -
- To change to a directory one below the one you are in, just use the cd command followed by the subdirectory name:
cd subdir_name
- If you need to change directory without worrying where you are now, you could explicitly state the full or absolute path:
cd /usr/local/bin
- If you wish to return to your home directory at any time, just type cd by itself.
cd
- Type
cd utr
- Type it again. Why doesn't this work?
- Change directory into the /usr/bin directory by typing
cd /usr/bin
- List the files in this directory. This is the main directory of runnable programs on the system.
- How can you get back to your home directory from here?
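The moves described in this section can be rehearsed safely in a scratch tree. The directory names project and data below are made up for the sketch; pwd is used to show where each cd lands.

```shell
start=$(pwd)
scratch=$(mktemp -d)
mkdir -p "$scratch/project/data"

cd "$scratch/project/data"   # absolute path: works from anywhere
cd ..                        # up one level, into project
pwd
cd data                      # down into a subdirectory by name
cd -                         # back to the previous directory (project)
here=$(pwd)
echo "now in: $here"

cd "$start"                  # return to where we began
```

Note that cd - also prints the directory it switches to, which is a handy confirmation.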
Tab completion
- Fear of the command line often means fear of typing too much. Everybody fears this, so there are tools.
- Tab completion is probably the most important tool, and relies on your pressing the TAB key.
- It tries to complete the filename or program name you have started typing, saving you typing time and reducing spelling errors. For example, type
cd kir
followed by TAB.
- If there is only one directory with a name starting with the letters "kir", the rest of the name will be completed for you.
- If there are several options, you need to supply more "hint letters" and press TAB multiple times.
Exercises
- Type ls testseq and use tab completion.
- This will show you a list of files that start with testseq.
- You now have the option of completing the filename yourself, or "tabbing" through the filenames available.
- It limits itself to files in your current directory.
- What happens if you press TAB immediately after ls?
- How can you find out all commands available on the system?
Answer
- The first word of a command line is usually a command, so TAB looks for commands, not files.
- By giving no hints at all to TAB, it will look for all possible commands.
Command history
- This is another very handy tool for saving typing.
- Previous commands you have used are stored in your "history".
- You use the up and down arrow keys to travel through all the commands you have used previously.
- The command itself, history, will return a list of the last 15 commands run.
- Going back sequentially can be a bit tedious; ctrl+r will accept hints from you and try to find the past command that most resembles your hints.
Exercise
- Try
history -3
ctrl+r kir
- Are you seeing what you are expecting?
Keybindings for using the history file
These commands are run blind. They refer to the command you last ran, which most of the time is visible in the line above.
- :<RET>: save command in history, do not execute
- !$<RET>: the final argument of the last command
- !!<RET>: the entire last command
- !:1-$<RET>: everything except the first word of the last command
- ^then^now<RET>: replace the first occurrence of "then" in the last command with "now"
- !!:gs/then/now<RET>: replace ALL occurrences of "then" in the last command with "now"
Reading text files
- These commands are useful when you want to look at the contents of a file, but not edit it.
- Among the most common of these commands are cat, more, and less.
- cat simply prints out one or several files.
- It is useful for small files, but also for concatenating (which is where it got its name).
- more and less are pagers.
- You can feed output to them via the pipe operator.
- They have keybindings similar to the editor vi: /, ?, gg, G.
Exercises
- Move into the hdi2u_files directory.
- Read the file hsy14768.embl using the commands cat, more and less.
- Don’t forget that tab completion can save you typing effort.
cat hsy14768.embl
more hsy14768.embl
less hsy14768.embl
- Use the spacebar to scroll down
- Press q to quit.
- Use the spacebar to scroll down, b to go up a page, and the up and down arrow keys to move up and down the file line by line.
- Press the / key and search for the letters sequen in the file.
- Press the ? key and search for the letters gene in the file.
- Press the n key to search for other instances of gene in the file.
Remember the man pages
There are many command line options available for each of the above commands, as well as functionality we do not cover here. To read more about them, consult the manual pages:
man cat
man less
An important note on line endings – CR and LF
- Besides spaces in filenames, there is another major pitfall when transferring files over to Linux.
- In Linux, the end of a line is marked by a single newline character, LF, written \n.
- Windows uses CR followed by LF (\r\n), called the DOS format.
- Old Macs used CR alone.
- The tools dos2unix and mac2unix convert such files to the Linux format.
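The difference is easy to see by counting bytes. The sketch below assumes dos2unix may not be installed, so it uses tr -d '\r' as a stand-in that simply strips every carriage return; the file names dos.txt and unix.txt are made up.

```shell
scratch=$(mktemp -d)
cd "$scratch"
# Write a two-line file with Windows-style CR+LF line endings
printf 'ID   X\r\nDE   example\r\n' > dos.txt

# Strip every carriage return to get Linux-style LF endings
tr -d '\r' < dos.txt > unix.txt

# Each of the two lines has lost one CR byte
wc -c dos.txt unix.txt
```

Where dos2unix is available, running it on dos.txt should give the same result as unix.txt, converting the file in place.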
Text editing
- There are very many text editors, but one of the most powerful and dependable is vi.
- It has a steep but short learning curve, which we hope to conquer in this section.
- "vi" is the old version and is available on all Unix/Linux systems by default.
- Vim is the modern version; it uses ncurses to use the whole screen.
- It's free and has a graphical version called gvim, and a Windows version too.
Using vim
- Type vim to get in, and :q to get out (:q! if you have made changes and do not want to save them).
- ZZ saves to the current filename and quits; ":sav fname" saves under another name.
- It opens in "normal" mode, which is similar to less in that direct editing is not expected.
- This is changed by pressing i, which enters insert mode. To get back to normal mode, press the ESC key.
- In normal mode, u undoes changes.
- : while in normal mode allows commands to be entered.
- Visual: enabled by "v" (characterwise) or "V" (linewise); ctrl+v gives visual block. A sub-box opens at the bottom.
- After v or V, the movement keys extend the selection. In command mode, ":%" operates on the whole document, ":'a,'b" between two marks, and ":42,45" between two line numbers.
- Search via "/"; ":set hlsearch" highlights all the hits.
Getting by in only normal mode
- Movement keys: "w" jumps via starts of words; "e" jumps via ends; "fc" jumps to the next c on the line
- "0" for start of line, "A" for end of line and into insert mode
- "x" deletes the current character, "xp" switches positions of the current and next character
- "yyp" copies the current line and pastes it underneath
- "dd" deletes the line, "3dd" deletes this and the following two lines, "dgg" deletes to the start of the file, "dG" deletes to the end
- "dw" deletes the current word
- Command: activated by ":", sub-box at the bottom opens, rich command language
- Visual: activated by "v" or "V" (visual line), sub-box at the bottom opens, rich command language
Advanced but really useful commands
- ":colorscheme desert" changes to the desert colour scheme; "morning", "delek" and many others are available
- :%s/\(sn\)oo\(ze\)/\1ee\2/gc changes all instances of snooze to sneeze, asking for confirmation each time
- :%s/snooze/sneeze/gc also works
- :g/sneeze/d deletes all lines with sneeze
- :v/sneeze/d deletes all lines without sneeze
- ":42y[RET]p" copies line 42 and pastes it below the current line
- "d214G" deletes to line 214
- "y214G" copies (yanks) to line 214
Exercises
- Use vim to open multiseqs_1.blastx:
vim multiseqs_1.blastx
- Type :set list. What extra are you seeing?
- Type :set nu. What extra are you seeing?
- Create a file listing with:
ls * > my.list
- Open my.list in vim and use :v/multiple/d to delete every line that doesn't say "multiple".
Copying files and directories
- The basic command used to copy files using the command line is cp.
- At a minimum, you must specify two arguments: the name of the file to be copied, and where you wish to copy the file to.
- The main things to know about using the cp command are:
- If you provide the name of an existing directory as the second argument, the file named in the first argument will be copied into that directory.
- Otherwise, it will be assumed that the second argument is another name for the first file, a clone so to speak.
- If you provide more than two arguments to cp, the final argument needs to be the name of a directory.
- This command is not harmless: if you choose a new name that happens to already be a file, that file will be overwritten.
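The overwrite behaviour is worth seeing once in a safe place. In this sketch the file names results.txt and draft.txt and the directory backup are invented for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch"
echo "final results" > results.txt
echo "rough draft" > draft.txt

# The target name already exists, so cp replaces its content without warning
cp draft.txt results.txt
cat results.txt

# With more than two arguments, the last one must be an existing directory
mkdir backup
cp draft.txt results.txt backup
ls backup
```

After the first cp, the original content of results.txt is gone; there is no undo.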
Exercises
cp unknown.fasta my_new_file.fasta
- clones unknown.fasta with the new name my_new_file.fasta
cp unknown.fasta my_new_directory
- probably not what you wanted! It just makes another file.
mkdir an_actual_directory
cp unknown.fasta an_actual_directory
- copy unknown.fasta into the an_actual_directory you just made
cp *.embl an_actual_directory
- copy all the .embl files into the new directory in one go
- To copy whole directories, with all the subfiles and subdirectories, use the -R option (meaning recursive).
cp -R an_actual_directory foo
cp -R ../blastdb .
Linking to files
- Copying big files can exhaust the hard disk quickly.
- You can instead create a link to the file with the ln -s command.
ln -s current.file linktocurrent.file
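A minimal sketch of what the link looks like afterwards, run in a scratch directory with a made-up two-line file. readlink is used here only to print the link's target.

```shell
scratch=$(mktemp -d)
cd "$scratch"
printf '>seq1\nACGT\n' > current.file

# Create a symbolic link; no data is copied
ln -s current.file linktocurrent.file

# The long listing shows the link and what it points to
ls -l linktocurrent.file

# Reading through the link gives the content of the original file
cat linktocurrent.file
target=$(readlink linktocurrent.file)
echo "link points to: $target"
```

Because the link only stores a name, deleting or moving current.file leaves the link dangling.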
Exercises
- Try creating a link to multiple.fasta.
- Run ls -l on it.
- Run ls -lH on it. What do you think is happening?
Removing files and directories
- The key difference between deleting something from the command line and using the graphical file browser is that in the first case the file vanishes immediately, but in the second it will be stored for a while in the Rubbish Bin and can be retrieved.
- So these can be very destructive commands, and they should be used carefully and not in a rush.
- To remove a file or files, use the rm command followed by the name of the file(s) you wish to delete.
rm file1
rm file2 file3 file4
rm foo/*
- Removing directories can be done with rmdir, but this is a conservative command, as it will refuse to delete the directory if it contains any files.
rmdir thisdir
- A much more powerful command is
rm -r fulldir
- This will wipe out the directory, empty or not.
- With this command, you need to be 100% confident that you are not making a mistake.
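The contrast between rmdir and rm -r can be tried without risk in a scratch directory. The names fulldir, emptydir and file1 below are placeholders for this sketch.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir fulldir emptydir
echo "data" > fulldir/file1

# rmdir only removes empty directories
rmdir emptydir
if ! rmdir fulldir 2>/dev/null; then
    echo "rmdir refused: fulldir is not empty"
fi

# rm -r removes the directory and everything in it, with no way back
rm -r fulldir
ls
```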
Exercises
- Move into the testdir directory.
- Delete mythirdfile.txt using the command line
- Delete myfourthfile.txt using the graphical file browser. Is the file now sitting in the Rubbish Bin?
- Back on the command line, move back into your Home directory.
- Then delete myfirstfile.txt from testdir without moving back to the testdir directory.
Piping output between applications
- |, the pipe, directs the output of one command into another command. It is often called an operator because it's so powerful.
- It is not always easy to find on the keyboard.
Exercise
- "Pipe" the output of ls into wc -l to see how many files you have in your current directory.
ls | wc -l
Grep
- grep stands for "global regular expression print".
- You use this command to search for text patterns in a file, for example Linux's mini-dictionary:
grep "adge" /usr/share/dict/words
- Regular expressions are different from, and more powerful than, wildcard characters.
- They are made of special symbols which designate types of characters.
- grep requires a regular expression pattern as a parameter, and prints all the lines in a file containing that pattern.
- grep is especially useful in combination with pipes, as you can filter the results of other commands.
- For example, perhaps you want to see only the information in an EMBL file relating to the origin of the sequence, that is, the DE line?
Exercise
- While in the hdi2u_files directory, type the command:
grep "DE" hsy14768.embl
What is this command doing?
- Try the command:
grep "^DE" hsy14768.embl
and
grep -x "DE.*" hsy14768.embl
- What are the ^ symbol and the -x parameter in these commands doing?
- You will need to check the man page for grep to be sure.
- Move to your home directory and type ls -lR
- Use the above command with a pipe and a grep command to search for files created or modified today.
- List the files in the hdi2u_files directory and use the grep command to look for those containing the
cat *seqs.fasta | grep "^>" | wc -l
- Each sequence in a fasta file starts with a header line that begins with a >.
- The > is quoted so that the shell does not treat it as a request for redirection of output to a file, rather than as a character to look for.
- As before, the ^ symbol means "match only at the beginning of the line".
The output of this grep search is sent to the wc command, with the -l indicating that you want to know the number of lines, i.e. the number of headers and by implication the number of sequences. So a synopsis of the command above is: read through all files with names ending in seqs.fasta and look for all the header lines in the combined output, then count up those lines that matched and return the number to the screen. We cover sequence formats later on in part 2 of the tutorial.
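The same pipeline can be checked on data small enough to count by eye. The two FASTA files below are made up for this sketch and hold three sequences in total.

```shell
scratch=$(mktemp -d)
cd "$scratch"
# Two small made-up FASTA files, three sequences in total
printf '>seq1\nACGT\n>seq2\nGGCC\n' > a_seqs.fasta
printf '>seq3\nTTAA\n' > b_seqs.fasta

# The quotes stop the shell treating > as output redirection
n_seqs=$(cat *seqs.fasta | grep "^>" | wc -l)
echo "sequences: $n_seqs"

# grep -c counts matching lines itself, reported per file
grep -c "^>" *seqs.fasta
```

The pipeline gives one combined total, whereas grep -c reports a separate count for each file.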
Environment Variables
- We have seen that the way commands run can be modified by the options passed on the command line.
- One of the most important variables is PATH. Try
echo $PATH
Some commands also read values called environment variables which affect their behaviour. Environment variables are set within the shell via the export command and are passed to any processes you run. This is useful when you want to set some parameter that is common to all invocations of a command, or applies across several commands. For example, your favourite text editor may be, say, Gedit, or Nano, or Vim, or Emacs. In the shell you can say:
export EDITOR=vim
Now any command that wants to run a text editor knows what your preferred editor is. Within the shell you can get at the current value of an environment variable by prefixing it with a $ sign, e.g.
echo $EDITOR
prints the current value of the EDITOR environment variable to the screen.
The printenv command dumps all environment variables. Note that environment variables are only set in the current shell and are not saved by default, so if you run a command in another terminal or close and restart the terminal any values you set will be lost. For information on making the settings permanent by editing your .zshrc file see the user guide under Supported Shells.
Exercise 1-16
- Give the command export VAR1=hello (with no spaces around the = sign), then:
◦ echo $VAR1
◦ echo $ VAR1
◦ echo "$VAR1"
◦ echo '$VAR1'
- Start a new terminal window by typing: gnome-terminal &
◦ Within this new terminal: echo $VAR1
- Start a second new terminal by right-clicking the icon in the Dash and selecting New Terminal
◦ Within this new shell: echo $VAR1
- Go back to the original shell window
unset VAR1
echo $VAR1
Has this affected either of the other two shells you started? Check them:
echo $VAR1
Environment variables are inherited when one process starts another, much like genetic material is inherited when a cell divides. Hopefully this explains the behaviour you see in the exercise above. When you start a terminal from an existing shell, it inherits the environment from that shell. When you start one from the system menu, it inherits just the base system environment. Furthermore, once a program is running, no external program can modify its environment variables.
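Inheritance can also be observed without opening new terminals, by starting a child shell with sh -c. In this sketch LOCAL_ONLY is a made-up variable that is deliberately not exported, for contrast with VAR1.

```shell
# A plain shell variable is not passed on to child processes
LOCAL_ONLY=hidden
# An exported variable becomes part of the environment
export VAR1=hello

# sh -c starts a child process; single quotes leave the expansion to the child
child_sees=$(sh -c 'echo "VAR1=$VAR1 LOCAL_ONLY=$LOCAL_ONLY"')
echo "$child_sees"

# Changes made inside a child do not travel back to the parent
sh -c 'VAR1=changed'
echo "parent still has: $VAR1"

# unset removes the variable from this shell
unset VAR1
echo "after unset: [${VAR1:-}]"
```

The child sees VAR1 but not LOCAL_ONLY, and nothing the child does alters the parent's copy.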
Processes
- Sometimes a command or program you run in the terminal goes on too long, or is obviously doing something you did not plan.
- If there is no obvious way (such as a menu option or button) to stop the program running, try using Ctrl-c.
- A command can include the output of another command by generating a process via the `` or $() operators.
Questions
- Try these three commands:
ls
echo `ls`
echo $(ls)
- The results are the same but the method is different. Does it matter?
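One place where the choice of form shows is nesting. The sketch below uses two empty files in a scratch directory; basename and dirname are standard tools brought in only to give the nested example something to do.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch a.txt b.txt

# Both forms run ls and paste its output into the echo command line
echo `ls`
echo $(ls)

# $() nests without extra escaping, which is awkward with backticks
parent_name=$(basename "$(dirname "$scratch/a.txt")")
echo "$parent_name"

# The captured output can also be stored in a variable
file_count=$(ls | wc -l)
echo "files: $file_count"
```

Notice that echo prints the names on one line: the line breaks in the output of ls are turned into spaces when the shell splits the substituted text into words.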
Accessing a running program or working with others interactively
- If you just run a job and then close down the terminal you ran it from, normally the job will be terminated.
- It would be nice to be able to leave a long job running and be able to log out and then log back in again to see how it is progressing.
- Though there are some simple ways to do this, working with the screen program is the real answer.
- It is a poorly named and somewhat invisible program, but it has some features that can be very useful.
- It serves to offer multiple command lines instead of just one.
- It is similar in this way to tabs in a web browser.
Exercises
Type:
screen
Then:
- ctrl+l, n: cycle through screen windows
- ctrl+l, :hardcopy RET: write a copy of the current window's contents to a file
- ctrl+l, d: detach the screen session
- Type screen -r to recover a detached session.
- Type exit to get out of your screen sessions.