|
|
Line 1: |
Line 1: |
− | = The Command-line (shell) =
| |
− |
| |
− | The real power of Linux/Unix systems is the command line.
| |
− |
| |
− | * Many programs and facilities are available through graphical options on Linux, but all programs and facilities can be accessed by the command line, also known as the shell.
| |
− | * Graphical interfaces work well for reduced data, for example when analysing data that have already been processed.
| |
− | * Web services and curses-mode screens are halfway between the command line and a graphical interface.
| |
− | * However for "heavy-lifting", the command-line is much more convenient
| |
− | * Obvious examples include when you need to work with large numbers of files or want to automate processes.
| |
− | * It's common to talk about "fear of the command line"; our aim is to reduce this.
| |
− |
| |
− | = Anatomy of a Command =
| |
− |
| |
− | <command> <options/parameters> <arguments>
| |
− |
| |
− | * <command> what do I want to do?
| |
− | * <options/parameters> how do I want to do it?
| |
− | * <arguments>, on what do I want to do it?
| |
− |
| |
− | * The first word you supply on the command line is interpreted by the system as a command, an operation.
| |
− | * Items that appear after that on the same line are separated by spaces.
| |
− | * Most commands have options available that will alter the way the command functions.
| |
− | * After the options come what are called arguments; often these are input files.
| |
− | * With some commands you don't need to issue any parameters or arguments. This is because you are using the default settings.
| |
− | * To find out the default settings, read the documentation.
| |
− |
| |
− | * If a command runs successfully, it often will not report anything back to you
| |
− | * You can of course tell by the nature of the output files it produced.
| |
− | * If a command is unsuccessful, it will report an error. Most of the time, these are informative, even if a bit cryptic.
| |
− | * However, if you forgot to specify the input file, you should be able to interpret that.
| |
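As a minimal sketch of this anatomy, the line below splits into a command (<code>ls</code>), two options (<code>-l</code> and <code>-d</code>) and one argument (<code>/tmp</code>). The choice of <code>/tmp</code> is only an assumption for illustration; any existing directory would do.

```shell
# command: ls    options: -l -d    argument: /tmp
# -l asks for the long listing; -d lists the directory itself, not its contents.
listing=$(ls -l -d /tmp)
echo "$listing"
```

Dropping the options and the argument (plain <code>ls</code>) falls back on the defaults: a short listing of the current directory.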
− |
| |
− | = Navigating the command line =
| |
− |
| |
− | * Bioinformatics tools can have very many options
| |
− | * They can also be combined with many others, leading to very long command lines.
| |
− | * You do not want to get stuck using only the arrow keys all the time.
| |
− |
| |
− | = Exercise =
| |
− |
| |
− | * Type the following:
| |
− | that was then, this is now
| |
− |
| |
| = Command-line Navigation = | | = Command-line Navigation = |
| * <code>ctrl + a</code>: go to beginning of line | | * <code>ctrl + a</code>: go to beginning of line |
Line 53: |
Line 11: |
| * <code>ctrl + k</code>: delete to end of line | | * <code>ctrl + k</code>: delete to end of line |
| | | |
− | | + | = Pagers: <code>man</code>, <code>less</code> and <code>vim</code> keys = |
− | = Listing files and directories = | |
− | | |
− | * <code>ls</code> is the most common command of all. It lists files and folders in your current location.
| |
− | * By default it requires no arguments and will list in alphabetical order
| |
− | * Although it has options, using it with wild card (especially the asterisk) can help control it
| |
− | | |
− | == Some practice ==
| |
− | * List all the files in the directory hdi2u_files that start with the letters tes
| |
− | ls tes*
| |
− | | |
− | * List all the files in your directory that start with tes, and end in 1.embl, 2.embl or 3.embl
| |
− | ls tes*[123].embl
| |
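The two wildcard patterns above can be tried safely in a scratch directory. This is only a sketch: the file names are hypothetical stand-ins created empty for the demonstration, not the real contents of hdi2u_files.

```shell
# Work in a scratch directory so none of your own files are touched.
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical file names, created empty just for this demonstration.
touch test1.embl test2.embl test3.embl test4.embl testseq.fasta other.txt

# The asterisk matches any run of characters: everything starting with "tes".
ls tes*

# [123] matches exactly one character from the set 1, 2 or 3,
# so test4.embl and testseq.fasta are left out.
matches=$(ls tes*[123].embl)
echo "$matches"
```

The shell expands the pattern before <code>ls</code> runs, so <code>ls</code> simply receives the list of matching names.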
− | | |
− | == Questions ==
| |
− | | |
− | * Prefix the asterisk wildcard with a dot, what do you get?
| |
− | * Try the -l option, what type of listing are you getting?
| |
− | * What does it tell you about the file whatsinaname.fasta?
| |
− | * Specify the directory to ls, did something unexpected happen?
| |
− | * <code>ls</code> has what should be called a companion command: <code>pwd</code>. Try it.
| |
− | * If <code>ls</code> is the most common command, what might the most common typo be?
| |
− | | |
− | == Answers ==
| |
− | | |
− | * You get the hidden files
| |
− | * the <code>-l</code> is the long listing
| |
− | * this file is empty, it has no content.
| |
− | * Given a directory as its argument, ls lists the contents of that directory rather than the directory name itself.
| |
− | * <code>pwd</code> shows the present working directory; it tells us where all these files are.
| |
− | * <code>sl</code> is the most common typo, so it became a trick program, try it and see
| |
− | | |
− | == Exercise: Focus only on directories ==
| |
− | | |
− | * Try postfixing the asterisk with a <code>/</code> and also give the <code>-d</code> option
| |
− | ls -d */
| |
− | * Once we find a file, and know its size and date, the poorly named <code>file</code> command will give us more details. Try
| |
− | file *
| |
− | * It's a lot of information, can it be explained?
| |
− | * Another useful companion command to <code>ls</code> is <code>wc -l</code>, word count with the line option. Try:
| |
− | ls -l tes*[123].embl
| |
− | wc -l tes*[123].embl
| |
− | * What extra information are you getting?
| |
− | | |
− | == Learnings ==
| |
− | | |
− | * We've learnt about the most popular command <code>ls</code>
| |
− | * Two other companion commands were useful <code>pwd</code> and <code>file</code>
| |
− | * A critical aspect is that you do not get a single answer from <code>ls</code>, you get a list of multiple items. This will set the tone for much command-line work.
| |
− | | |
− | = Learning about Linux commands =
| |
| | | |
| * This is a continuous and very important activity | | * This is a continuous and very important activity |
Line 111: |
Line 19: |
| * To browse through a man page, use the up, down, PgUp and PgDn keys. | | * To browse through a man page, use the up, down, PgUp and PgDn keys. |
| * To close (quit) the man page simply hit the q key on your keyboard. | | * To close (quit) the man page simply hit the q key on your keyboard. |
− | * If you do not know the specific name of a command to use for a particular job, you can search using <code>man -k <roughidea></code>
| |
| followed by the type of thing you are trying to do. An example of this is in exercise 1-3, part c). | | followed by the type of thing you are trying to do. An example of this is in exercise 1-3, part c). |
| | | |
Line 132: |
Line 39: |
| * Look up some programs with man pages with the keywords "list directory" | | * Look up some programs with man pages with the keywords "list directory" |
| man -k "list directory" | | man -k "list directory" |
− |
| |
− | = Basic Linux tips for filenames =
| |
− |
| |
− | * Linux does not deal well with spaces in filenames
| |
− | * Expect problems when transferring files from Windows.
| |
− | * Everything is case sensitive
| |
− | * In genomics it's common to use underscores and add useful (meta) information.
| |
− | * However this can make the filenames quite long.
| |
− |
| |
− | * To reference filenames with spaces in them, you need to enclose the entire filename in quotation marks so that Linux understands that the space is part of one single name.
| |
− | * Alternatively, you can "escape" the space using a backslash. For example, if I have a file called my document Linux will see this as two words, "my" and "document".
| |
− |
| |
− | But you could write either of the following to make it understand you mean a single file:
| |
− | "my document"
| |
− | my\ document
| |
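A short sketch of the three cases, run in a scratch directory. The file name <code>my document</code> is the example from the text; everything else is an assumption made for the demonstration.

```shell
# Work in a scratch directory so none of your own files are touched.
scratch=$(mktemp -d)
cd "$scratch"

# Create one file whose name contains a space.
touch "my document"

# Quoted: the shell passes a single argument to ls.
ls -l "my document"

# Escaped: the backslash makes the space part of the name.
ls -l my\ document

# Unquoted: ls receives two arguments, "my" and "document", and finds neither.
ls -l my document || echo "ls could not find 'my' or 'document'"
```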
| | | |
| = Linux shortcut symbols = | | = Linux shortcut symbols = |
Line 158: |
Line 50: |
| * <code>></code> directs output of one command into a file | | * <code>></code> directs output of one command into a file |
| * <code>|</code> often called the pipe operator: directs output of one command into another command. | | * <code>|</code> often called the pipe operator: directs output of one command into another command. |
− |
| |
− | == Exercises ==
| |
− | To change to a directory one level below the one you are in, just use the cd command followed by the subdirectory name:
| |
− | cd subdir_name
| |
− | * If you need to change directory without worrying where you are now, you could explicitly state the full or absolute path:
| |
− | cd /usr/local/bin
| |
− | * If you wish to return to your home directory at any time, just type cd by itself.
| |
− | cd
| |
− | * Type
| |
− | cd utr
| |
− | * Type it again. Why doesn't this work?
| |
− |
| |
− | * Change directory into the /usr/bin directory by typing
| |
− | cd /usr/bin
| |
− |
| |
− | * List the files in this directory. This is the main directory of runnable programs on the system.
| |
− | * How can you get back to your home directory from here?
| |
− |
| |
− | = Tab completion =
| |
− |
| |
− | * Fear of the command line often means, fear of typing too much. Everybody fears this, so there are tools.
| |
− | * Tab completion is probably the most important tool, and relies on your pressing the <code>TAB</code> key
| |
− | * It tries to complete the filename or program name you have started typing, saving you typing time and reducing spelling errors.
| |
− | cd kir
| |
− | * followed by <code>TAB</code>. If there is only one directory with a name starting with the letters "kir", the rest of the name will be completed for you.
| |
− | * if there are several options, you need to supply more "hint-letters" and press <code>TAB</code> multiple times.
| |
− |
| |
− | == Exercises ==
| |
− |
| |
− | * Type <code>ls testseq</code> and use tab completion.
| |
− | * This will show you a list of files that start with testseq.
| |
− | * You now have the option of completing the filename yourself, or "tabbing" through the filenames available.
| |
− | * It limits itself to files in your current directory.
| |
− | * What happens if you type <code>TAB</code> immediately after <code>ls</code>?
| |
− | * How can you find out all commands available on the system?
| |
− |
| |
− | == Answer ==
| |
− | * The first word of a command line is usually a command, so TAB looks for commands, not files.
| |
− | * By giving no hints at all to TAB, it will look for all possible commands.
| |
− |
| |
− | = Command history =
| |
− |
| |
− | * This is another very handy tool for saving typing
| |
− | * Previous commands you have used are stored in your "history".
| |
− | * You use the up and down arrow keys to travel through all the commands you have used previously.
| |
− | * The command itself, <code>history</code>, will return a list of the last 15 commands run.
| |
− | * Going back sequentially can be a bit tedious; <code>ctrl+r</code> will accept hints from you and try to find the past command that most resembles your hints.
| |
− |
| |
− | == Exercise ==
| |
− | * Try
| |
− | history -3
| |
− | ctrl+r kir
| |
− |
| |
− | * Are you seeing what you are expecting?
| |
| | | |
| == Keybindings for using the history file == | | == Keybindings for using the history file == |
− |
| |
− | These commands are run blind. They refer to the command you last ran,
| |
− | which most of the time is visible in the line above.
| |
| | | |
| * <code>:<RET></code>: save command in history, do not execute. | | * <code>:<RET></code>: save command in history, do not execute. |
Line 224: |
Line 59: |
| * <code>!$<RET></code>: the final argument of the last command | | * <code>!$<RET></code>: the final argument of the last command |
| * <code>^then^now<RET></code>: replace the first occurrence of then in last command with now | | * <code>^then^now<RET></code>: replace the first occurrence of then in last command with now |
− | * <code>!!:gs/then/now<RET></code>: replace ALL occurrences of "then" in last command with "now" | + | * <code>!!:gs/then/now/<RET></code>: replace ALL occurrences of "then" in last command with "now" |
− | | + | * <code>!!:gs/then/now/:p<RET></code>: as above except do not execute. |
− | = Keybindings for using the history file =
| |
− | | |
− | These commands are run blind. They refer to the command you last ran,
| |
− | which most of the time is visible in the line above.
| |
− | | |
− | * <code>:<RET></code>: save command in history, do not execute.
| |
− | * <code>!$<RET></code>: the final argument of the last command
| |
− | * <code>!!<RET></code>: the entire last command
| |
− | * <code>!:1-$<RET></code>: everything except the first word of the last command
| |
− | * <code>!$<RET></code>: the final argument of the last command
| |
− | * <code>^then^now<RET></code>: replace the first occurrence of then in last command with now
| |
− | * <code>!!:gs/then/now<RET></code>: replace ALL occurrences of "then" in last command with "now" | |
− | | |
− | = Reading text files =
| |
− | | |
− | * These are useful when you want to look at the contents of a file, but not edit it.
| |
− | * Among the most common of these commands are <code>cat</code>, <code>more</code>, and <code>less</code>.
| |
− | | |
− | * <code>cat</code> simply prints out one or several files.
| |
− | * Useful for small files, but also for concatenating (which is where it got its name)
| |
− | * <code>more</code> and <code>less</code> are pagers
| |
− | * You can feed output to them via the pipe operator
| |
− | * They have keybindings similar to the editor vi: <code>/</code>, <code>?</code>, <code>gg</code>, <code>G</code>.
| |
− | | |
− | == Exercises ==
| |
− | * Move into the hdi2u_files directory.
| |
− | * Read the file hsy14768.embl using the commands cat, more and less.
| |
− | * Don’t forget that tab completion can save you typing effort.
| |
− | cat hsy14768.embl
| |
− | more hsy14768.embl
| |
− | less hsy14768.embl
| |
− | | |
− | * Use the spacebar to scroll down
| |
− | * Press q to quit.
| |
− | * Use the spacebar to scroll down, b to go up a page, and the up and down arrow keys to move up and down the file line by line.
| |
− | * Press the / key and search for the letters sequen in the file.
| |
− | * Press the ? key and search for the letters gene in the file.
| |
− | * Press the n key to search for other instances of gene in the file.
| |
− | | |
− | == Remember the man pages ==
| |
− | There are many command line options available for each of the above commands, as well as functionality we do not cover here. To read more about them, consult the manual pages:
| |
− | man cat
| |
− | man less
| |
| | | |
| == An important note on line endings – CR and LF == | | == An important note on line endings – CR and LF == |
Line 289: |
Line 81: |
| == Using vim == | | == Using vim == |
| | | |
− | * type <code>vim</code> to get in, and <code>:q</code> to get out without saving. | + | * type <code>vim</code> to get in, and <code>:q!</code> to get out without saving. |
| * <code>ZZ</code> to save to the current filename and quit. ":sav fname" otherwise | | * <code>ZZ</code> to save to the current filename and quit. ":sav fname" otherwise |
| * It opens in "normal" mode which is similar to less, in that direct editing is not expected. | | * It opens in "normal" mode which is similar to less, in that direct editing is not expected. |
Line 313: |
Line 105: |
| | | |
| * ":colorscheme desert" changes to the desert colour scheme, "morning" "delek" many others | | * ":colorscheme desert" changes to the desert colour scheme, "morning" "delek" many others |
− | * <code>:%s/\(sn\)oo\(ze\)/\1ee\2/gc</code> change all instances of snooze to sneeze
| |
| * <code>:%s/snooze/sneeze/gc</code> also works | | * <code>:%s/snooze/sneeze/gc</code> also works |
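| * <code>:g/sneeze/d</code> delete all lines containing sneeze (<code>:v/sneeze/d</code> deletes all lines without sneeze) | | * <code>:g/sneeze/d</code> delete all lines containing sneeze (<code>:v/sneeze/d</code> deletes all lines without sneeze) |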
Line 320: |
Line 111: |
| * "d214G" delete to line 214 | | * "d214G" delete to line 214 |
| * "y214G" yank (copy) to line 214 | | * "y214G" yank (copy) to line 214 |
| + | * <code>:set list</code>, all non-printing characters are also shown. |
| + | * <code>:set hlsearch</code>, will highlight all search occurrences. |
| + | * <code>:set nu</code>, show line numbers. |
| | | |
− | == Exercises ==
| |
− | * Use vim to open multiseqs_1.blastx
| |
− | vim multiseqs_1.blastx
| |
− | * Type <code>:set list</code>, what extra are you seeing?
| |
− | * Type <code>:set nu</code> what extra are you seeing?
| |
− | * Create a file-listing by:
| |
− | ls * > my.list
| |
− | * Use <code>:v/multiple/d</code> to delete everything that doesn't say "multiple"
| |
− |
| |
− | = Copying files and directories =
| |
− |
| |
− | * The basic command used to copy files using the command line is <code>cp</code>.
| |
− | * At a minimum, you must specify two arguments: the name of the file to be copied, and where you wish to copy the file to.
| |
− | * The main things to know about using the cp command are:
| |
− | * if you provide the name of an existing directory as the second argument, the file named in the first argument will be copied into that directory.
| |
− | * otherwise, it will be assumed that the second argument is another name for the first file, a clone so to speak.
| |
− | * if you provide more than two arguments to cp, the final argument needs to be the name of a directory
| |
− | * This command is not harmless: if you choose a new name that happens to belong to an existing file, that file will be overwritten.
| |
− |
| |
− | == Exercises ==
| |
− |
| |
− | cp unknown.fasta my_new_file.fasta - clones unknown.fasta with the new name my_new_file.fasta
| |
− | cp unknown.fasta my_new_directory - probably not what you wanted! It just makes another file.
| |
− | mkdir an_actual_directory
| |
− | cp unknown.fasta an_actual_directory - copy unknown.fasta into an_actual_directory you just made
| |
− | cp *.embl an_actual_directory - copy all the .embl files into the new directory in one go
| |
− |
| |
− | * To copy whole directories, with all their files and subdirectories, use the -R option (meaning recursive).
| |
− | cp -R an_actual_directory foo
| |
− | cp -R ../blastdb .
| |
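The rules for <code>cp</code> can be checked end to end in a scratch directory. This is a sketch under assumptions: <code>unknown.fasta</code> here is a one-line stand-in file, not the real tutorial file.

```shell
# Work in a scratch directory so none of your own files are touched.
scratch=$(mktemp -d)
cd "$scratch"

# A stand-in for the tutorial's unknown.fasta.
echo ">seq1" > unknown.fasta

# The second argument does not exist yet: cp makes a copy under the new name.
cp unknown.fasta my_new_file.fasta

# The second argument is an existing directory: the copy is placed inside it.
mkdir an_actual_directory
cp unknown.fasta an_actual_directory

# -R copies a directory with everything in it; foo does not exist,
# so it becomes a full copy of an_actual_directory.
cp -R an_actual_directory foo

# Show the resulting layout.
ls -R
```

Adding the <code>-i</code> option makes <code>cp</code> ask before overwriting an existing file, which guards against the accident described above.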
− |
| |
− | = Linking to files =
| |
− |
| |
− | * Copying big files can exhaust disk space quickly.
| |
− | * You can instead create a link to it with the <code>ln -s</code> command.
| |
− | ln -s current.file linktocurrent.file
| |
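A minimal sketch of what a symbolic link looks like in practice. The file names follow the example above; the file content is a made-up placeholder.

```shell
# Work in a scratch directory so none of your own files are touched.
scratch=$(mktemp -d)
cd "$scratch"

# A small stand-in for a big data file.
echo "ACGT" > current.file

# Create the symbolic link: target first, link name second.
ln -s current.file linktocurrent.file

# The long listing shows the link and the file it points to.
ls -l linktocurrent.file

# Reading through the link returns the content of the original file.
content=$(cat linktocurrent.file)
echo "$content"
```

The link occupies almost no disk space, but it stops working if the original file is moved or deleted.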
− |
| |
− | == Exercises ==
| |
− | * Try creating a link to <code>multiple.fasta</code>
| |
− | * Run <code>ls -l</code> on it
| |
− | * Run <code>ls -lH</code> on it. What do you think is happening?
| |
− |
| |
− | = Removing files and directories =
| |
− |
| |
− | * The key difference between deleting something from the command line and using the graphical file browser is that in the first case the file vanishes immediately, but in the second it will be stored for a while in the Rubbish Bin and can be retrieved.
| |
− | * So these can be very destructive commands, and they should be used carefully and not in a rush.
| |
− | * To remove a file or files, use the rm command followed by the name of the file(s) you wish to delete.
| |
− | rm file1
| |
− | rm file2 file3 file4
| |
− | rm foo/*
| |
− |
| |
− | * Removing directories can be done with rmdir, but this is a conservative command as it will refuse to delete the directory if it contains any files.
| |
− | rmdir thisdir
| |
− | * A much more powerful command is
| |
− | rm -r fulldir
| |
− | * This will wipe out the directory empty or not.
| |
− |
| |
− | * With this command, you need to be 100% confident that you will never make a mistake
| |
− |
| |
− | == Exercises ==
| |
− |
| |
− | * Move into the testdir directory.
| |
− | * Delete mythirdfile.txt using the command line
| |
− | * Delete myfourthfile.txt using the graphical file browser. Is the file now sitting in the Rubbish Bin?
| |
− | * Back on the command line, move back into your Home directory.
| |
− | * Then delete myfirstfile.txt from testdir without moving back to the testdir directory.
| |
− |
| |
− | = Piping output between applications =
| |
− | * <code>|</code>, often called an operator because it's so powerful
| |
− | * not always easy to find on the keyboard
| |
− |
| |
− | == Exercise ==
| |
− | * "Pipe" the output of "ls" into wc -l to see how many files you have in your output.
| |
− | ls | wc -l
| |
− |
| |
− | = Grep =
| |
− | * grep stands for "global regular expression print"
| |
− | * You use this command to search for text patterns in a file, for example, Linux's mini-dictionary.
| |
− |
| |
− | grep "adge" /usr/share/dict/words
| |
− |
| |
− | * regular expressions are different and more powerful than wildcard characters
| |
− | * made of special symbols which designate type of characters.
| |
− | * grep requires a regular expression pattern as a parameter, and prints all the lines in a file containing that pattern.
| |
− | * grep is especially useful in combination with pipes as you can filter the results of other commands.
| |
− | * For example, perhaps you want to see only the information in an EMBL file relating to the origin of the
| |
− | sequence, that is, the DE line?
| |
− |
| |
− | == Exercise ==
| |
− | * While in the hdi2u_files directory, type the command:
| |
− | grep "DE" hsy14768.embl
| |
− | What is this command doing?
| |
− |
| |
− | * Try the command:
| |
− | grep "^DE" hsy14768.embl and grep -x "DE.*" hsy14768.embl
| |
− | * What are the <code>^</code> symbol and the <code>-x</code> parameter in these commands doing?
| |
− | * You will need to check the manpage for grep to be sure.
| |
− |
| |
− | * Move to your home directory and type ls -lR
| |
− | * Use the above command with a pipe and a grep command to search for files created or modified today.
| |
− | * List the files in the hdi2u_files directory and use the grep command to look for those containing the
| |
− |
| |
− | cat *seqs.fasta | grep "^>" | wc -l
| |
− |
| |
− | * Each sequence in a fasta file starts with a header line that begins with a <code>></code>.
| |
− | * The quotation marks around <code>^></code> make the shell treat <code>></code> as a character to look for, rather than as a request for redirection of output to a file.
| |
− | * As before, the <code>^ </code>symbol means "match only at the beginning of the line".
| |
− | The output of this grep search is sent to the wc command, with the -l indicating that you want to know the
| |
− | number of lines – ie. the number of headers and by implication the number of sequences.
| |
− | So a synopsis of the command above is: Read through all files with names ending seqs.fasta and look for all
| |
− | the header lines in the combined output, then count up those lines that matched and return the number to
| |
− | screen.
| |
− | We cover sequence formats later on in part 2 of the tutorial.
| |
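The counting pipeline can be tried on two tiny hand-made files. This is a sketch: the file names and sequences below are invented for the demonstration and only need to end in <code>seqs.fasta</code> to match the pattern.

```shell
# Work in a scratch directory so none of your own files are touched.
scratch=$(mktemp -d)
cd "$scratch"

# Two hypothetical fasta files holding three sequences in total.
printf '>seq1\nACGT\n>seq2\nGGCC\n' > a_seqs.fasta
printf '>seq3\nTTAA\n' > b_seqs.fasta

# Concatenate the files, keep only the header lines, count them.
count=$(cat *seqs.fasta | grep "^>" | wc -l)
echo "$count"

# grep -c counts matching lines itself, reported per file.
grep -c "^>" *seqs.fasta
```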
− |
| |
− | = Environment Variables =
| |
− |
| |
− | * We have seen that the way commands run can be modified by the options passed on the command line.
| |
− | * One of the most important variables is PATH. Try
| |
− | echo $PATH
| |
− |
| |
− | * This critical environment variable shows all the locations where commands that you can use are found.
| |
− | * If you want more commands, you need to add their locations to this environment variable
| |
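A short sketch of inspecting and extending <code>PATH</code>. The directory <code>/opt/mytools/bin</code> is a hypothetical location used only for illustration, and the change lasts only for the current shell session.

```shell
# PATH is a single string of directories separated by colons.
echo "$PATH"

# Print one directory per line, which is easier to read.
echo "$PATH" | tr ':' '\n'

# Append a hypothetical directory of extra programs to the search list.
PATH="$PATH:/opt/mytools/bin"

# command -v reports which file a command name resolves to.
command -v ls
```

Directories are searched from left to right, so appending at the end means existing commands keep priority over same-named ones in the new directory.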
− |
| |
− | = Processes =
| |
− | * Sometimes a command or program you run in the terminal goes on too long, or is obviously doing something
| |
− | you did not plan.
| |
− | * If there is no obvious way (such as a menu option or button) to stop the program running, try using <code>Ctrl-c</code>
| |
− | * A command can include the output of another command by generating a process via the <code>``</code> or <code>$()</code> operators.
| |
− |
| |
− | == Questions ==
| |
− | * Try these three commands:
| |
− | ls
| |
− | echo `ls`
| |
− | echo $(ls)
| |
− | * The results are the same but the method is different. Does it matter?
| |
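One practical difference between the two forms shows up when the captured output is stored or nested. The sketch below uses only standard commands; the variable names are arbitrary.

```shell
# Capture the output of a pipeline in a variable with $().
n=$(ls / | wc -l)
echo "The root directory holds" $n "entries"

# Backticks do the same job for a simple command.
year=`date +%Y`
echo "The year is $year"

# $() nests without any extra escaping: the inner command runs first.
here=$(basename "$(pwd)")
echo "The current directory is called $here"
```

Nesting with backticks requires escaping the inner pair with backslashes, which is why <code>$()</code> is usually the easier form to read.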
− |
| |
− | = Accessing a running program or working with others interactively =
| |
− |
| |
− | * If you just run a job and then close down the terminal you ran it from, normally the job will be terminated.
| |
− | * It would be nice to be able to leave a long job running and be able to log out and then log back in again to see how it is progressing.
| |
− | * Though there are some simple ways to do this, working with the screen program is the real answer.
| |
− | * It is a poorly named and somewhat invisible program, but it can be very useful.
| |
− | * It serves to offer multiple command lines instead of just one.
| |
− | * It is similar in this way to tabs in a web browser.
| |
| | | |
| == Exercises == | | == Exercises == |