Bioinformatics: Terminal Shortcuts

While you are typing commands into the Terminal, you may find it faster to use shortcuts to navigate. First, you can change your system preferences so the cursor moves faster when you arrow over: on Mac OS X, open System Preferences > Keyboard, and on the Keyboard tab turn Key Repeat up to Fast. This will save a lot of time. Below are some general keyboard shortcuts for the bash shell (the default shell in the Mac OS X Terminal) that can help enormously:

Ctrl+a    move to the start of the command line
Ctrl+e    move to the end of the command line
Ctrl+w    delete the word before the cursor
Ctrl+u    delete from the cursor to the start of the command line
Ctrl+k    delete from the cursor to the end of the command line
Ctrl+xx   jump between the start of the command line and the current cursor position (and back again)

Alt+f     move the cursor to the end of the current or next word
Alt+b     move the cursor to the beginning of the current or previous word
Tab       autocomplete commands and file names (ENORMOUSLY USEFUL!!!)
Ctrl+l    clear the screen (whatever you have typed on the current line is kept)

On a Mac, Alt is the Option key, and Alt+f and Alt+b only work once "Use Option as Meta key" is turned on in the Terminal preferences.

To find old Terminal commands, you can scroll through your history:

Up arrow or Ctrl+p    scroll back through the history; edit a previous command or press Enter to run it again

To search through old commands, press Ctrl+r and type a few letters of the command. Press Ctrl+r again to jump to older matches, and Enter to run the one shown.

Ctrl+z    suspend (pause) the current process
fg        bring the suspended process back to the foreground
bg        let the suspended process keep running in the background instead

Bioinformatic data sets can be gigabytes in size or larger. There are Perl and Python scripts available that can help with fast data manipulation, such as pulling certain sequences out of a large file, but writing a script can take far longer than it's worth. Instead, you can use grep and sed right in the Terminal to pull information out of a large data file. I have attached general manuals on sed and grep to this post. Below are some examples a colleague gave me:

To grab a sequence from a file:

If you are familiar with the command line, you could try grep:

grep "yoursequence" yourfastafile > /Users/name/Desktop/outputfile.txt

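Using the '-B 1' option also prints the one line before each match, which is the header for this sequence (as long as each sequence sits on a single line directly below its header):

grep -B 1 "yoursequence" yourfastafile > myfinds.txt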
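
Here is a small sketch you can paste into a bash Terminal to see the difference between plain grep and grep -B 1. The file toy.fa and its sequences are made up for illustration, and it assumes each sequence sits on one line right below its header.

```shell
# Work in a throwaway directory that is deleted when the shell exits
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical toy FASTA file, one sequence line per record
cat > "$tmp/toy.fa" <<'EOF'
>seq1 hypothetical
ACGTACGTTTGACCA
>seq2 hypothetical
GGGCTATACTGCGATG
EOF

# Plain grep: prints only the matching sequence line
grep "TATACTGCG" "$tmp/toy.fa"

# -B 1: also prints the line just before the match, i.e. the header
grep -B 1 "TATACTGCG" "$tmp/toy.fa" > "$tmp/myfinds.txt"
cat "$tmp/myfinds.txt"
```

The second command writes both the ">seq2 hypothetical" header and its sequence line to myfinds.txt.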

If you have a text file with one sequence per line, you can search for all of them at once with fgrep (the same as grep -F):

fgrep -B 1 -f filewithlistofsequences yourfastafile
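
A quick sketch of the list search, again on a made-up toy.fa with a made-up list of query sequences in filewithlistofsequences. It uses grep -F, which does the same thing as fgrep; recent GNU grep versions warn that fgrep is obsolescent.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical toy FASTA file
cat > "$tmp/toy.fa" <<'EOF'
>seq1
ACGTACGTTTGACCA
>seq2
GGGCTATACTGCGATG
>seq3
TTTTCCCCAAAAGGGG
EOF

# Hypothetical query list, one sequence per line
printf '%s\n' TTGACC CCCCAAAA > "$tmp/filewithlistofsequences"

# -F: treat every pattern as a fixed string (what fgrep does)
# -f: read the patterns from a file, one per line
# -B 1: also print the header line above each match
grep -F -B 1 -f "$tmp/filewithlistofsequences" "$tmp/toy.fa"
```

Because seq1 and seq3 are not next to each other in the file, grep prints a "--" line between the two groups of output.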

The 'grep' command can also take a -A, -B or -C option followed by the number of lines you want to grab. -A prints that many lines of trailing context after each matching line, -B prints that many lines of leading context before each match, and -C prints that many lines on both sides of the match.
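
To see exactly what each option grabs, this sketch uses seq to print the numbers 1 to 10 as ten stand-in lines and searches for the line "5". No FASTA file is involved; the numbers just make the context easy to count.

```shell
# seq prints 1..10, one number per line; '^5$' matches only the line "5"
echo "grep -A 2:"; seq 1 10 | grep -A 2 '^5$'   # 5 6 7 (match plus 2 lines after)
echo "grep -B 2:"; seq 1 10 | grep -B 2 '^5$'   # 3 4 5 (2 lines before plus match)
echo "grep -C 2:"; seq 1 10 | grep -C 2 '^5$'   # 3 4 5 6 7 (2 lines on each side)
```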

For instance, the query grep -A 10 "HOX" hox.txt > hoxfile.fa finds each line containing HOX and prints it along with the 10 lines after it. So if my HOX gene sequence is longer than 10 lines, the output will be truncated (and if it is shorter, you will also get lines from the next record).
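
If you want the whole HOX record no matter how many lines it spans, grep alone is awkward. One option is a short awk one-liner (swapped in here in place of grep) that prints any header containing the query plus every line after it, up to the next header. The file hox.txt and its contents below are hypothetical, and note that this version only searches the header lines.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical multi-line FASTA file
cat > "$tmp/hox.txt" <<'EOF'
>HOXA1 hypothetical
ACGTACGTAC
GTACGTACGT
ACGT
>OTHER hypothetical
TTTTTTTTTT
EOF

# On each header line, set keep to 1 if the header contains the query
# (index() is a plain substring search), else 0. The bare pattern 'keep'
# then prints every line while keep is 1, so the whole record comes out
# however many sequence lines it has.
awk -v query="HOX" '/^>/ { keep = (index($0, query) > 0) } keep' "$tmp/hox.txt" > "$tmp/hoxfile.fa"
cat "$tmp/hoxfile.fa"
```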

If you are searching for a sequence in a very long scaffold, this can be useful:

grep -A 1000 "TATACTGCGATGCTCGCTCGAGAGTATGAGAG" seq.fa > mysequence.txt

Without a redirect, these commands print the matching lines to the screen if the sequence exists (or you can send the output to a file using '> output.txt'); otherwise there is no output. Keep in mind that grep only finds exact matches (and is case-sensitive unless you add -i), so it will miss a sequence that is broken across a line break in the file.
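
One way around the line-break problem, sketched here with awk rather than grep alone, is to first join each record's sequence lines into a single line and then grep the joined file. The file seq.fa, the scaffold names, and the sequences are made up; the query is the one from the scaffold example, deliberately split across three lines. For a multi-gigabyte file, the linearized copy takes roughly as much disk space again.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical FASTA file in which the query is split over three lines
cat > "$tmp/seq.fa" <<'EOF'
>scaffold_1 hypothetical
GGGGTATACTGCGA
TGCTCGCTCGAGAG
TATGAGAGCCCC
>scaffold_2 hypothetical
AAAAAAAAAA
EOF

query="TATACTGCGATGCTCGCTCGAGAGTATGAGAG"

# Plain grep misses it because no single line contains the whole query
if grep -q "$query" "$tmp/seq.fa"; then echo "found"; else echo "not found"; fi

# Linearize: print each header as-is and collect the sequence lines that
# follow it into one string, printed when the next header (or EOF) arrives
awk '/^>/ { if (seq != "") print seq; print; seq = ""; next }
     { seq = seq $0 }
     END { if (seq != "") print seq }' "$tmp/seq.fa" > "$tmp/seq.linear.fa"

# Now the query sits on one line, and -B 1 brings back its header
grep -B 1 "$query" "$tmp/seq.linear.fa"
```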

Thursday, December 27, 2012