Thursday, March 27, 2014

Finding Genes

How do you go about finding a gene? There are many resources online that are free to the public. 

The first step is to get the actual gene sequence. Ensembl is one of the best sites to look. Choose the species and then type in the gene of interest. In this case, I am going to compare the DSCAM gene in rat and in chicken.

What you'll see next is a summary of the gene with a description and its location, and near the bottom is a cartoon visualization of the gene structure. The red boxes indicate exons, whereas the kinked "V"-shaped lines between them are introns. You can get a general idea of how similar a gene is between species by visually comparing the structures, in this case between rat and chicken. Do they have the same number of exons and introns? Do they differ in size?

To compare how similar gene sequences are between these two species, the sequences (either DNA or protein) should be downloaded. This will allow you to align them side by side to see what percent similarity they have.

DSCAM gene sequence of Rat

DSCAM sequence of chicken

In order to get the actual sequence, or the text, click on the export data button. 

Next, you will be given the choice of getting the DNA sequence (coding) or the protein sequence (peptide). Recall that introns are spliced out during mRNA processing, so the final mRNA product contains only the exons. The part of that mRNA that gets translated into protein is known as the coding sequence.
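To make the splicing idea concrete, here is a minimal Python sketch. The toy gene, the exon coordinates and the function name `splice_exons` are all invented for illustration (this is not DSCAM and not how Ensembl computes anything), and it ignores untranslated regions, so treat it as a simplified picture only.

```python
def splice_exons(gene_dna, exons):
    """Join the exon segments of a gene, dropping the introns between them.

    `exons` is a list of (start, end) positions, 0-based with `end` excluded,
    sorted from the start of the gene to the end.
    """
    return "".join(gene_dna[start:end] for start, end in exons)


# Toy gene (invented): exons in upper case, introns in lower case.
toy_gene = "ATGGCC" + "gtaagtttcag" + "CTGTCC" + "gtgagtccag" + "GTGTAA"
toy_exons = [(0, 6), (17, 23), (33, 39)]

coding_sequence = splice_exons(toy_gene, toy_exons)
print(coding_sequence)  # ATGGCCCTGTCCGTGTAA
```

Only the upper-case exon pieces survive, and the result has a length that is a multiple of 3, which is what you want if it is going to be read three bases at a time.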

Below is the protein sequence. 

Proteins (which are made from units of amino acids assembled together) are also built from this coding sequence, which is read in groups of 3 nucleotides called codons (triplets). Since the code is redundant, meaning that one amino acid can be specified by several different codons, it is often easier to compare a gene by its protein sequence if there is a lot of evolutionary distance between two organisms (e.g. worms and turtles). If you try to compare the same gene in two distantly related species using nucleotide sequences, there may be too many changes between them to compare easily. Changes to single nucleotides don't matter as much when you compare proteins, because a single base change may end up producing the exact same amino acid, in which case it is more or less irrelevant to the functional consequence, or how the protein will act in the body of the animal.
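Here is a small Python sketch of that redundancy, assuming the standard genetic code. The three short DNA sequences are invented examples, and `translate` is just a hypothetical helper name: one single-base change leaves the protein untouched, while another swaps an amino acid.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


original = "ATGCTGTCCGTG"      # ATG CTG TCC GTG
silent_change = "ATGCTATCCGTG"  # CTG -> CTA, still leucine
altered = "ATGCCGTCCGTG"        # CTG -> CCG, leucine becomes proline

print(translate(original))       # MLSV
print(translate(silent_change))  # MLSV
print(translate(altered))        # MPSV

# Leucine alone is encoded by six different codons.
leucine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "L")
print(len(leucine_codons))       # 6
```

At the DNA level all three sequences differ, but at the protein level the first two are identical, which is why protein comparisons are less noisy across distant species.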

SOC-DBR. (2014)

So, after you copy and paste the two sequences you want to compare (in this case DSCAM in chicken and in rat), you can align them using a free public tool called ClustalW. There is a large variety of other alignment tools, some of which offer visualizations and predictions about what the components of the gene are. I personally used Geneious a lot, but it's not free (however, you can download and use it during the free trial period).
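If you are curious what a pairwise alignment does under the hood, here is a bare-bones Python sketch of global alignment (the Needleman-Wunsch approach). To be clear, this is not ClustalW's own algorithm: real tools use more elaborate scoring, such as amino acid substitution matrices and more refined gap penalties. The scores (`match=1`, `mismatch=-1`, `gap=-2`) and the two toy peptides are assumptions picked only for illustration.

```python
def global_align(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Align two sequences end to end and return them padded with '-' gaps."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1

    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )

    # Walk back from the bottom-right corner to recover the alignment.
    aligned_a, aligned_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                aligned_a.append(seq_a[i - 1])
                aligned_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


# Toy peptides (invented): the second one is missing the serine.
top, bottom = global_align("MALSVKE", "MALVKE")
print(top)     # MALSVKE
print(bottom)  # MAL-VKE
```

The dash marks a gap, meaning one sequence has a residue where the other has nothing. For full-length proteins you would still want a proper tool, since this sketch builds the whole scoring table in memory and knows nothing about which amino acid swaps are biologically mild.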

ClustalW will show you a side-by-side comparison of the protein sequences. L stands for leucine, S for serine, V for valine and so on. (You can google amino acid abbreviations to find a chart with all of them.)

You can visually see where there are differences in the amino acids and where there are similarities. The sequence will wrap around to the next line, with the numbers to the right indicating what amino acid you are at when the row ends. In this case, there are 450 amino acids in total (or 1350 nucleotide bases, at 3 bases per amino acid). The first row is the rat sequence and the second row is the chicken sequence.

An important thing to look for is changes between amino acids that differ in charge or polarity. An amino acid side chain can be polar or non-polar, and negatively charged, positively charged or neutral. If a mutation causes a polar amino acid to change to a non-polar amino acid, this may affect the final protein structure by changing its overall biochemical properties. A DNA change that alters the amino acid is called a nonsynonymous change. In evolution, these types of changes have sometimes resulted in gain-of-function mutations, where all of a sudden a protein acts in a new way in the organism and may help drive evolution (if it offers some evolutionary advantage). One fascinating example is the FOXP2 gene, in which amino acid changes on the human lineage are thought to have contributed to the development of human language.

As an exercise, you can try to look for some of these nonsynonymous changes above. For instance, a lysine (+) to a glutamate (-) change might alter how the final protein acts in the body.
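If you would rather not hunt by eye, the search can be sketched in a few lines of Python. This assumes you already have two aligned rows of equal length with `-` for gaps. The two toy rows below are invented (they are not the real DSCAM alignment), and the charge table is a simplification that treats only K and R as positive and D and E as negative, leaving out histidine, which is only partly charged at neutral pH.

```python
# Simplified side-chain charges at neutral pH; anything not listed counts as "0".
CHARGE = {"K": "+", "R": "+", "D": "-", "E": "-"}


def charge_changes(aligned_a, aligned_b):
    """List alignment columns where the two residues differ in charge.

    Returns (column, residue_a + charge, residue_b + charge) tuples,
    with columns counted from 1. Gap columns are skipped.
    """
    changes = []
    for column, (a, b) in enumerate(zip(aligned_a, aligned_b), start=1):
        if a == b or "-" in (a, b):
            continue
        charge_a = CHARGE.get(a, "0")
        charge_b = CHARGE.get(b, "0")
        if charge_a != charge_b:
            changes.append((column, a + charge_a, b + charge_b))
    return changes


toy_rat = "MKLSVEDR"      # invented example rows
toy_chicken = "MELSVQDK"

for change in charge_changes(toy_rat, toy_chicken):
    print(change)
# (2, 'K+', 'E-')
# (6, 'E-', 'Q0')
```

Note that the R to K swap in the last column is not reported: the amino acid changed, but both are positively charged, so it is likely a milder change than the lysine to glutamate swap in column 2.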

UC-Davis Chemwiki. (2014).

You can also access statistics on how many amino acids match. In this case, there is 95% similarity between these two genes. This appears to be a highly conserved gene between rat and chicken. However, that 5% of difference may result in noticeable differences in the final protein structure. For instance, if there are many more negatively charged amino acids added, it can affect the way the overall protein folds or is shaped.

In the figure above, rat is the first line and chicken is the second line. The | lines in between denote where there is a perfect match between them. With 95% of the amino acids identical, the inference is that this gene is important, and so a selective pressure exists to keep it in the gene pool.
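Both the | marker row and the percentage are easy to reproduce yourself. Here is a minimal Python sketch, assuming two aligned rows of equal length. The 20-residue toy rows are invented so that exactly 19 columns match, and I divide by the alignment length, which is an assumption on my part: different tools choose the denominator differently, so their numbers may not agree exactly with this one.

```python
def match_line(aligned_a, aligned_b):
    """Build the marker row: '|' where both rows hold the same residue."""
    return "".join(
        "|" if a == b and a != "-" else " "
        for a, b in zip(aligned_a, aligned_b)
    )


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns that are identical residues."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("need two non-empty aligned rows of equal length")
    matches = match_line(aligned_a, aligned_b).count("|")
    return 100.0 * matches / len(aligned_a)


toy_rat = "MALSVKELLTRSGAPWDNQE"      # invented example rows
toy_chicken = "MALSVKDLLTRSGAPWDNQE"  # one substitution (E -> D) in column 7

print(toy_rat)
print(match_line(toy_rat, toy_chicken))
print(toy_chicken)
print(percent_identity(toy_rat, toy_chicken))  # 95.0
```

The blank in the marker row sits exactly where the two rows disagree, which is the same layout as the alignment figure.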

For instance, if a mutation occurs that alters the protein, it may have huge consequences that affect fitness, e.g. if a truncated protein is produced that causes a disease (as in some forms of cystic fibrosis), or if the gene is necessary for proper development and embryos that don't have a normal copy die before they're born. The fact that a gene is highly conserved suggests that it has an important function.

There is another tool I like to use when comparing genes between species: the UCSC Genome Browser. If you do a BLAT search for one sequence, you can see if it shares homology with genes in other species.

This gives a visual of the alignment and allows you to see what the gene structure looks like in other species. In this case, the gene is highly similar in human, rat and chicken.