Evolution of Genes, Genomes, and the Genetic Code

bayes.colorado.edu: maintained by Rob Knight, Department of Chemistry and Biochemistry, University of Colorado, Boulder

Bioinformatics Supergroup information

From Organism to Base Composition

Just as a shell can be described with a few simple parameters representing its shape, a genome can be described with a few simple parameters representing its composition. The tetrahedron on the right-hand side shows genome compositions for several hundred organisms: remarkably, the total coding sequence of every organism clusters in a small range of the possible space, around Chargaff's Axis where C=G and A=T.
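
As a rough illustration of what "a few simple parameters" can mean here, the sketch below computes the four base frequencies of a DNA sequence and how far they sit from Chargaff's Axis. The function names `base_composition` and `chargaff_deviation` and the example sequences are hypothetical, chosen only for this illustration; they are not taken from the analysis behind the tetrahedron.

```python
from collections import Counter


def base_composition(sequence):
    """Return the fraction of A, C, G, and T in a DNA sequence.

    Symbols other than A, C, G, and T (for example N) are ignored.
    """
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {base: counts[base] / total for base in "ACGT"}


def chargaff_deviation(composition):
    """Return (|A - T|, |C - G|); both are zero exactly on Chargaff's Axis."""
    return (
        abs(composition["A"] - composition["T"]),
        abs(composition["C"] - composition["G"]),
    )


# A toy sequence with equal counts of all four bases lies on the axis.
balanced = base_composition("AACCGGTT")
on_axis = chargaff_deviation(balanced)

# A toy sequence with no T and no G lies far from it.
skewed = base_composition("AAAC")
off_axis = chargaff_deviation(skewed)
```

Because the four frequencies sum to one, only three of them are independent, which is why a composition can be drawn as a point inside a tetrahedron.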

By investigating the trajectories of evolving genes and genomes in the space of possible compositions, and by tracking the changes that affect a pool of random RNA sequences when they are selected to perform a catalytic task, we hope to uncover fundamental rules relating composition to structure and function. This research will help us understand how mutation and selection affect genomes, and how life could have arisen from random RNA molecules.


What is genetic information? Shannon's Theorem tells us that information is a decrease in uncertainty, where 'uncertainty' is defined as the negative sum, over all outcomes, of each outcome's probability multiplied by the logarithm of that probability. For example, a coin toss has two equally likely outcomes, so the uncertainty about it beforehand is −(0.5 × log(0.5) [heads] + 0.5 × log(0.5) [tails]), or one bit (using base-two logarithms). After the coin toss, there is no uncertainty about the result, so we have gained one bit of information.
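
The coin-toss arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name `uncertainty_bits` is introduced here only for illustration.

```python
import math


def uncertainty_bits(probabilities):
    """Shannon uncertainty in bits: the sum of -p * log2(p) over all outcomes.

    Outcomes with probability zero contribute nothing and are skipped,
    because log2(0) is undefined.
    """
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Before the toss: two equally likely outcomes.
before = uncertainty_bits([0.5, 0.5])

# After the toss: one outcome is certain.
after = uncertainty_bits([1.0, 0.0])

# Information gained is the decrease in uncertainty.
information_gained = before - after
```

The minus sign matters: each log probability is negative or zero, so negating the weighted sum gives a non-negative uncertainty.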

Genetic information, therefore, consists of inherited states that allow us to make better predictions about an organism's phenotype. In principle, there is no need for information to be localized inside a single molecule, or even for any physical continuity - the germ plasm could be created anew in each generation by gemmules from throughout the body, as Darwin originally suggested. However, we now know that a vast source of heritable information - ranging from a few kilobytes in viruses to over a gigabyte in the diploid human genome - is stored and transmitted as DNA sequences.
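
The storage figures above follow from counting at most two bits per base, since each position holds one of four nucleotides. The sketch below does that conversion. The genome lengths are approximate, illustrative assumptions (a small virus of about 10,000 bases and a diploid human genome of about 6.4 billion bases), and the estimate is an upper bound that ignores any redundancy in the sequence.

```python
BITS_PER_BASE = 2  # log2(4): upper bound when all four bases are equally likely
BITS_PER_BYTE = 8


def max_information_bytes(num_bases):
    """Upper bound on the information in a sequence of num_bases nucleotides, in bytes."""
    return num_bases * BITS_PER_BASE / BITS_PER_BYTE


# Illustrative, approximate genome lengths (assumptions, not measured values).
small_virus_bases = 10_000
diploid_human_bases = 6_400_000_000

virus_bytes = max_information_bytes(small_virus_bases)    # a few kilobytes
human_bytes = max_information_bytes(diploid_human_bases)  # over a gigabyte
```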

My work focuses on how this information is translated from DNA to protein, and how both the information itself and the translation mechanism by which it is expressed evolve in different lineages. The genetic code, the mapping between trinucleotide 'codons' in genes and amino acids in proteins, is highly resistant to certain types of genetic errors - is this because the code has been selected from a vast population of inferior alternatives, or because rules of chemistry fixed codon assignments to their present states early in evolution? How and why does the genetic code vary in modern cells and organelles? Why are codons used nonrandomly, even in cases where several codons have the same meaning? How does mutation affect the information content of individual genes and genomes?

More recently, I have become interested in the distribution of functional RNA molecules in the space of all possible sequences. How many random sequences do you need to search to find a particular binding or catalytic function? I am currently using a range of analytical and computational techniques to address questions about the frequency of interesting RNA sequences and structures, and to find whether there are general rules that unite functional RNA molecules. Some of the predictions from the mathematical work and from the supercomputer simulations are currently being tested experimentally by others in the Yarus lab.
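
One simple way to frame the "how many sequences" question is sketched below. It is not the analysis used in this work. It assumes every random sequence is independently functional with the same probability, and asks how many must be sampled to have a given chance of finding at least one. The function name and the example fraction of one in a million are hypothetical.

```python
import math


def sequences_needed(fraction_functional, target_probability=0.5):
    """Smallest n such that 1 - (1 - f)**n >= target_probability.

    Assumes each sequence is functional independently with probability f.
    """
    if not 0 < fraction_functional < 1:
        raise ValueError("fraction_functional must be between 0 and 1")
    if not 0 < target_probability < 1:
        raise ValueError("target_probability must be between 0 and 1")
    # log1p(-x) computes log(1 - x) accurately even when x is very small.
    n = math.log1p(-target_probability) / math.log1p(-fraction_functional)
    return math.ceil(n)


# Hypothetical example: one functional sequence per million random ones.
pool_size = sequences_needed(1e-6, target_probability=0.5)
```

For rare functions the answer is close to ln(2) divided by the functional fraction, so an even chance of success needs somewhat fewer sequences than the reciprocal of that fraction.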

Selected Slide Presentations

The Evolution of Information MCD Biology, CU Boulder, January 2003

The Origin and Evolution of the Genetic Code Final Public Oral examination, EEB, Princeton University, April 2001

Codon Assignments as Molecular Fossils: Did ancient amino acid binding sites shape the genetic code?

Information Loss in Mitochondria: Are Mutation Biases to Blame?

The Scope of Selection

Ph.D. Thesis: The Origin and Evolution of the Genetic Code: Statistical and Experimental Investigations

  1. Complete thesis (347 page pdf: 5.7 MB)
  2. Single-spaced for printing (198 page pdf: 14.8 MB)
  3. Abstract (10-page Word Document: 1.3 MB)

This thesis won the CGS/UMI Distinguished Dissertation Award, Life Sciences division, for 2001. Acceptance speech, page 6 of pdf

Publications

  1. Knight, R. and Yarus, M. (2003). "Analyzing partially randomized nucleic acid pools: straight dope on doping." Nucleic Acids Research 31:e30. pdf
  2. Knight, R. and Yarus, M. (2003). "Finding specific RNA motifs: Function in a zeptomole world?" RNA 9:218-230. pdf
  3. Yarus, M. and Knight, R. D. (2003). "The Scope of Selection". To appear in "The Genetic Code and the Origin of Life", Landes Bioscience 2003. pdf
  4. Knight, R. D. and Yarus, M. (2003). "Tests of a Stereochemical Genetic Code". To appear in "Translation Mechanisms", Landes Bioscience 2003. Word document
  5. Knight, R. D., Freeland, S. J., and L. F. Landweber (2001). "A Simple Model Based On Mutation and Selection Explains Compositional Trends Within and Across Genomes." Genome Biology 2(4):research0010.1-0010.13. pdf
  6. Knight, R. D., Landweber, L. F., and M. Yarus (2001). "How mitochondria redefine the code." J. Mol. Evol. 53:299-313. pdf (or original Word and Excel files for non-mangled figures...)
  7. Lozupone, C. A., Knight, R. D., and L. F. Landweber (2001). "The Molecular Basis of Nuclear Genetic Code Changes in Ciliates." Current Biology 11:65-74. [Page Proofs (pdf)] [Supplement (Excel spreadsheet)]
  8. Knight, R. D., S. J. Freeland, and L. F. Landweber (2001). "Rewiring the Keyboard: Evolvability of the Genetic Code" Nat Rev Genet 2:49-58. pdf [supplement 1 (pdf)] [supplement 2 (Excel spreadsheet)]
  9. Knight, R. D. and L. F. Landweber (2000). "The Early Evolution of the Genetic Code" Cell 101(6): 569-572. pdf
  10. Knight, R. D. and L. F. Landweber (2000). "Guilt by association: the arginine case revisited." RNA 6(4): 499-510. pdf
  11. Freeland, S. J., R. D. Knight, L. F. Landweber and L.D. Hurst (2000). "Early Fixation of an Optimal Genetic Code." Mol Biol Evol 17(4): 511-518. pdf
  12. Freeland, S. J., R. D. Knight, and L. F. Landweber (2000). "Measuring adaptation within the genetic code [letter]." Trends Biochem Sci 25(2): 44-5. pdf
  13. Knight, R. D., S. J. Freeland, and L. F. Landweber (1999). "Selection, history and chemistry: the three faces of the genetic code." Trends Biochem Sci 24(6):241-7. pdf
  14. Freeland, S. J., R. D. Knight, and L. F. Landweber (1999). "Do proteins predate DNA?" Science 286(5440): 690-2. HTML
  15. Knight, R. D. and L. F. Landweber (1999). "Is the genetic code really a frozen accident? New evidence from in vitro selection." Ann N Y Acad Sci 870: 408-10. Word document
  16. Knight, R. D. and L. F. Landweber (1998). "Rhyme or reason: RNA-arginine interactions and the genetic code." Chem Biol 5(9): R215-20. pdf