Course notes for the October 18th 2009 lecture of the Genome Evolution Course

 

Itai Yanai

Department of Biology

Technion – Israel Institute of Technology

yanai@technion.ac.il

 

Introduction to Evolution and the Human Genome

 

‘The organism is both the weaver and the pattern it weaves, the choreographer and the dance that is danced.’ Steven Rose, 1997 

 

Welcome to the age of personal genomics

 

In less than 10 years you will know your own genome. You can already get it today if you are willing to pay. The first genome had a price of 500 million dollars. In 2009 it’s about $50,000 (10,000 times cheaper!) and the price is quickly coming down. The US government has already passed a genetic non-discrimination act, and insurance companies are starting to consider how knowing each individual’s genetic make-up will affect the way their business operates. Some individuals have already had their genomes completely sequenced, most famously Craig Venter and James Watson. George Church has started the Personal Genome Project, which has already sequenced 10 individuals and aims to sequence 100,000 more, making each freely available on the web. So with the genome age upon us, we consider its implications.

 

Consider the effect on medicine. When we each go to the doctor’s office today (as for thousands of years), our treatment is a wise prescription based upon the summed experience of previous patients, a kind of cure that has generally worked on most people. However, no single individual is “an average”; rather, each may have important particular differences with implications for the treatment. Ideally, our medicine would be unique to each of us. With our genome known, we have the potential to identify our genetic strengths and weaknesses, our susceptibilities and immunities. Of course, as with any radical change, there are clear dangers associated with everyone's genome being known. However, regardless of how we feel about it, we are entering into a brave new world where the genome is indeed known. If nothing else, imagine the satisfaction of knowing with perfect precision your own genetic heritage – the book of books passed down to us from times that have left us no other trace.

 

But in seizing on this possibility, we need to figure out what the genome means. This course is exactly about the problem of making sense of the genome. The key idea is to invoke the theory of evolution: to consider the genome from the angle of how it came to be in its present state.

 

Where is our heredity stored?

 

“..Everything in a living thing is centered upon reproduction. A bacterium, an amoeba, a fern - what destiny can they dream of other than forming two bacteria, two amoeba, two ferns?”

Francois Jacob “The Possible and the Actual” (1989)

 

The genome is not an abstraction. It is a real physical object and its location in your body is everywhere. Nearly every cell contains a nucleus inside of which reside the chromosomes; the minor exceptions are red blood cells, which have no nucleus, and sperm and eggs, which carry only half of the set. Chromosomes, literally meaning "colored bodies", are visible under the microscope with a staining technique that differentiates them from their context. The chromosomes are essentially the physical encapsulation of the genome. Their properties, which we will now review, give away their role in inheritance even at the large scale.

 

Every cell comes from other cells by a process of either cell division or cell fusion and in both instances the chromosomes have a starring role. Before a typical division, the chromosomes can be seen to duplicate and are then carefully teased apart so that each daughter cell receives a complete set. And with regard to cell fusion, each of us derives from a single cell that was a fusion of our father's sperm with our mother's egg. To ensure that we do not begin our life with twice as many chromosomes - which would have occurred if any normal cells of our parents were fused – the organism goes to great lengths to halve the number of chromosomes of both the sperm and egg in a process called meiosis.

 

The genetic material - in the form of the chromosomes - is to a very large part the only contribution of the parents to the fertilized egg at the origin of each human life. A sperm cell is not much more than a capsule housing the shipment of chromosomes, some sugar to drive the motor and a small number of energy generators called mitochondria. The egg cell has many more mitochondria, in addition to plenty of proteins and nutrients and of course also the repertoire of chromosomes. From the point of sperm/egg fusion, the organism applies the instructions encoded in its chromosomes to differentiate and construct the approximately 10,000,000,000,000 (10 trillion) cells in the particular combination that specifies a human. So specific are these instructions that in the rare occurrence that the same fertilized egg splits into two, with identical copies of the chromosomes, the two halves develop into identical twins, alike in nearly every biological way.

 

The human genome is organized into 46 chromosomes. Of these, 44 are said to be autosomal because they pair up, yielding 22 pairs. These vary considerably in size and are numbered from the largest (1) to the smallest (22). The nature of the remaining two chromosomes indicates our sex. One of the two is always an X chromosome from our mothers. The other one, from our fathers, is either also an X, which would make a female, or a Y chromosome for males. If an extra copy of any of the autosomal chromosomes happens to occur in a fertilized egg the result is generally not viable, with one notable exception. An extra 21st chromosome produces a viable, though usually sterile, offspring with Down's syndrome.

 

Our heredity, our genome, is a long permutation of four basic units

 

"...the most essential part of a living cell - the chromosome fiber may suitably be called an aperiodic crystal." - Erwin Schrodinger, What is Life (1944)

 

A chromosome at its core is a molecule just like other well-known molecules such as H2O and CO2 and is called deoxyribonucleic acid, or DNA for short. I say "at its core" because the DNA is scaffolded with other kinds of molecules, but since these do not account for the real powers of the chromosomes in which we are interested, we will put them aside for the moment. A DNA molecule is quite large when compared with the three atoms of a CO2 molecule. Our Chromosome 1, for example, has about 10^10 atoms. Despite its size, DNA turned out to have a fairly simple structure, and nothing caused more of an earthquake in the life sciences in the last century than its determination.

 

Most importantly, DNA is an informational molecule. As a bead-necklace is composed of an ordering of beads, DNA is an ordering of four different types of beads, called nucleotides. The different nucleotides are called adenine, guanine, cytosine, and thymine; or more simply A, G, C and T. The nucleotides attach only at their two ends and together form a simple line, or sequence of building blocks. To build a DNA molecule we line up a set of nucleotides, say A, T, G, G, G, and C, and link them together. The sequence ATGGGC is now encoded.

 

A truly amazing aspect of DNA molecules is that they typically come in pairs, where the paired molecules encode the inverse, or opposite, information. Two DNA molecules, or strands as they are called, line up and zip together. According to the chemistry of the nucleotides, there are simple rules for their associations: A's always match up with T's, and G's with C's. The sequence of one strand exactly determines the sequence of the opposite strand, with which it is complementary. This aspect is so amazing because the two strands can separate and each one still contains all of the information encoded in the sequence. As Watson and Crick famously wrote of their discovery, "It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material."
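
The pairing rules just described can be sketched in a few lines of Python. This is only an illustration: the names PAIRING, complement and reverse_complement are made up for these notes, and the reversal step reflects the convention (not discussed above) that the two strands run in opposite directions, so the partner strand is usually written back to front.

```python
# Pairing rules from the text: A pairs with T, and G pairs with C.
# All names in this sketch are hypothetical, chosen for illustration.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base sitting opposite each position of `strand`."""
    return "".join(PAIRING[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own reading direction.

    Assumption beyond the text: the two strands are antiparallel, so the
    partner is conventionally written reversed.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGGGC"))          # position-by-position partner
    print(reverse_complement("ATGGGC"))  # partner in conventional orientation
```

Applying complement twice returns the original sequence, which is the sense in which each separated strand still carries all of the information.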

 

We have so far referred to the building parts of DNA as nucleotides. This is correct but since the different nucleotides only really differ at the part commonly called a "base", we will refer to them as bases from now on as is typical in the literature. Also, since each two DNA molecules pair up, each base in the sequence is actually a pair. Thus we say that the sequence "ATGGGC" corresponds to six base-pairs.

 

At this point we arrive at the profound conclusion that each chromosome is actually a sequence of base-pairs. The big and small chromosomes correspond to longer and shorter sequences. They are volumes in the book of life we inherit and pass on. The sequence carried in the Y chromosome contains something about becoming a male. The sequence of the 21st chromosome contains information that, when present in excess in the form of an extra chromosome, causes a syndrome. All together we call these sequences the human genome. Although it is a set of sequences, because they come from separate chromosomes, we often call the genome one sequence, as if we linked the chromosomes together, the 1st through the 22nd chromosomes as well as the X and Y chromosomes, to come up with one sequence.

 

Introducing the human genome

 

Our goal is to understand this sequence. To begin with we might want to know how long this book is going to be; how many letters? We find that there are about 3,000,000,000 base-pairs in the haploid human genome. By haploid we mean the information we receive from each parent; in actuality then we each have twice that number of base-pairs. As would be intuitively expected, the bigger the chromosome the longer the sequence embedded in it. As a matter of fact, such a correlation is found. Looking at the lengths of the chromosomes, for example, we can see that the biggest chromosome, 1, is roughly six or seven times longer than chromosomes 21 and 22.

 

Besides length though, what other kinds of questions can you ask about a book written in a language which you know nothing of yet? Although we do not know the language we do know that its alphabet is composed of A's C's G's and T's. Is each letter used 25% of the time? It turns out that A's and T's are more popular than C's and G's. The overall content of G's and C's is about 40%; significantly less than the 50% we would have expected if all were equally probable.

 

Although the G's and C's (henceforth GC's) are less frequent than T's and A's, even given their background probabilities, we do not find that they are equally distributed in the genome. The distribution of the percent GC's of moving windows across the genome does not follow a normal distribution but rather an extreme value one. In other words, we find many more patches of GC-rich regions than expected if G's and C's were sprinkled in the genome following their background probabilities of 0.2 each. Putting aside the distribution now, we can actually look at the genome and see where these GC-rich areas fall. Quite strikingly, when the moving windows are categorized into five bins according to GC%, we find that long stretches of DNA, spanning several megabases, fall into the same category of GC content. The genome appears to be a mosaic of such patches.
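
The moving-window analysis described above can be sketched as follows. The window size, step and the equal-width boundaries of the five bins are assumptions made for illustration only; the notes do not specify them, and the function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of positions in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> list:
    """GC fraction of each full window of length `window`, moved by `step`."""
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


def gc_bin(fraction: float, n_bins: int = 5) -> int:
    """Assign a GC fraction to one of `n_bins` equal-width bins (0 = poorest).

    Equal-width bins are an assumption; the notes only say five bins are used.
    """
    return min(int(fraction * n_bins), n_bins - 1)


if __name__ == "__main__":
    toy = "AAAAGGGG"  # a toy sequence, not real genomic data
    print([gc_bin(f) for f in windowed_gc(toy, window=4, step=4)])
```

On a real chromosome one would use windows of many kilobases; long runs of neighbouring windows with the same bin number are the mosaic patches mentioned in the text.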

 

Chromosomes 21 and 22 differ markedly according to GC content. The former can be said to be composed of three regions, one GC poor, another where the GC's are present more than their background level, and a third which is GC rich. Chromosome 22, meanwhile, is almost completely GC rich. Does this difference underlie some other basic difference between the two chromosomes?  Yes. They differ in the number of 'genes' they encode. What exactly are these genes?

 

Every book needs a reader without which it is useless. While we hope to one day understand the text inside of us, there is already a reader to the book of life that does it naturally: the cell. Molecules called proteins, which are also macromolecules but with 20 building blocks instead of DNA's 4, protect the DNA, separate its strands, copy specific short sequences from it, and later translate these copies into proteins. These in turn, again, protect and read the instructions encoded in the DNA and complete the web of interactions between them. They also replicate the DNA when it is time to split into two cells. These proteins carry out all of the processes we tend to associate with life.

 

Proteins are rightfully described as molecular machines. Some proteins, known as enzymes, can catalyze chemical reactions in microseconds that would have taken ages without them. Others, called crystallins, make the lens of the eye transparent and so make vision possible for us. Yet another kind, called chaperones, wander around the cell and help other proteins to fold correctly to assume the shape necessary for their particular function. There are thousands of different kinds of proteins in a human cell and the instructions for building them all are written in the human genome.

 

The problem can be stated mathematically. How does a DNA sequence with a four-letter alphabet (bases) code for a protein sequence with twenty letters, called amino acids? It is clear that a single DNA letter cannot code for each amino acid. Such a one-to-one relationship cannot exist because only four amino acids could be coded. If each pair of bases were to code for one amino acid, only 16 (4*4) could be coded, which is still not enough. When using three bases, 64 possibilities (4*4*4) are made possible, which is enough to encode all of the amino acids used. We will now find out how this occurs and what happens to the extra combinations not needed.

 

As we derived theoretically, after the proteins copy a DNA sequence, a process called transcription, they proceed to translate it to a protein sequence by using a code. In this genetic code, each triplet corresponds to a specific amino acid. Interestingly, nature solved the problem that there are twenty amino acids to be encoded and 64 combinations with a degenerate code. This means that an amino acid can be encoded by multiple triplets, or codons. Also, three of the 64 codons are syntactic in that they encode the message "STOP", signifying the end of translation. The genetic code is not actually written anywhere in the cell but is enacted by a set of proteins each of which knows which codons specify which amino acid. The copy can then be read by translating it into a protein sequence, which will go on to fold and be functional in the cell.
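
To make the counting argument and the degenerate code concrete, here is a small sketch that builds the standard genetic code as a lookup table and translates a DNA string triplet by triplet. The one-letter amino acid symbols and the compact table layout are conventions not introduced in these notes, the function and variable names are hypothetical, and for simplicity the sketch reads the DNA letters directly rather than the transcribed copy.

```python
from collections import Counter
from itertools import product

# Standard genetic code in compact form: codons are enumerated with the
# bases in the order T, C, A, G, and each codon maps to the one-letter
# amino acid symbol at the same position ("*" marks a STOP codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate `dna` codon by codon, stopping at the first STOP codon."""
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # 4, 16 and 64 possible words of length 1, 2 and 3.
    print([len(BASES) ** n for n in (1, 2, 3)])
    # How many codons encode each amino acid (the degeneracy of the code).
    print(Counter(CODON_TABLE.values()))
    print(translate("ATGGGC"))
```

The six-base example sequence from earlier, ATGGGC, reads as two codons and therefore yields a two-letter protein.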

 

One would imagine that since the main function of DNA is to code for proteins, such information would take a strong majority of the sequence. Many lines of evidence bring us to the conclusion that this is not the case. There are around 25,000 genes in our genome and surprisingly they take up only about 1.5% of the sequence! The relatively small number of human genes however may not accurately capture the complexity of the organism. Each gene can actually code for multiple proteins. By a process called alternative splicing, proteins modify the copies of the DNA prior to translation, such that different combinations of the copy can be translated into different proteins. It is estimated that over half of all genes yield more than one type of protein.

 

Genes are not evenly distributed across the genome sequence. As we noted above, chromosomes 21 and 22 differ in their GC content. Genes tend to favor sequences of high GC content. Regions such as these can be considered gene cities, while fewer genes live in the desert regions poor in GC's.

 

If only 1.5% of the genome is used to code for proteins then what does the rest do? A large portion of the genome can be accounted for by various types of repeats. 3% of the genome falls into the category of simple repeats if each one is at least 12 base pairs long (Subramanian et al. 2002). One very predominant repeat is the six-base "TTAGGG", which repeats itself thousands of times at the ends of the chromosomes, which are called the telomeres.
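
As a rough illustration of what finding a simple repeat involves, the sketch below counts the longest uninterrupted run of a motif such as TTAGGG in a sequence and checks it against the 12 base-pair threshold mentioned above. The function names and the toy sequence are invented for this example; real repeat-finding tools also tolerate imperfect copies, which this sketch does not.

```python
import re


def longest_tandem_run(seq: str, motif: str) -> int:
    """Number of copies in the longest uninterrupted run of `motif` in `seq`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    return max(
        (len(match.group()) // len(motif) for match in pattern.finditer(seq)),
        default=0,
    )


def is_simple_repeat(seq: str, motif: str, min_length: int = 12) -> bool:
    """True if the longest run of `motif` spans at least `min_length` bases."""
    return longest_tandem_run(seq, motif) * len(motif) >= min_length


if __name__ == "__main__":
    toy = "ACGT" + "TTAGGG" * 4 + "CC" + "TTAGGG" * 2  # toy data, not a real telomere
    print(longest_tandem_run(toy, "TTAGGG"))
    print(is_simple_repeat(toy, "TTAGGG"))
```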

 

But the repeats are not only as short as 12 base pairs. A certain sequence of 171 base-pairs called the alpha repeat is extremely popular in the portions of the chromosomes known as the centromeres. A centromere is that position of the chromosome, typically at the center, at which the chromosomes pair up when they prepare for cell division. Altogether, the alpha repeat corresponds to around 5% of the human genome (Henikoff et al. 2001)!

 

Very strange aspects of the genome

 

A giant fraction of the genome - about 40% of the DNA – corresponds to sequences belonging to a small number of sequence families. For example, roughly 20% of the genome can be grouped together as the family of L1 sequences. Scattered throughout the genome sequence are 850,000 "copies" of the same sequence of 6 to 8 thousand base pairs. With the term ‘copies’ we refer not to exact copies, as when we looked at exact repeats, but to ones that are very similar. What possibly explains these copies? Why are some copies more similar to each other than the others?

 

Besides these families, many chromosomal regions (altogether corresponding to another 2% of the genome) can be found on other chromosomes. Such chromosomal segments can be defined as containing at least three genes. Viewing the connections between the chromosomes in terms of such duplications we find a mosaic structure of copies. What accounts for this mosaic structure?

 

The human genome as I've described it above, with its one through twenty-two chromosomes and the two sex chromosomes X and Y, has another often forgotten member. Human cells all house enclosures known as mitochondria, which act as power generators. What is surprising is that these powerhouses have their own genome, which is tiny with respect to the one discussed until now and has nearly 17,000 base pairs housing 37 genes. Why do mitochondria need to have their own genome? There are other enclosures, or organelles, in the cell without genomes. What is special about the mitochondria?

 

What do all organisms have in common?

 

Clues to solve these and other puzzling aspects of the genome come from looking outside the box. While we are human beings and are most interested in our own place in this world, it becomes evident that other matter on this planet may be linked to us. In one of the most amazing instances of universality that I know of, every known biological organism has a carbon-based chemistry. This means that of the hundred or so elements around on this planet only one dominates in all biology. Amazing. More than that though, all biology - animals, plants, bacteria, archaea, viruses and all - shares the same operating system for transferring information from generation to generation and for carrying out the normal moment-to-moment existence. By this, I mean that all of the known 2 million species on earth store their genetic material in nucleic acids which code for proteins, and it is an easy prediction to make that all species discovered in the future will be organized in the same way. Furthermore, all organisms not only record their hereditary information in nucleic acids, they also use the same genetic code. And no less important, similar gene sequences can be found across organisms with a gradient of similarities between them. Putting these four facts together, we are led to find a great unity or universality in life which begs for an explanation.

 

Evolution: The unifying theory of biology

 

How are we to interpret the shared characteristics of all living organisms? Long before the molecular properties common to all life forms were known, humanity was grappling with the timeless question of “what is life?”. The famous Greek philosopher-scientist Aristotle is credited with organizing the world into a "great chain of being". Beginning with minerals, which occupy the lowest tier of being, and proceeding up the chain, or ladder, to plants, animals and finally man, each group has its own location in the master plan, where the gradient shows the increase of being of the organism. Each group of organisms - or species - has a predetermined relationship to the others in the hierarchy and is thus invariant.

 

More importantly, this great chain of being was universally seen as the work of God, whose beautiful and intricate organismal designs were proof of His might. The study of living organisms was seen as an appreciation of His work. The British philosopher/priest William Paley gave a long-lasting metaphor for this inference. If you came upon a watch on a tree stump in the middle of the forest, you would not think to conjecture that the parts that make up the watch randomly came together. Rather, you would suppose that the watch has its origins as the product of a careful design of a watchmaker. Likewise, living organisms are intricately built and must also have a designer.

 

The place of man at the pinnacle of the designed hierarchy was seen as self-evident and was not seriously questioned until the late 18th century. The crack in the neat picture of two millennia of such a dogma was made by the Comte de Buffon, who openly questioned the biblical story that the earth is roughly 6,000 years old and talked about the possible common ancestry between man and apes. One of the first postulations of the evolution of life was made by Erasmus Darwin, a leading thinker of his day and grandfather to Charles Darwin. Erasmus Darwin struggled with the notion of one species evolving into another but could not offer a mechanism for such a change.

 

A mechanism for evolution was offered by Jean Baptiste Lamarck in his “Philosophie Zoologique” published in 1809. Lamarck argued that species are not unchangeable but are in a constant state of improvement according to 1) an “internal force” and 2) the inheritance of acquired characteristics. By an “internal force” Lamarck imagined that the organism could somehow monitor its environment and produce offspring that were better adapted to it. The second mechanism is the one for which Lamarck is best known today. By using a particular ability an organism may alter it throughout its life and then pass on these alterations to its offspring. Lamarck did not come up with the mechanism of the inheritance of acquired characteristics, which was already a very popular notion in his time. However, he was the first to suggest that this mechanism could be responsible for the evolution of one species into another.

 

According to Lamarck's theory of evolution, species are constantly changing. However, any two species, say dolphins and fruit flies, do not have common ancestry. Each species has its own distinct origin in time, from which point it begins to evolve into more advanced species. Man, of course, was the first species, and since it has had the most time to evolve it is also the most complex. Bacteria on the other hand are still simple because they are relative newcomers. To Lamarck’s great credit, he recognized a branching hierarchy: evolution proceeds along common paths and thus there is a tree of relationships between the life forms. In other words, a fish can either evolve in the direction of mammals or towards birds. However, no two species are related in the sense of sharing common ancestors.

 

The mechanism of acquired characteristics was dismissed by August Weismann, who showed that the germ plasm is continuous and receives no communication from the body (soma). As a human develops from a single egg, different types of tissues are differentiated. Of these, the germ cells which go on to produce the sperm and egg are quite distinct from the rest of the body and do not communicate with it. A loss of an arm by an individual is not somehow transmitted to the germ plasm and thus the next generation does not inherit the loss. We can express Weismann’s insight with reference to the modern operating system of organisms. As we saw, DNA codes for the sequence of a protein which goes on to affect the phenotype. Since there is no reverse path from protein to DNA, changes in the phenotype cannot be coded back to the DNA.

 

In 1831, at the age of twenty-two, Charles Darwin, a medical school dropout with an interest in zoology and geology, boarded the HMS Beagle as the gentleman companion of the captain, for a five-year trip around the world to provide information for better maps. Darwin embarked on the journey with the notion of the fixity of species, but the birds he collected from the different islands of the Galapagos archipelago called this belief into question. Reflecting on the birds several months after collecting them, he reasoned that although the finches had differences, such as the size of their beaks, these differences corresponded to different islands and each bird fulfilled the same role on its respective island, so the birds must be varieties of the same species and not separate species. To Darwin's great shock, after his return to England, the celebrated ornithologist John Gould examined the birds and declared them members of different species.

 

Darwin could not accept the independence of the finches. To account for their similarities he drew a tree of relationships between them. He realized that as the islands separated by geological changes, the birds would evolve separately and become different species. This was an earth-shaking notion. If you extrapolated beyond the birds to life in general, it meant that all species are linked by a great tree of life! Multiple species derive, by slow changes, from an ancestral species. If the Galapagos birds indeed derived from those of South America then species are not distinct but are changeable. The birds of the islands were similar though and so are varieties of species, attesting to the fact that evolution happens in small steps as opposed to drastic changes. As varieties are known for many species such as flowers, trees, dogs, etc., it is conceivable that each could split off into daughter species. If we go back in time then, the species collapse onto their parental species until the history of all of life is reconstructed.

 

Darwin believed he had witnessed the change in species but he could not say why. What is the mechanism by which species change? Oddly, his answer came while reading the work of a political economist. Two years after his return, Darwin happened to be reading Malthus's "An Essay on the Principle of Population", where Malthus notes that population has the potential to increase geometrically, that is, 2:4:8:16:32:64:128:256:512:1024, etc. He warned that since the earth is not capable of sustaining such growth, humanity had better control its population to avoid falling prey to plagues and other natural population controllers. As Darwin later wrote in his autobiography: "it at once struck me that under these circumstances favourable variations would tend to be preserved, and unfavourable ones to be destroyed. The result of this would be the formation of new species."

 

The notion of natural selection can be pictured in terms of a simple thought experiment with a population of individuals. As the population grows in time, a competition will develop among the individuals for the limited environmental resources. Since populations have the intrinsic potential to grow exponentially, a limitation of resources means that not all of the individuals will be able to survive long enough to produce offspring. The important point is that the matter of who survives is not decided arbitrarily. Rather, because in any population there are those individuals that are (even if just a little) more adapted to capturing the resources and producing offspring, it will be these individuals that tend to survive better. When they survive they produce offspring with a similar genetic makeup and thus pass forward their adaptive way of life. Soon, those individuals less adapted will be extinct. Thus even though the original differences that caused some individuals to be more fit were random, the mechanism that selected for them is not; it is policed by a simple matter of life and death.

 

But Darwin did not publish his theory right away. Instead he waited for twenty years and might have waited more had it not come to his attention that a younger colleague, Alfred Russel Wallace, had also read Malthus' essay and had independently come up with the same mechanism for evolution. Before Wallace came into the picture, Darwin planned to write an encyclopedic treatise on his thesis, but with the pressure of time he decided to draft his theory into what he called "an abstract". This abstract is the 1859 landmark book, "On the Origin of Species". In this “book that shook the world”, Darwin began, oddly, with the subject of the domestication of animals and plants. As every finch enthusiast knew, characteristics could be selected for by selective breeding. The offspring tend to be very similar to the parents (like produces like), so a finch with a long beak tends to have offspring with long beaks. Well-known to breeders though is that the offspring are not identical to the parents; there is always room for variation. So in our example of the finch with the long beak, some of its offspring may have a beak of the same length, some a shorter beak, and some an even longer beak. If an enthusiast was interested in a finch with a long beak, he could attempt to obtain one by only allowing the offspring with longer beaks to mate and not those with shorter beaks.

 

Every Englishman could recognize that Darwin's point on selective breeding was correct. Darwin then proceeded to show that nature could also select for characteristics in an analogous fashion. Since in every species many more offspring are made than can possibly survive, a struggle for existence takes place between the members of each population. One may think that nature would at least be fair and grant each individual an equal shot at generating offspring for the next generation, but alas, it does not. As was said earlier, the members of a population are not identical but vary. If a long beak is instrumental for an individual towards obtaining the limited amount of food in the area, then a finch with a long beak is more likely to survive and leave offspring for the next generation. On the other hand, a finch with a short beak might not be able to find food and hence starve to death before reproducing. Over time, the population would have a longer beak on average. Darwin called this mechanism Natural Selection, and the philosopher Herbert Spencer dubbed the process “survival of the fittest”.

 

Origin of Species quickly became a bestseller but needless to say it did not lack its detractors. Besides the theological problems, several serious objections were raised. First off, Lord Kelvin claimed that there was not enough time for evolution to take place. Based upon the heat of the crust of the earth, he calculated that the Earth could not be more than 40 million years old. The discovery of radioactivity however showed that this estimate is incorrect. The Earth is now believed to be over 4 billion years old, allowing plenty of time for life to originate and evolve.

 

Fleeming Jenkin pointed out that natural selection would not be able to work under the blending of characteristics. In Darwin’s time, it was believed that heredity was a blend of the parents' characteristics. If this were so, then a new useful characteristic would blend into the old one and would not be propagated. For example, a finch with a beak twice as long as any of the others in the population would have to mate with a finch of normal beak, and their offspring would have a beak that is only one and a half times as long. When that offspring in turn mated, the beak of its offspring would be only one and a quarter times as long, and so forth until the beak was not noticeably longer at all.
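
Jenkin's arithmetic can be followed generation by generation with a short sketch. It assumes, as in the example above, that every mate has a beak of normal length 1 and that each offspring is the exact average of its parents; the function name and default values are made up for illustration.

```python
def blended_beak_lengths(start: float = 2.0, normal: float = 1.0,
                         generations: int = 5) -> list:
    """Beak length in each generation when every offspring is the average
    of its long-beaked parent and a mate with a beak of `normal` length."""
    lengths = [start]
    for _ in range(generations):
        lengths.append((lengths[-1] + normal) / 2)
    return lengths


if __name__ == "__main__":
    # The excess over the normal length halves with every generation.
    print(blended_beak_lengths())
```

After only a handful of generations the advantage is diluted to a few percent, which is the heart of the objection.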

 

This blending argument represented a serious attack for Darwin, who no doubt lost many hours of sleep pondering it. Unknown to Darwin and Jenkin however, this problem had been solved by an Austrian monk, Gregor Mendel. Working with peas, Mendel had shown that characteristics such as color and size were not blended but were inherited distinctly. Mendel had taken tall pea plants and crossed them with dwarf pea plants (where each one came from a pure line) and received an entire generation of tall plants. The tall trait was not blended with the dwarf. Instead, tallness was said to be dominant over dwarfness. When this generation was then crossed with itself, the dwarfs reappeared with a ratio of one to three against the tall kind. In essence, Mendel had shown that characteristics are inherited as distinct units. A new characteristic would not be blended with the old; it could survive unblended and be amplified by natural selection.
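
Mendel's two crosses can be reproduced by simply enumerating the ways in which one unit from each parent can combine. The letters T (tall, dominant) and t (dwarf) are a notation assumed here for illustration, as are the function names; the notes themselves do not name the units.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes when each parent passes on one of its two units.

    Genotypes are two-letter strings such as "TT", "Tt" or "tt"; sorting the
    letters makes "tT" and "Tt" the same genotype.
    """
    return Counter(
        "".join(sorted(unit1 + unit2))
        for unit1, unit2 in product(parent1, parent2)
    )


def phenotype(genotype: str) -> str:
    """A plant is tall if it carries at least one dominant T unit."""
    return "tall" if "T" in genotype else "dwarf"


def phenotype_counts(offspring: Counter) -> Counter:
    """Collapse genotype counts into counts of visible traits."""
    counts = Counter()
    for genotype, n in offspring.items():
        counts[phenotype(genotype)] += n
    return counts


if __name__ == "__main__":
    first_generation = cross("TT", "tt")   # pure tall x pure dwarf
    second_generation = cross("Tt", "Tt")  # first generation crossed with itself
    print(phenotype_counts(first_generation))
    print(phenotype_counts(second_generation))
```

The first cross gives only tall plants, and the second gives three tall for every dwarf, with the dwarf unit having passed through the first generation unblended.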

 

A common reservation in accepting the theory of evolution has to do with its apparently strong dependence upon chance. This view was voiced by the astronomer John Herschel, who referred to the theory of natural selection as "the law of higgledy-piggledy". By this, he referred to the tremendous amount of presumed randomness that is ascribed to the evolution of species. Richard Dawkins, in his book “The Blind Watchmaker” (a reference to Paley’s watch metaphor), portrayed this argument by considering the probability of a monkey randomly typing the sentence “Methinks it is like a weasel.” Dawkins showed that this is absolutely not how natural selection works. Imagine that we allow the monkey not one but a thousand attempts at typing the sentence. The chances are astronomically small that any one attempt will be completely correct. However, it is probable that in one of the thousand attempts the monkey will get at least one letter in the correct location in the sentence. Natural selection would select this particular attempt and, based upon it, ask the monkey to generate one thousand more copies, holding fixed the location that was already correct. Again, natural selection would pick out of the new batch the sentence that is closest to the target sentence. In such a stepwise fashion, natural selection takes small steps towards targets that would be highly improbable if only one shot were given. The important point is that natural selection does not carry out its changes all at once but rather in very slight increments, each of which is probable.
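
The stepwise procedure described above can be sketched in a few lines. This is a minimal sketch following the description in these notes (positions that are already correct are held fixed), not a reproduction of Dawkins's own program. It assumes a 27-character alphabet of capital letters plus the space and drops the final period; the function names, the batch size default and the random seed are illustrative choices.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "  # assumed 27-symbol "typewriter"

def score(attempt, target=TARGET):
    """Number of positions at which the attempt matches the target."""
    return sum(a == t for a, t in zip(attempt, target))

def next_attempt(parent, rng, target=TARGET):
    """Retype the sentence at random, keeping already-correct positions fixed."""
    return "".join(
        p if p == t else rng.choice(ALPHABET)
        for p, t in zip(parent, target)
    )

def evolve(batch_size=1000, seed=0, target=TARGET):
    """Return how many rounds of selection are needed to reach the target."""
    rng = random.Random(seed)
    best = "".join(rng.choice(ALPHABET) for _ in target)  # first random attempt
    generations = 0
    while best != target:
        batch = [next_attempt(best, rng, target) for _ in range(batch_size)]
        # "Selection": keep the attempt closest to the target sentence.
        best = max(batch, key=lambda s: score(s, target))
        generations += 1
    return generations

if __name__ == "__main__":
    # Number of equally likely sentences a single blind attempt chooses from.
    print("single-shot possibilities:", len(ALPHABET) ** len(TARGET))
    print("generations with selection:", evolve())
```

A single blind attempt has to hit one sentence out of 27 to the power 28, on the order of 10^40 possibilities, whereas the cumulative procedure reaches the target after a modest number of rounds because every round only has to achieve a small, probable improvement.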

 

Evolutionary Genomics

 

Up to this point we have developed two major concepts: 1) Genomic, that our genetic material is essentially a long sequence written in a four-letter alphabet, and 2) Evolutionary, the common descent of living organisms. We are now ready to combine the two and embrace the notion of the genome as a sequence that changes over time. In other words, comparing our genome with our parents' genomes, with their parents' genomes, and so forth, we will find more and more differences among the sequences the deeper we look back in time. The story of the changes that occurred from the first primordial genome to our current genome is a history of ourselves.

 

That the sequence evolves is a rather theoretical view. It could be argued that since we cannot go back in time to see the changes, we can neither show their existence nor decipher them. What shouldn’t be forgotten, however, is that humans are not alone on this planet. That all life descended from that same first primordial genome opens new avenues for unraveling the past. A pair of species that diverged a million years ago will have genomes that are separated by two million years of independent history, one million years along each lineage. The genomes will have accumulated changes in proportion to the amount of time and the mode of life of the organisms. By identifying the similarities between the two, we would be tracing the genome of their last common ancestor. In summary, knowledge of the genomes of living organisms can be used to trace the genomes of the past.
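
The bookkeeping in this argument is simple enough to write down. The sketch below assumes, purely for illustration, that changes accumulate at a constant rate per site per year and that no site changes twice; the rate and genome length in the usage example are made-up round numbers, not measured values, and the function names are hypothetical.

```python
def separate_history_years(years_since_divergence):
    """Total independent history separating two present-day genomes.

    Both lineages have been changing since the split, so the time counts twice.
    """
    return 2 * years_since_divergence

def expected_differences(years_since_divergence, rate_per_site_per_year, genome_length):
    """Expected number of differing sites under the constant-rate assumption."""
    return (
        separate_history_years(years_since_divergence)
        * rate_per_site_per_year
        * genome_length
    )

if __name__ == "__main__":
    years = 1_000_000
    print(separate_history_years(years), "years of separate history")
    # Illustrative numbers only: 1e-9 changes per site per year, 1,000,000 sites.
    print(expected_differences(years, 1e-9, 1_000_000), "expected differences")
```

In practice the rate depends on the organisms and on the region of the genome, which is why the text speaks of changes in proportion to both time and mode of life.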

 

The technical feat of sequencing a genome completely is a recent one. Until 1995 only the genomes of viruses were known completely, but now over a hundred microbial genomes are available. A major milestone was achieved in 2001 with the completion of a draft of our genome. The genome of the mouse, with which we share a common ancestor that lived roughly 100 million years ago, was published in 2002. Also available are a fish, a fly, a worm, and the plants oat, barley, rice, wheat, and corn. These genome sequences, along with the hundreds more in the pipeline, are our starting point for understanding our genome by comparing it with those of others.

 

The situation is not unlike attempting to understand a game where you know who the winner is but not how they won or what exactly the rules were. In trying to understand our genome we know what it looks like now (the winner genome) but not the steps that have led to its current state, nor the rules of the game (how the genome changes). Our objectives, which can be said to correspond to the nascent field of evolutionary genomics, are twofold then. On the one hand we will learn the rules of the game, i.e. the mechanisms of evolution in the genome. How does the genome change? What are the acting forces? Can we indeed see the effects of natural selection in the genome? Are there other forces? And on the other hand, we will be interested in how these forces have acted to shape the history of the genome. Can we trace back our genome sequence from its current shape back to the origin of life?

 

Throughout, we will find that, as Dobzhansky put it, “nothing in biology makes sense except in the light of evolution” and that this “is not some kind of evolutionist propaganda, but an entirely literal and more or less routine description of the situation” (Koonin and Galperin 2003). For example, it is certain that the mitochondrion derives from a once free-living species that was, in a sense, enslaved by a eukaryotic cell to produce energy for it. Several lines of evidence lead to this scenario. First, not all eukaryotes have mitochondria, hinting that the mitochondrion entered the symbiosis at some point and has been inherited ever since. Second, as we noted earlier, mitochondria have their own genome, further hinting that they were once free living. However, the most convincing fact is that the mitochondrial genome is much more similar to that of a species of bacteria called Rickettsia prowazekii than to the DNA of any other organism, including the human genome of which it is a part (Andersson et al. 1998).

 

Tracking the human genome from its present sequence into the past is admittedly an anthropocentric route. But our primary interest is - like many things in biology - a selfish one: we want to know where we come from. Ironically perhaps, we will find that all roads lead to the same universal common ancestor and that we could have started with any organism. As one NYC taxi driver said, "we all go different ways to see the same things".

 

 

References:

 

Andersson, S.G., Zomorodipour, A., Andersson, J.O., Sicheritz-Ponten, T., Alsmark, U.C., Podowski, R.M., Naslund, A.K., Eriksson, A.S., Winkler, H.H., and Kurland, C.G. 1998. The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 396: 133-140.

Henikoff, S., Ahmad, K., and Malik, H.S. 2001. The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293: 1098-1102.

Koonin, E.V. and Galperin, M.Y. 2003. Sequence - Evolution - Function: Computational Approaches in Comparative Genomics. Kluwer Academic, Boston.

Subramanian, S., Madgula, V.M., George, R., Mishra, R.K., Pandit, M.W., Kumar, C.S., and Singh, L. 2002. MRD: a microsatellite repeats database for prokaryotic and eukaryotic genomes. Genome Biol 3: PREPRINT0011.