Course notes for the December 20th 2009 lecture of the Genome Evolution Course

 

Itai Yanai

Department of Biology

Technion – Israel Institute of Technology

yanai@technion.ac.il

 

Notes drafted by Michal Levin

 

Evolution by Horizontal Transfer

A brief history of the tree of life

Before it was known that DNA is the principal code for all living structures, the tree of life was built by ‘organismal biology’, using phenotypes such as body size and morphological elaborations (as discussed in the Phylogenetics lecture). In the 1960’s, once it became possible to sequence proteins and nucleic acids, these biomolecules were used for building the tree of life. A more general and robust picture of the relationships among species was revealed using 16S ribosomal RNA sequences. Since the 1990’s the resolution of the tree of life has been driven by the genomic revolution, which delivers an increasing number of fully sequenced genomes. Paradoxically, however, instead of improving the resolution of the tree of life, this flood of data has put the entire concept of a tree of life in jeopardy.

When investigating gene trees it soon becomes apparent that different genes can yield different phylogenetic trees, and that even the holy of holies of phylogenetic taxonomy, the separation of the two kingdoms Archaea and Bacteria, can in some cases be violated. How can this phenomenon be explained? The solution might be horizontal gene transfer (HGT). The HGT explanation has two ingredients that are sure to cause controversy. First, it challenges the traditional tree-based view of evolution, since horizontal transfer is inconsistent with strictly vertical inheritance of traits. Second, it is difficult to prove unambiguously, since we cannot go back in time and show that a specific gene was transferred horizontally from one organism to another.

Several scientific findings demonstrated that HGT can physically happen. In 1944 Avery, MacLeod, and McCarty dramatically illustrated that organisms are able to absorb DNA from the environment, and identified DNA as the "transforming principle" in pneumococcus type III. The significance of horizontal transfer was first recognized in the 1950’s, when resistance to multiple antibiotics was shown to be transferred simultaneously from Shigella to Escherichia coli.

What is horizontal transfer?

During horizontal transfer, genes from one organism are transferred to another by different mechanisms. Xenologs are homologs whose sequences are found in different species because of horizontal gene transfer. As opposed to orthologs which are homologs related by speciation and paralogs which are homologs related by duplication, xenologs are related by a horizontal transfer event.

Horizontal transfer can occur by transformation, which “is the mechanism by which a cell is genetically altered by the uptake, genomic incorporation, and expression of foreign genetic material” [Wikipedia]. The transfer can also result from the introduction of DNA by viruses (transduction) or from cell-cell contact between bacteria (conjugation). Introduction of foreign DNA into eukaryotic cells is usually called transfection. DNA can be picked up from the environment, for instance from a lysed cell. The DNA is then incorporated into the genome, where it has the chance to rise to fixation.

Interest in horizontal transfer extends beyond academic debate and has very important practical implications. More and more bacterial infections are resistant to many antibiotics; these resistances are transferred from one pathogenic species to another and pose severe health-related challenges. Further, bioengineered plants may exchange genes with natural species, which could have serious ecological ramifications.

Bacterial and Archaeal genomes

As of 2010, over 1000 microbial genomes are available. Archaeal genomes average around 2 megabases in size, while bacterial genome sizes have two modes, one at around 2 megabases and one at around 5 megabases. Both kingdoms have very gene-dense genomes, with archaea being the denser at one protein encoded every 1000 basepairs, indicating that, on average, archaeal genomes are more compact than bacterial ones. The length of an average gene in these genomes is slightly less than 1000 basepairs. The intergenic distance is typically around 100 basepairs, with a second group of neighboring genes separated by fewer than 10 basepairs, probably because they belong to the same operon.

We can distinguish among three kinds of genes based upon their distribution across species: Core, present in all organisms; Shell, present in a large number of organisms; and Cloud, present in just a few organisms. Examining the frequencies of gene families, we find that there are few core families and many shell families, while most families are clouds, having representatives in only a few species. At the level of a single genome, the situation is slightly different: most genes are shell genes.
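The core/shell/cloud distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the family names, the four genomes, and the 50% cutoff separating shell from cloud are all hypothetical, since the notes do not give a numerical threshold.

```python
from collections import Counter


def classify_families(presence, shell_min_fraction=0.5):
    """Label each gene family as core, shell, or cloud.

    presence maps a family name to the set of genomes that contain it.
    shell_min_fraction is an illustrative cutoff, not a value from the lecture.
    """
    genomes = set().union(*presence.values())
    n_genomes = len(genomes)
    labels = {}
    for family, members in presence.items():
        if len(members) == n_genomes:
            labels[family] = "core"    # present in every genome
        elif len(members) / n_genomes >= shell_min_fraction:
            labels[family] = "shell"   # present in many genomes
        else:
            labels[family] = "cloud"   # present in only a few genomes
    return labels


# Toy data: four hypothetical genomes (A-D) and five hypothetical families.
presence = {
    "famRibosomal": {"A", "B", "C", "D"},
    "famTransport": {"A", "B", "C"},
    "famRegulator": {"B", "C", "D"},
    "famPhage1": {"A"},
    "famPhage2": {"D"},
}
labels = classify_families(presence)
print(labels)
print(Counter(labels.values()))
```

With real data the input would be a table of ortholog families against genomes, and the cutoff would have to be chosen to suit the number of genomes compared.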

Detecting horizontal transfers in genomic sequences

“All criteria for identifying probable horizontal gene transfer, or more precisely acquisition of foreign genes by a particular genome, inevitably rely on some unusual feature(s) of subsets of genes that distinguishes them from the bulk of genes in the genome.” Koonin et al. 2001

Unusual features that can hint at a possible horizontal transfer event:

1.      Unexpected ranking of sequence similarity among homologs

2.      Unexpected phylogenetic tree topology

3.      Unusual phyletic pattern

4.      Conservation of gene order

5.      Anomalous DNA composition

Although any of the above features can imply horizontal transfer, direct proofs are unavailable and indications of horizontal transfers remain probabilistic.

Unexpected ranking of sequence similarity among homologs

This feature follows from the basic assumption of the horizontal transfer hypothesis: when sequences from different species are aligned, a horizontally acquired gene shows its strongest similarity to a homolog from a distant taxon rather than from a close relative. For instance, the eubacterial thermophile Thermotoga maritima has 1858 protein-coding genes. Of these, 451 (24%) have their best hits in Archaea! The majority of housekeeping genes are most similar to orthologs in eubacterial species. In contrast, 49% of transporters (92) and 42% of conserved hypothetical proteins (173) are most similar to archaeal genes. These may be the genes underlying thermophily.
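A minimal sketch of the best-hit criterion follows. It assumes that each gene already comes with a list of similarity hits labeled by the domain of the organism they come from, and that hits to the query genome itself have been removed; the gene names and scores are hypothetical.

```python
def best_hit_domain(hits):
    """Return the domain of the highest-scoring hit.

    hits is a list of (domain, score) pairs; a higher score means more similar.
    """
    return max(hits, key=lambda hit: hit[1])[0]


def genes_with_foreign_best_hit(hits_by_gene, own_domain):
    """List genes whose best hit lies outside own_domain, plus their fraction."""
    flagged = [
        gene
        for gene, hits in hits_by_gene.items()
        if hits and best_hit_domain(hits) != own_domain
    ]
    return flagged, len(flagged) / len(hits_by_gene)


# Hypothetical similarity scores for four genes of a bacterial genome.
hits_by_gene = {
    "rpoB": [("Bacteria", 950.0), ("Archaea", 310.0)],
    "transporter1": [("Bacteria", 120.0), ("Archaea", 410.0)],
    "hypothetical7": [("Archaea", 275.0), ("Bacteria", 90.0)],
    "gyrA": [("Bacteria", 880.0)],
}
flagged, fraction = genes_with_foreign_best_hit(hits_by_gene, "Bacteria")
print(flagged, fraction)

# The proportion quoted in the notes: 451 of 1858 genes, as a rounded percentage.
print(round(100 * 451 / 1858))
```

A best hit in a distant taxon is only a hint: it can also arise when close relatives have lost the gene or have not been sequenced, which is why the notes treat all such evidence as probabilistic.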

Unexpected phylogenetic tree topology

If, in a well-supported tree, a bacterial protein groups with its archaeal homologs to the exclusion of homologs from other bacteria and, best of all, shows a reliable affinity with a particular archaeal lineage, the conclusion that horizontal gene transfer is at play seems inevitable. An example is the Mn-dependent transcriptional regulator. Examining the topology of the gene's phylogenetic tree, it becomes obvious that archaeal and eubacterial genes are all mixed up and that the phylogenetic distribution of the genes does not reflect the phylogenetic distribution of the kingdoms. The signature of horizontal transfer is incongruence between the species tree and the gene tree.

Another example is the glutamine synthetase gene GSI. Its phylogenetic tree indicates horizontal transfer between the three domains. Species of archaea and bacteria are mixed and therefore are not monophyletic (that is, neither group forms a single clade descended from one common ancestor). According to the GSI tree the eukaryotes are monophyletic; however, the Bacteria, and not the Archaea, are their closest outgroup. Several bacterial species have both a bacterial GSI isoform and a eukaryotic GSII isoform.

Unusual phyletic pattern

In a specific subtree of species, some of the species may lack a particular gene while others contain it. This scenario can be explained by several different hypotheses. One is that the gene arose independently in some of the species but not in others; the probability of such a case is very low. The other options are that the gene was lost from some of the species, or that the gene was transferred from another species. Because several models can explain the same scenario, the decision is based on the most parsimonious explanation, i.e. the one requiring the fewest events to arrive at the observed distribution. In the example of slides 28 and 29, horizontal transfer is the most parsimonious explanation because only one transfer event needs to be hypothesized, whereas the gene loss model requires two gene loss events.
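The parsimony argument can be made concrete by counting events on a small tree. The sketch below is an illustration under assumptions that are not from the lecture: the species tree is written as nested tuples, the species names are hypothetical, and a loss and a transfer are given equal weight. It compares a loss-only model (the gene was present in the common ancestor of all carriers) with a transfer model (the gene arose once and every other carrier clade received it horizontally).

```python
def leaves(tree):
    """Set of species names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def ancestor_of_carriers(tree, carriers):
    """Smallest subtree that contains every species carrying the gene."""
    while not isinstance(tree, str):
        holding = [child for child in tree if leaves(child) & carriers]
        if len(holding) != 1:
            break
        tree = holding[0]
    return tree


def count_losses(tree, carriers):
    """Losses needed if the gene was present at the root of this subtree."""
    if not leaves(tree) & carriers:
        return 1  # the whole clade lacks the gene: one loss on its stem
    if isinstance(tree, str):
        return 0  # a tip that still carries the gene
    return sum(count_losses(child, carriers) for child in tree)


def count_carrier_clades(tree, carriers):
    """Number of maximal clades in which every species carries the gene."""
    if leaves(tree) <= carriers:
        return 1
    if isinstance(tree, str):
        return 0
    return sum(count_carrier_clades(child, carriers) for child in tree)


# Hypothetical species tree and gene distribution.
species_tree = (("A", "B"), (("C", "D"), "E"))
carriers = {"A", "B", "C"}

losses = count_losses(ancestor_of_carriers(species_tree, carriers), carriers)
transfers = count_carrier_clades(species_tree, carriers) - 1  # one clade is the origin
print("loss-only model:", losses, "losses")
print("transfer model:", transfers, "transfer(s)")
```

In this toy case the transfer model needs fewer events than the loss-only model, mirroring the one-transfer-versus-two-losses reasoning above. Changing the relative weight of losses and transfers can reverse the conclusion, which is one reason such inferences remain debatable.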

When plotting the number of lineages or species in an ortholog family, it becomes evident that most families are present in only a few species. Nevertheless, an immense scattering of genes among microbial organisms can be observed, suggesting a tremendous number of gene trees that are inconsistent with a species tree unless gene loss is repeatedly invoked. Applying gene loss consistently to cases of patchy gene distribution assigns more and more genes to ancient common ancestral genomes. In the end, it requires a last universal common ancestor with an enormous range of metabolic capabilities - one huge ‘genome of Eden’. Therefore many of the patchy gene distributions must have evolved through horizontal gene transfer.

Conservation of gene order

The presence of three or more genes in the same order in distant genomes is extremely unlikely unless these genes form an operon. Each operon typically emerges only once during evolution and is maintained by selection following this event. Therefore, when an operon is present in only a few distantly related genomes, horizontal gene transfer seems to be the most likely scenario. Genes with conserved gene order tend to be functionally related. Why should genes of common function cluster along the chromosome? The Selfish Operon Model postulates that the organization of bacterial genes into operons is beneficial to the constituent genes in that proximity allows horizontal co-transfer of all genes required for a selectable phenotype. Horizontal transfer of selfish operons most probably promotes bacterial diversification.

A good example is the restriction enzyme system of bacteria. In all known cases, the restriction endonuclease is encoded adjacent to the methyltransferase that protects the host DNA. These two genes are known to undergo horizontal transfer; their proximity is therefore beneficial to their joint transfer to other genomes.

Anomalous DNA composition

Synonymous codons are expected to be neutral and to occur at equal frequency. Nevertheless, codon biases are found in all known prokaryotes, i.e. different organisms have different codon frequencies. In some organisms codon frequencies are not similar for all genes. For instance, factor analysis of codon usage of B. subtilis genes reveals three classes of genes with different codon frequencies. Class 1 comprises the majority of the B. subtilis genes (82%). Class 2 (5%) comprises genes that are highly expressed under exponential growth conditions. Class 3 (13%) comprises genes that were apparently horizontally transferred. Because some of the genes in this group showed clear relationships with bacteriophage genes, the hypothesis has been proposed that all these genes are alien and have been acquired horizontally from various sources.

Why do horizontally transferred genes use the genetic code differently? Different bacterial species display a wide degree of variation in their overall G+C content. However, most genes within a genome have roughly the same G+C content. When the genes of B. subtilis are plotted against the G+C content of their sequences, class 3 genes stand apart from the other genes. Apparently, horizontally transferred genes retain the sequence characteristics of the donor genome. A further indication that horizontally transferred genes tend to keep their original codon frequencies is that genes not shared by closely related genomes tend to have a different base composition. For instance, a large number of S. enterica genes that are not present in E. coli (or any other enteric species) have base compositions that differ significantly from the overall 52% G+C content of the entire chromosome. In E. coli, about 18% of the genes appear to be foreign: when base composition and codon usage are used to identify horizontal transfers, 755 (17.6%) of the 4288 ORFs in the genome are found to have originated through at least 234 horizontal gene transfer events.
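The idea of picking out genes with anomalous base composition can be sketched as follows. This is a deliberately simple illustration, not the method used in the studies cited: it flags genes whose G+C content deviates from the genome-wide mean by more than a chosen number of standard deviations. The gene names, toy sequences, and the cutoff of 2.0 are assumptions.

```python
from statistics import mean, stdev


def gc_content(sequence):
    """Fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def flag_atypical_genes(genes, z_cutoff=2.0):
    """Return genes whose G+C content lies more than z_cutoff standard
    deviations from the mean over all genes, with their z-scores."""
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    values = list(gc.values())
    mu = mean(values)
    sigma = stdev(values)
    return {
        name: (value - mu) / sigma
        for name, value in gc.items()
        if abs(value - mu) > z_cutoff * sigma
    }


# Toy genome: seven genes near 50% G+C and one AT-rich gene (hypothetical data).
genes = {
    "geneA": "ATGC" * 5,
    "geneB": "GCAT" * 5,
    "geneC": "TACG" * 5,
    "geneD": "CGTA" * 5,
    "geneE": "ATGC" * 4 + "GCGA",
    "geneF": "ATGC" * 4 + "ATAC",
    "geneG": "GCAT" * 4 + "GGCA",
    "geneH": "AATC" * 5,
}
flagged = flag_atypical_genes(genes)
print(flagged)  # only the AT-rich geneH is flagged
```

Real analyses work with full-length genes, usually look at codon usage or codon-position-specific G+C content as well, and need to separate highly expressed native genes (class 2 above) from genuinely foreign ones.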

What happens to the peculiar codon biases of a horizontally transferred gene after some time in its new host? Amelioration is the process by which a sequence adjusts to the base composition and codon usage of the resident genome. Amelioration is a function of the relative rate of G/C to A/T mutations. Because of amelioration, very ancient horizontal transfer events cannot be detected by the anomalous DNA composition method. Investigation of the distribution of horizontally acquired (foreign) DNA in sequenced bacterial and archaeal genomes reveals that horizontal gene transfer ranges from virtually none in some organisms with small genome sizes to ~17% in Synechocystis.
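Amelioration can be illustrated with a toy two-state mutation model. The sketch below is a simplification made for these notes rather than a published model: every G or C site mutates to A or T at one fixed rate per time step, every A or T site mutates to G or C at another, and selection is ignored. The rates, the starting G+C content, and the number of steps are hypothetical; the rates were chosen so that the equilibrium matches a host at 52% G+C.

```python
def equilibrium_gc(rate_gc_to_at, rate_at_to_gc):
    """G+C content at which the two mutation fluxes balance."""
    return rate_at_to_gc / (rate_gc_to_at + rate_at_to_gc)


def ameliorate(gc_start, rate_gc_to_at, rate_at_to_gc, steps):
    """Trajectory of a gene's G+C content under the host's mutation bias."""
    gc = gc_start
    trajectory = [gc]
    for _ in range(steps):
        # AT sites gained as GC minus GC sites lost to AT in this time step.
        gc = gc + rate_at_to_gc * (1.0 - gc) - rate_gc_to_at * gc
        trajectory.append(gc)
    return trajectory


# Hypothetical per-step rates and an AT-rich gene newly arrived in the host.
rate_gc_to_at = 0.0048
rate_at_to_gc = 0.0052
trajectory = ameliorate(0.35, rate_gc_to_at, rate_at_to_gc, steps=1000)

print("host equilibrium:", equilibrium_gc(rate_gc_to_at, rate_at_to_gc))
for step in (0, 100, 300, 1000):
    print(step, round(trajectory[step], 3))
```

The foreign gene drifts toward the host's composition and the gap shrinks at every step, which is why composition-based methods lose sight of old transfers.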

Classification of horizontal transfer events

Horizontal transfer events can be classified into three categories with respect to the relationships between the horizontally acquired gene and homologous genes (if any) preexisting in the recipient lineage:

New: A new gene with a new function, missing in other members of a given clade, is acquired. This can happen by one of two mechanisms: 1. Loss and regain - an unused gene is deleted, later replaced by a homologue, and then selected for again. 2. Non-orthologous gene replacement - a functional analogue (but not an ortholog) of an essential gene is added and the original is subsequently lost.

Additional: A xenolog of a given already existing gene with a distinct evolutionary ancestry is acquired. Even though the two are homologs they will lie at different places on the tree of the gene family because they each have different evolutionary trajectories.

Replacement: A phylogenetically distant ortholog is acquired followed by xenologous gene displacement, that is, elimination of the ancestral gene.

Interkingdom horizontal transfers

Genes are transferred not only within a kingdom (Archaea, Eubacteria and Eukaryotes) but also from one kingdom to another. Inter-kingdom horizontal transfers involve ~3% of the genes in most free-living bacteria and typically between 4% and 8% in archaea. Usually, only recent transfers can be detected, because ancient ones would have been obscured by interarchaeal hits.

Is there a tree at all?

Is there a stable core of genes that do faithfully record population bifurcations and speciation events, back to the group’s last common ancestor? ‘How many and which genes shared by all genomes in a given group, or indeed by all prokaryotes, have the same evolutionary history and produce the same phylogenetic tree?’

The archaeal genomes contain a striking “genomescape” strongly suggestive of massive horizontal gene transfer. Archaeal proteins involved in translation, transcription, replication and protein secretion are most closely related to eukaryotic proteins, whereas metabolic enzymes, metabolite uptake systems, enzymes for cell wall biosynthesis and many uncharacterized proteins appear to be 'bacterial'. These observations have been tentatively explained by massive gene exchange between archaea and bacteria.

The Complexity Hypothesis

Informational genes, particularly those of the translational and transcriptional apparatuses, are members of large, complex systems. In contrast, most operational genes (those involved in housekeeping) are members of small assemblies of a few gene products. The distribution of peripheral branch-transfer distances of operational and informational genes from the EF-1 reference tree indicates that extensive horizontal transfer has occurred for operational genes, whereas informational genes are seldom horizontally transferred. Jain et al. (1999) have shown that operational genes have been horizontally transferred continuously since the divergence of the prokaryotes. They suggested that a major factor in the more frequent horizontal transfer of operational genes is that informational genes are typically members of large, complex systems, whereas operational genes are not, thereby making horizontal transfer of informational gene products less probable. In fact, transfer events affecting aminoacyl tRNA synthetase, ribosomal protein, elongation factor and even rRNA genes are known. In general, however, we would expect such genes, and perhaps many others, to define a core for many taxa at some moderate level of phylogenetic inclusiveness.

The core contains only 14 proteins. Combined alignments of these 14 orthologous proteins, conserved across 45 species from all kingdoms, were used to construct a highly robust universal tree. The combined protein universal tree is highly congruent with the SSU rRNA tree, supporting the separate monophyly of the domains as well as the early evolution of thermophilic Bacteria.

Despite the massive influence of horizontal gene transfer, the concept of a universal ‘species’ tree may still be appropriate. However, this tree should be reinterpreted as a prevailing trend in the evolution of genome-scale gene sets rather than as a complete picture of evolution.

The role of horizontal transfer in bacterial diversity

Escherichia coli strain O157:H7, a human pathogen, and strain K12, a friendly gut bacterium, are more different from each other than any two mammals. Their two genomes differ in 26% or 12% of their genes (O157:H7 and K12, respectively, depending on which way you cast the comparison). Most of the genes unique to each strain undoubtedly derive from horizontal transfer events. In many cases, genes acquired by horizontal transfer confer species-specific traits. Novel acquisition (of a brand new function) can have dramatic effects. For instance, a clade of marine proteobacteria that uses light to pump protons was founded thanks to the acquisition of a halorhodopsin gene from an archaeon (Beja et al. 2001). In another example, the establishment of methanotrophy in bacteria was made possible by the co-option of archaeal genes of methanogenesis (Graham et al. 2000). Further, the conversion from an anaerobic to an aerobic lifestyle in haloarchaea could have occurred after genes of respiration were obtained from bacteria (Kennedy et al. 2001).

In many cases, the amount and source of horizontal gene transfer can be linked to an organism’s lifestyle. Bacterial hyperthermophiles seem to have exchanged genes with archaea to a greater extent than other bacteria. Further, the transfer of certain classes of eukaryotic genes is most common in parasitic and symbiotic bacteria.

Sonea & Mathieu (2000) suggest that, somewhere in life’s first billion years, ‘lateral gene transfer mechanisms appeared and were progressively improved, furthering the development of diversity.’ ‘The prokaryotes’ constructive evolution resulted in the formation of a world-wide web of genetic information, and a global bacterial superbiosystem (superorganism).’ ‘By contrast, eukaryotic evolution of organisms has been typically Darwinian. Diversification of eukaryotic organisms was, however, considerably enriched and accelerated by symbioses with prokaryotes.’

In 2008 Koonin and Wolf described a more dynamic view of the prokaryotic world, where genetic information, and therefore biochemical functions, are constantly exchanged between different prokaryotic organisms. Prokaryotic evolution is thus governed by two main processes, vertical and horizontal transfer of genomic information.