Genome Evolution Course 2009-2010

www.yanaiweb.com/genome

Itai Yanai, Technion – Israel Institute of Technology

 

Tutorial Presentation as PDF or PP.

 

Problem Set #10 assigned December 28th, 2009

 

To be submitted as hard-copy in English or Hebrew on January 3rd, 2010 (at the beginning of class, 9:30am).

 

Problem 1: Genome size. In class, we discussed the C-value paradox, in which genome size varies tremendously even among similar organisms. Consider, for example, the Leopard shark and the Ghost shark. Although phenotypically similar, they differ in genome size by more than a factor of 2. Explore the Gregory Lab’s Genome Size database (http://www.genomesize.com/) to find another such example. Remember, 1 pg (picogram) is roughly 1 billion base pairs.

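Class | Order | Family | Species | Common name | C-value (pg) | Chromosome number (2n) | Method | Cell type | Standard species | Reference
Chondrichthyes | Carcharhiniformes | Triakidae | Triakis semifasciata | Leopard shark | 4.80 | 72 | BFA | RBC | SP | 219
Chondrichthyes | Chimaeriformes | Callorhinchidae | Callorhinchus milii | Ghost shark | 1.94 | (not listed) | FIA | RBC | BS, GD, OM, RP | 204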
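As a quick check of the arithmetic in Problem 1, the following minimal sketch converts the two C-values above into approximate base-pair counts and computes their fold difference. It assumes the rough rule stated in the problem (1 pg is about 1 billion base pairs); the function names are illustrative only.

```python
# Rough conversion used in the problem: 1 pg of DNA ~ 1 billion base pairs.
BP_PER_PG = 1e9  # approximation stated in the problem text, not an exact constant


def c_value_to_bp(c_value_pg):
    """Convert a C-value in picograms to an approximate base-pair count."""
    return c_value_pg * BP_PER_PG


def fold_difference(c_value_a_pg, c_value_b_pg):
    """Return how many times larger the bigger genome is than the smaller one."""
    larger = max(c_value_a_pg, c_value_b_pg)
    smaller = min(c_value_a_pg, c_value_b_pg)
    return larger / smaller


leopard_shark_pg = 4.80  # Triakis semifasciata
ghost_shark_pg = 1.94    # Callorhinchus milii

print(f"Leopard shark: ~{c_value_to_bp(leopard_shark_pg):.2e} bp")
print(f"Ghost shark:   ~{c_value_to_bp(ghost_shark_pg):.2e} bp")
print(f"Fold difference: {fold_difference(leopard_shark_pg, ghost_shark_pg):.2f}")
```

The same two functions can be applied to any pair of C-values you find in the database.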

 

Problem 2: Respectfully Selfish DNA. Why do LINEs seem to be so respectful of the HoxD cluster on chromosome 2?

 

Problem 3: Detecting selfish (repeat) elements. Use the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgGateway) to examine a 90,000 base-pair region of DNA in the human genome. For example, enter chr1:1,000,000-1,090,000 and then click submit (it would be very improbable for any two of you to select the same region…). The last track is the RepeatMasker track, which identifies repeat elements. To see all of the elements, change the RepeatMasker setting to “full” and then refresh the page. The genomic image should now display the locations of the SINEs, LINEs, LTRs, and DNA transposons. Do the locations of these elements anti-correlate with the locations of the gene exons?
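The question in Problem 3 can be answered by eye in the browser, but one way to make "anti-correlate" concrete is to compare the fraction of exon bases covered by repeats with the fraction of the whole region covered by repeats. The sketch below is only an illustration: the coordinates are made-up toy values, not real annotations, and the function names are hypothetical. Intervals are half-open (start, end) pairs.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals):
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals))


def overlap_bp(a, b):
    """Number of bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Toy, hypothetical coordinates within a 10,000 bp region (not real data).
region_length = 10_000
exons = [(1000, 1200), (5000, 5300), (9000, 9150)]
repeats = [(100, 400), (1150, 1500), (6000, 6300), (8000, 8200)]

repeat_fraction_of_region = covered_bp(repeats) / region_length
repeat_fraction_of_exons = overlap_bp(exons, repeats) / covered_bp(exons)

print(f"Repeats cover {repeat_fraction_of_region:.1%} of the region")
print(f"Repeats cover {repeat_fraction_of_exons:.1%} of exon bases")
```

If repeats cover a clearly smaller share of exon bases than of the region as a whole, that is consistent with anti-correlation; a single 90 kb window is too small to support a strong statistical claim either way.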

 

Problem 4 (2 points): What is the age distribution of the repeat elements? To get the DNA sequence from the previous question, click on DNA at the top of the page. Then click “get DNA”. Copy and paste this 90 kb sequence into the RepeatMasker website http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker and then click submit sequence. The job should take less than a minute to complete. At the bottom of the results page, click on the Annotation File (NEW XHTML Format). Here you will see each repeat element in the sequence represented as a row in the table. You can see the annotation evidence for calling it a repeat by clicking on the ±, which shows the sequence you entered aligned to the known repeat (for example, an Alu) from the database. It may be most convenient to use Excel to analyze the data. For this, you can simply copy the entire page into Notepad, save it as a text file, and then open it in Excel.

 

Make separate histograms of the % divergence for “SINE/Alu”, “SINE/MIR”, “LINE/L1”, and “LINE/L2” repeats, where the x-axis is divided into 5% bins and the y-axis indicates the number of elements falling in each bin. As we discussed in class, the % divergence can be a proxy for time. Describe what this “repeat archaeology” reveals about the repeats in your 90 kb region.
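As an alternative to Excel, the binning can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it assumes the saved text keeps RepeatMasker's usual whitespace-separated annotation layout, in which the second column is the % divergence and the eleventh column is the repeat class/family. The sample rows below are hypothetical and exist only to show the format; replace them with the lines of your own saved file.

```python
from collections import Counter

FAMILIES = ("SINE/Alu", "SINE/MIR", "LINE/L1", "LINE/L2")
BIN_WIDTH = 5.0  # percent divergence per histogram bin


def divergence_histograms(lines, families=FAMILIES, bin_width=BIN_WIDTH):
    """Count repeats per divergence bin, separately for each family.

    Assumes column 2 is % divergence and column 11 is class/family.
    Header and blank lines are skipped because their first field is not a number.
    """
    hist = {family: Counter() for family in families}
    for line in lines:
        fields = line.split()
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        family = fields[10]
        if family in hist:
            bin_index = int(float(fields[1]) // bin_width)
            hist[family][bin_index] += 1
    return hist


def print_histograms(hist, bin_width=BIN_WIDTH):
    """Print a simple text bar chart for each family."""
    for family, counts in hist.items():
        print(family)
        for bin_index in sorted(counts):
            low = bin_index * bin_width
            print(f"  {low:4.0f}-{low + bin_width:<4.0f}% {'#' * counts[bin_index]}")


# Hypothetical sample rows, for illustration only (not real results).
SAMPLE = """\
   SW  perc perc perc  query  position in query  matching repeat  position in repeat
score  div. del. ins.  sequence  begin end (left)  repeat class/family  begin end (left) ID

 1306 15.6 6.2 0.0 chr1 1000 1250 (88750) C AluSx SINE/Alu (0) 312 33 1
 2100 8.3 0.5 0.3 chr1 3000 3300 (86700) + AluY SINE/Alu 1 300 (11) 2
 450 27.9 8.1 1.2 chr1 7000 7150 (82850) + MIRb SINE/MIR 20 180 (88) 3
 980 12.1 2.0 1.0 chr1 9000 9900 (80100) C L1PA7 LINE/L1 (50) 6100 5200 4
 300 31.4 9.0 2.2 chr1 12000 12200 (77800) + L2a LINE/L2 3000 3210 (209) 5
"""

histograms = divergence_histograms(SAMPLE.splitlines())
print_histograms(histograms)
```

To run it on your own data, read the saved file with `open(path).read().splitlines()` and pass the result to `divergence_histograms` in place of the sample.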