Genome Evolution Course 2009-2010
www.yanaiweb.com/genome
Itai Yanai, Technion – Israel Institute of Technology
Tutorial
Presentation as PDF or PP.
Problem Set #10, assigned December 28th, 2009
To be submitted as a hard copy, in English or Hebrew, on January 3rd, 2010 (at the beginning of class, 9:30am).
Problem 1: Genome size. In class, we discussed the C-value paradox, in which genome size varies tremendously even among similar organisms. Consider, for example, the leopard shark and the ghost shark. Although phenotypically similar, they differ in genome size by more than a factor of 2. Explore the Gregory Lab’s Genome Size database (http://www.genomesize.com/) to find another such example. Remember, 1 pg (picogram) is roughly 1 billion base pairs.
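Class | Order | Family | Common name | C-value (pg) | Chromosome number (2n) | Method | Cell type | Standard species
Chondrichthyes | Carchariniformes | Triakidae | Leopard shark | 4.80 | 72 | BFA | RBC | SP
Chondrichthyes | Chimaeriformes | Callorhinchidae | Ghost shark | 1.94 | | FIA | RBC | BS, GD, OM, RP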
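As a quick check of the arithmetic behind Problem 1, the following minimal sketch converts the two C-values listed above into approximate base-pair counts and computes their ratio. It relies only on the rule of thumb given in the problem (1 pg is roughly 1 billion base pairs); the function and variable names are chosen here for illustration.

```python
# Rule of thumb from the problem statement: 1 pg of DNA ~ 1 billion base pairs.
BP_PER_PG = 1e9


def c_value_to_bp(c_value_pg):
    """Convert a C-value in picograms to an approximate number of base pairs."""
    return c_value_pg * BP_PER_PG


# C-values (pg) copied from the two database entries above.
leopard_shark_pg = 4.80
ghost_shark_pg = 1.94

# Ratio of the larger genome to the smaller one.
fold_difference = leopard_shark_pg / ghost_shark_pg

print(f"Leopard shark: ~{c_value_to_bp(leopard_shark_pg):.2e} bp")
print(f"Ghost shark:   ~{c_value_to_bp(ghost_shark_pg):.2e} bp")
print(f"Fold difference: {fold_difference:.2f}")
```

The same two lines of arithmetic can be reused for any pair of species you find in the database.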
Problem 2: Respectfully selfish DNA. Why do LINEs seem to be so respectful of the HoxD cluster on chromosome 2?
Problem 3: Detecting selfish (repeat) elements. Use the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgGateway) to examine a 90,000 base-pair region of DNA in the human genome. For example, enter chr1:1,000,000-1,090,000 and then click submit (it would be very improbable for any of you to select the same region…). The last track is the RepeatMasker track, which identifies repeat elements. To see all of the elements, change the RepeatMasker setting to “full” and then refresh the page. The genomic image should now display the locations of the SINEs, LINEs, LTRs, and DNA transposons. Do the locations of these elements anti-correlate with the locations of the gene exons?
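One way to make "anti-correlate" quantitative is to compare the fraction of exonic bases covered by repeats with the fraction of non-exonic bases covered by repeats. The sketch below does this for lists of (start, end) intervals. All coordinates are made up for illustration and are offsets within a hypothetical 90,000 bp window, not real annotations; the function names are likewise chosen here and do not come from any genome browser tool.

```python
def merge_intervals(intervals):
    """Merge overlapping half-open (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals):
    """Total number of bases covered by a list of disjoint intervals."""
    return sum(end - start for start, end in intervals)


def overlap_length(a, b):
    """Number of bases shared by two sorted lists of disjoint intervals."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            shared += end - start
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def repeat_fraction_inside_and_outside(exons, repeats, region_length):
    """Return (repeat fraction of exonic bases, repeat fraction of non-exonic bases)."""
    exons = merge_intervals(exons)
    repeats = merge_intervals(repeats)
    exon_bases = total_length(exons)
    repeat_bases = total_length(repeats)
    shared = overlap_length(exons, repeats)
    inside = shared / exon_bases
    outside = (repeat_bases - shared) / (region_length - exon_bases)
    return inside, outside


# Hypothetical coordinates (offsets within the window), for illustration only.
REGION_LENGTH = 90_000
exons = [(5_000, 5_400), (12_000, 12_250), (40_000, 40_900)]
repeats = [(1_000, 1_300), (5_350, 5_650), (20_000, 26_000), (60_000, 60_300)]

inside, outside = repeat_fraction_inside_and_outside(exons, repeats, REGION_LENGTH)
print(f"Repeat coverage inside exons:  {inside:.1%}")
print(f"Repeat coverage outside exons: {outside:.1%}")
```

A much lower coverage inside exons than outside would be consistent with anti-correlation. For the problem itself, a visual inspection of the browser image is what is asked for.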
Problem 4 (2 points): What is the age distribution of the repeat elements? To get the DNA sequence of the region from the previous problem, click on DNA at the top of the page. Then click “get DNA”. Copy and paste this 90 kb sequence into the RepeatMasker website (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker) and then click submit sequence. The job should take less than a minute to complete. At the bottom of the results page, click on the Annotation File (NEW XHTML Format). Here you will see each repeat element in the sequence represented as a row in the table. You can see the annotation evidence for calling it a repeat by clicking on the ±, which shows the sequence you submitted aligned to known elements, such as Alus, from the database. It may be most convenient to use Excel to analyze the data. For this you can simply copy the entire web page into Notepad, save it, and then open it in Excel.
Make separate histograms of the % divergence for “SINE/Alu”, “SINE/MIR”, “LINE/L1”, and “LINE/L2” repeats, where the x-axis is divided into 5% intervals and the y-axis indicates the number of elements falling in each interval. As we discussed in class, the % divergence can serve as a proxy for time: the more an element has diverged from its consensus sequence, the older its insertion is likely to be. Describe what this “repeat archaeology” reveals about the repeats in your 90 kb region.
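If you prefer a script to Excel, the binning can be sketched as follows. This is a minimal sketch that assumes the annotation has been saved as whitespace-separated text in the standard RepeatMasker table layout, with the % divergence in the second column and the repeat class/family in the eleventh; the table you copy from the XHTML page may be laid out differently, so check the column positions first. The sample rows are made up for illustration and are not real output.

```python
from collections import Counter, defaultdict

BIN_WIDTH = 5  # width of each histogram bin, in % divergence
CLASSES = ("SINE/Alu", "SINE/MIR", "LINE/L1", "LINE/L2")

# Made-up rows in the assumed RepeatMasker table layout (illustration only).
SAMPLE = """\
   SW  perc perc perc  query  position in query       matching  repeat        position in repeat
score  div. del. ins.  sequence  begin  end  (left)   repeat    class/family  begin  end  (left)  ID

 2145  12.3  0.7  0.0  chr1    501    800  (89200) +  AluSx  SINE/Alu     1   302   (10)  1
 1980   4.1  0.0  0.3  chr1   2100   2400  (87600) C  AluY   SINE/Alu   (0)   311     12  2
  312  27.8  5.2  1.1  chr1   7000   7150  (82850) +  MIRb   SINE/MIR    40   195   (73)  3
 3050   8.9  1.0  0.5  chr1  15000  16200  (73800) +  L1PA7  LINE/L1   4900  6100   (50)  4
  405  31.2  6.0  2.0  chr1  30000  30300  (59700) C  L2a    LINE/L2  (100)  3300   2990  5
 2230  13.9  0.3  0.0  chr1  41000  41290  (48710) +  AluSx  SINE/Alu     1   300   (12)  6
"""


def parse_divergences(lines):
    """Collect % divergence values per repeat class/family from table rows."""
    by_class = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Data rows begin with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        by_class[fields[10]].append(float(fields[1]))
    return by_class


def histogram(values, bin_width=BIN_WIDTH):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


divergences = parse_divergences(SAMPLE.splitlines())
for repeat_class in CLASSES:
    print(repeat_class)
    for lower, count in histogram(divergences[repeat_class]).items():
        print(f"  {lower:>2}-{lower + BIN_WIDTH:<2}%  {'#' * count}")
```

To run it on your own data, replace `SAMPLE.splitlines()` with the lines of your saved annotation file, for example `open("annotation.txt")`, where the file name is hypothetical.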