STAT536: HW9 Questions


Dorman Lab Wiki
Question 1

Formula:  There is one exact, implicit formula, given for the case where Aa has fitness 1+s under the Wright-Fisher (WF) model.  I did forget to say that you may assume the Wright-Fisher model.
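For reference, here is one standard implicit result of this kind, from a branching process. It rests on an assumption that is ours, not the question's: each copy of A leaves a Poisson number of copies with mean 1+s, which is how a rare allele behaves in a large Wright-Fisher population. The extinction probability q then solves a fixed-point equation that has no closed form:

```latex
% Assumed: offspring count X of one A copy is Poisson with mean \lambda = 1 + s
f(x) = \mathbb{E}\left[x^{X}\right] = e^{\lambda (x - 1)}, \qquad \lambda = 1 + s
% Extinction probability q: smallest root in [0, 1] of q = f(q)
q = e^{-(1+s)(1 - q)}, \qquad \Pr(\text{fixation of a single copy}) \approx 1 - q
```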

Relating to allele frequencies:  On rereading, I see that the question lacks some clarity (I have re-posted the questions).  Each plot should show two lines, both plotted against starting allele frequency from 0 to 1: the difference between methods (1) and (3), and the difference between methods (2) and (3).  Since the population is of size 1000, the starting allele frequency can be 0/1000, 1/1000, 2/1000, ..., or 1000/1000.  Relating the starting allele frequency to the branching-process extinction probability is probably the biggest (not serious) "trick" of the problem, and the answer is in the notes.
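The following is a minimal Python sketch of one way to build the frequency grid. It assumes a branching process in which each copy of A leaves Poisson(1+s) copies, so the single-copy extinction probability q is the smallest root of q = exp(-(1+s)(1-q)). It also assumes that the i = 1000p initial copies found independent lineages, which means the allele is lost only if all i lineages die out. The function names are hypothetical, and this is not necessarily the form used in the notes or the method numbering in the question.

```python
import math


def extinction_prob(s, tol=1e-13, max_iter=1_000_000):
    """Smallest root q in [0, 1] of q = exp(-(1 + s) * (1 - q)).

    Assumes each copy of A leaves Poisson(1 + s) copies (branching-process model).
    """
    lam = 1.0 + s
    if lam <= 1.0:
        return 1.0  # mean offspring <= 1: the lineage dies out with probability 1
    q = 0.0  # iterating upward from 0 converges to the smallest fixed point
    for _ in range(max_iter):
        q_next = math.exp(-lam * (1.0 - q))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def fixation_by_frequency(s, n_copies=1000):
    """List of (p, 1 - q**i) for p = i / n_copies, i = 0..n_copies.

    Assumes the i initial copies start independent branching processes,
    so the allele is lost only if every one of the i lineages goes extinct.
    """
    q = extinction_prob(s)
    return [(i / n_copies, 1.0 - q ** i) for i in range(n_copies + 1)]
```

The sketch follows the 0/1000, ..., 1000/1000 grid given above. Whether the number of gene copies should be 1000 or 2000 depends on how the notes count copies in a diploid population, so treat `n_copies` as an assumption.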

The multiplicative approximation disregards the actual selection on the AA genotype and replaces it with the multiplicative assumption, under which AA has fitness (1+s)^2.  Selection on the heterozygote is kept at its true value, 1+s.
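Below is a small Python sketch of this substitution. The aa genotype is the reference (fitness 1), and the heterozygote has fitness 1+s as in Question 1. `w_AA_true` is a hypothetical placeholder for whatever AA fitness the question actually specifies. The one-generation update is the standard random-mating selection recursion. It is included only to show where the true and multiplicative fitness sets differ.

```python
def genotype_fitnesses(s, w_AA_true, multiplicative=False):
    """Fitnesses (AA, Aa, aa), with aa as the reference genotype (fitness 1).

    w_AA_true is a hypothetical placeholder for the question's actual AA fitness.
    """
    w_Aa = 1.0 + s  # heterozygote fitness stays at its true value either way
    w_AA = (1.0 + s) ** 2 if multiplicative else w_AA_true
    return w_AA, w_Aa, 1.0


def next_freq(p, fitnesses):
    """Frequency of A after one round of selection, assuming random mating."""
    w_AA, w_Aa, w_aa = fitnesses
    q = 1.0 - p
    w_bar = p * p * w_AA + 2.0 * p * q * w_Aa + q * q * w_aa  # mean fitness
    return (p * p * w_AA + p * q * w_Aa) / w_bar
```

Try comparing `next_freq(p, genotype_fitnesses(s, w, False))` with `next_freq(p, genotype_fitnesses(s, w, True))` across values of p. The two agree closely when p is small, because AA homozygotes (frequency p^2) are then rare.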

There is no need to introduce further approximations, since the exact calculation is not difficult.  The approximation was only meant to ease human interpretation, but you could explore its effect.

You have no information from which to calculate Ne.  Yes, assume Ne = N; this would have been clear had I stated that you may assume the Wright-Fisher model.

Question 2

I must have been imagining things when I thought there was overlap between the CAPN10 SNPs and your dataset; I no longer find any such overlap.  Replace the CAPN10 gene with CARD8, which has been linked to rheumatoid arthritis and Crohn's disease.  The other two genes DO have rs-number overlaps with your dataset.

Also, to facilitate the analysis: the dbSNP external link is back online (it was down while I wrote the homework), and it is significantly easier to use.  In particular, the following three files identify the rs numbers of SNPs associated with the three genes.  You should look for overlap between these files and the rs numbers in your dataset.
Gene name   Gene ID   File
CARD8       22900     gene22900.txt
PER3        8863      gene8863.txt
HIVEP3      59269     gene59269.txt
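Here is a minimal Python sketch of the overlap check. It assumes the gene files write rs identifiers as `rs` followed by digits. If they list bare numbers instead, adjust `RS_PATTERN`. The names `rs_numbers`, `overlap`, and `overlap_from_files` are hypothetical helpers, and loading your dataset's rs numbers is left to you.

```python
import re
from pathlib import Path

# Assumed identifier format: "rs" followed by digits, in any letter case.
RS_PATTERN = re.compile(r"\brs(\d+)\b", re.IGNORECASE)


def rs_numbers(text):
    """Set of rs identifiers in text, normalized to lowercase 'rs<digits>'."""
    return {"rs" + m.group(1) for m in RS_PATTERN.finditer(text)}


def overlap(gene_file_text, dataset_rs):
    """rs numbers present both in a gene file's text and in the dataset."""
    return rs_numbers(gene_file_text) & {r.strip().lower() for r in dataset_rs}


def overlap_from_files(gene_files, dataset_rs):
    """Map each gene file name to its overlapping rs numbers, sorted numerically."""
    return {
        name: sorted(overlap(Path(name).read_text(), dataset_rs), key=lambda r: int(r[2:]))
        for name in gene_files
    }
```

For example, `overlap_from_files(["gene22900.txt", "gene8863.txt", "gene59269.txt"], my_rs)` would return the shared rs numbers for each file. Here `my_rs` stands for a hypothetical iterable holding your dataset's rs numbers.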

Frankly, I was waiting for a question on this one, but I wanted you all to think.  (If you haven't thought yet, please consider doing so, before reading further.)

Yes, I think this would work in principle, but your estimate would use only the extreme tails of the distribution of p, where there may be very little data.  Under the drift/migration model, there is a predicted equilibrium distribution, and estimation based on that distribution lets you use ALL the data to estimate u and v.  I'm being vague, so keep thinking, and asking...
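One classical form of such an equilibrium is Wright's Beta distribution. The block below makes several assumptions: u is the per-generation rate at which A is converted to a, v is the rate of a to A (by mutation, or by migration under an analogous parameterization), N is the diploid population size, and p is the frequency of A. The notes may parameterize this differently.

```latex
% Assumed: u = rate A -> a, v = rate a -> A, diploid size N, p = frequency of A
\phi(p) = \frac{\Gamma\bigl(4N(u+v)\bigr)}{\Gamma(4Nv)\,\Gamma(4Nu)}\, p^{4Nv-1} (1-p)^{4Nu-1}, \qquad 0 < p < 1
\mathbb{E}[p] = \frac{v}{u+v}, \qquad
\operatorname{Var}[p] = \frac{uv}{(u+v)^2\,\bigl(4N(u+v)+1\bigr)}
```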

Note that you may not be able to estimate u and v absolutely, but only up to a multiplicative constant.
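The following is a minimal method-of-moments sketch. It assumes the observed frequencies are independent draws from a Beta(4Nv, 4Nu) equilibrium distribution, with u the A to a rate and v the a to A rate. The function names are hypothetical. The sketch shows why, when N is unknown, only the scaled rates 4Nu and 4Nv are identifiable, and therefore u and v only up to the common factor 1/(4N).

```python
from statistics import mean, pvariance


def fit_beta_moments(freqs):
    """Method-of-moments estimates (alpha, beta) of a Beta distribution."""
    m = mean(freqs)
    var = pvariance(freqs, mu=m)
    # A Beta distribution requires 0 < mean < 1 and 0 < variance < m(1 - m).
    if not 0.0 < m < 1.0 or var <= 0.0 or var >= m * (1.0 - m):
        raise ValueError("sample moments are incompatible with a Beta distribution")
    common = m * (1.0 - m) / var - 1.0  # estimate of alpha + beta
    return m * common, (1.0 - m) * common


def scaled_rates(freqs):
    """Assuming alpha = 4Nv and beta = 4Nu: only these products and u/v are estimable."""
    alpha, beta = fit_beta_moments(freqs)
    return {"4Nv": alpha, "4Nu": beta, "u/v": beta / alpha}
```

Maximum likelihood on the Beta density would use the data more efficiently. Moments appear here only because they fit in a few lines.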