## STAT536 : HW9Questions

##### Question 1
• I am confused on how to compute fixation probabilities using the branching process approximation.  I'm not sure which formulas I'm supposed to be using and am especially having trouble relating the fixation probabilities to allele frequencies.

Formula: There is one exact implicit formula, given for the case where Aa has fitness 1+s and the Wright-Fisher (WF) model is assumed. I forgot to say that you can assume the Wright-Fisher model.

Relating to allele frequencies: On rereading, the question lacks some clarity (I have re-posted the questions). Each plot should show two lines: the difference between method (1) and method (3), and the difference between method (2) and method (3), both plotted against starting allele frequencies between 0 and 1. Since the population is of size 1000, the starting allele frequency can be 0/1000, 1/1000, 2/1000, ..., or 1000/1000. Relating the starting allele frequency to the branching process extinction probability is probably the biggest "trick" (not a serious one) of the problem, and the answer is in the notes.
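As a rough illustration of that relationship, here is a minimal sketch, not necessarily the formulation in the notes. It assumes Poisson offspring with mean 1+s (the usual Wright-Fisher limit), so the extinction probability q of a single copy solves q = exp(-(1+s)(1-q)), and it assumes the initial copies found independent lineages, so that i starting copies all go extinct with probability q^i. The function names and the value s = 0.01 are hypothetical, chosen only for the example.

```python
import math

def extinction_probability(s, tol=1e-12, max_iter=1_000_000):
    """Solve q = exp(-(1 + s) * (1 - q)) by fixed-point iteration.

    Assumption: Poisson offspring with mean 1 + s, with s > 0.
    Starting from q = 0 converges to the smallest non-negative root.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp(-(1.0 + s) * (1.0 - q))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

def fixation_probability(copies, s):
    """Fixation probability when `copies` copies of the allele are present.

    Assumption: each copy founds an independent lineage, so all of them
    go extinct with probability q ** copies.
    """
    q = extinction_probability(s)
    return 1.0 - q ** copies

# Starting frequencies 0/1000, 1/1000, ..., 1000/1000 correspond to
# copies = 0, 1, ..., 1000 (s = 0.01 is an illustrative value only).
curve = [fixation_probability(i, 0.01) for i in range(0, 1001)]
```

The independence assumption is the weak point at higher starting frequencies, which is part of what the comparison against the other methods can reveal.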

• Also, for the diffusion approximation for multiplicative selection, you say to use s with partial dominance and h=0.01.  Does this mean that wAA=(1+hs)^2, wAa=(1+hs), and waa=1, or is the h just for the diffusion approximation for dominant selection (with wAA=1+s, wAa=1+hs, waa=1)?

The multiplicative approximation disregards the actual selection on the AA genotype and replaces it with the multiplicative selection assumption. Selection on the heterozygote stays at its true value.

• For multiplicative selection, do you want us to use U(p)=(1-exp(-4Nsp))/(1-exp(-4Ns)) [or s=hs depending on your answer about the relative fitnesses] or do you want us to use the approximation for small 4Ns [or 4Nhs] where U(p)=p+2Nsp(1-p)?

No need to put in further approximations since the calculation is not difficult.  The approximation was for easier human interpretation, but you could explore its effect.
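To explore the effect of the small-4Ns approximation, a sketch along the following lines compares the two expressions quoted in the question. The function names and parameter values are hypothetical. Which selection coefficient to pass (s or hs) depends on the fitness scheme discussed above; the sketch simply takes it as an argument.

```python
import math

def fixation_exact(p, N, s):
    """U(p) = (1 - exp(-4 N s p)) / (1 - exp(-4 N s)), for s != 0."""
    a = 4.0 * N * s
    # expm1(x) = exp(x) - 1 is accurate for small x; the two minus signs cancel.
    return math.expm1(-a * p) / math.expm1(-a)

def fixation_small_4Ns(p, N, s):
    """First-order approximation U(p) ~ p + 2 N s p (1 - p), valid for small 4Ns."""
    return p + 2.0 * N * s * p * (1.0 - p)

# Illustrative values only: N = 1000 and a weak coefficient so that 4Ns = 0.04.
N, s = 1000, 1e-5
for p in (0.1, 0.3, 0.5, 0.9):
    gap = fixation_exact(p, N, s) - fixation_small_4Ns(p, N, s)
    print(f"p={p:.1f}  exact - approx = {gap:.2e}")
```

When 4Ns is not small, the approximation can exceed 1, so it is best treated as an interpretive aid and not as a substitute for the full expression.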

• Lastly for question 1, do we assume Ne=N for the diffusion approximation for dominant selection or do we need to calculate Ne?

You have no information to calculate Ne.  Yes, assume Ne=N, which would have been clear if I had stated that you could assume Wright-Fisher.

##### Question 2
• I can't find any matching RS numbers

I must have been imagining things when I thought there was overlap between CAPN10 SNPs and your dataset. I no longer find such overlap. Replace the CAPN10 gene with CARD8, which has been linked to rheumatoid arthritis and Crohn's disease. The other two genes DO have RS overlaps with your dataset.

Also, to facilitate the analysis, dbSNP  is back online (it was down while I wrote the homework) and significantly easier to use.  In particular, the following three files identify the rs numbers of SNPs associated with the three genes.  You should look for overlap between these files and the rs numbers in your dataset.
| Gene Name | Gene ID | File |
|-----------|---------|------|
| CARD8 | 22900 | gene22900.txt |
| PER3 | 8863 | gene8863.txt |
| HIVEP3 | 59269 | gene59269.txt |
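One possible way to look for the overlap is sketched below. It assumes that the rs numbers appear in the gene files as text of the form `rs12345`; the actual file layout is not described here, so the pattern may need adjusting (for example, if the files list bare numbers). The helper names and the dataset IDs in the usage comment are hypothetical.

```python
import re
from pathlib import Path

# Assumption: IDs appear as "rs" followed by digits somewhere in the text.
RS_PATTERN = re.compile(r"\brs\d+\b", re.IGNORECASE)

def rs_numbers(text):
    """Return the set of rs IDs found in a block of text, lowercased."""
    return {match.lower() for match in RS_PATTERN.findall(text)}

def read_rs(path):
    """Read a file and extract every rs ID it contains."""
    return rs_numbers(Path(path).read_text())

def overlap_by_gene(gene_rs, dataset_rs):
    """Map each gene name to the sorted rs IDs it shares with the dataset."""
    return {gene: sorted(ids & dataset_rs) for gene, ids in gene_rs.items()}

# Usage sketch (file names from the table above; dataset IDs are placeholders):
# gene_rs = {
#     "CARD8": read_rs("gene22900.txt"),
#     "PER3": read_rs("gene8863.txt"),
#     "HIVEP3": read_rs("gene59269.txt"),
# }
# dataset_rs = {"rs1", "rs2"}
# print(overlap_by_gene(gene_rs, dataset_rs))
```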

• I think I might be oversimplifying it, but in the Mutation and Drift section of the notes, you mention that 2Nu new mutant alleles are introduced each generation and each has probability 1/2N of fixing, therefore 2Nu/2N=u is the probability of fixation. So we can estimate the mutation rate by the proportion of alleles that have fixed in our datasets (or the proportion of allele proportions that are 1). Then similarly, 2Nv alleles experience reverse mutation, so we can estimate that by the proportion of allele proportions that are 0. Is this an oversimplification of the problem? Am I on the right track?

Frankly, I was waiting for a question on this one, but I wanted you all to think.  (If you haven't thought yet, please consider doing so, before reading further.)

Yes, I think this would work in principle, but you would only use the extreme tails of the p distribution for your estimate, where there may be very little data. Under the drift/mutation model, there is a predicted equilibrium distribution. Estimation based on that distribution can allow you to use ALL the data to estimate u and v. I'm being vague, so keep thinking, and asking...

Note that you may be able to estimate $u$ and $v$ only up to a multiplicative constant, not absolutely.
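To make the hint slightly more concrete, here is one possible sketch, offered as an assumption and not as the intended solution. In the standard diffusion treatment of drift with two-way mutation, the equilibrium allele frequency follows a Beta distribution whose two shape parameters are 4N times the two mutation rates (which rate goes with which parameter depends on which allele's frequency is recorded). Fitting the shape parameters therefore recovers the rates only as the products 4Nu and 4Nv. The sketch uses a simple method-of-moments fit and ignores the extra sampling variance in observed frequencies; the function name is hypothetical.

```python
from statistics import fmean, variance

def beta_moment_estimates(freqs):
    """Method-of-moments estimates of the Beta shape parameters.

    Assumption: `freqs` are allele frequencies across loci drawn from a
    Beta(alpha, beta) equilibrium distribution; alpha and beta are then
    interpreted as 4N times the two mutation rates.
    """
    m = fmean(freqs)
    v = variance(freqs)
    if v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("moments are incompatible with a Beta distribution")
    k = m * (1.0 - m) / v - 1.0
    return m * k, (1.0 - m) * k

# Illustrative input only: four made-up frequencies.
alpha_hat, beta_hat = beta_moment_estimates([0.2, 0.4, 0.6, 0.8])
```

A maximum likelihood fit would use the data more efficiently but has to handle loci with observed frequency exactly 0 or 1, where the Beta density can be unbounded.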