## STAT430 : ProjectOne

Dorman Lab Wiki
##### Estimating a Change Point

See the actual conditions underlying YOUR data.

There was a question about the negative values in the data. They are not a mistake: take the reported values as measured on a transformed scale that includes negative values.

Suppose you measure network speed at $n$ equally spaced time points during some testing interval (retrieve your personalized network speed data using the link above). The network speed $X_i$ measured at time $i = 1, \ldots, n$ is a random variable. First, we will assume the $X_i$ are normally distributed. Your objective is to test whether there is a point in time where the network speed distribution changes. The change can be in the population mean, the variance, or both. Conditional on knowing the temporal location of this change point, you may assume the random variables $X_i$ are independent. You may (and should) use R to assist you with this project. Don't forget to ask questions. I expect this problem to provoke thinking AND questions. For example, test your ideas before you do any work by emailing them to me.
1. Can you find evidence that there is a change point in your data?  If so, what population parameters change across the change point? (Assume there is at most one change point in your data.)
2. Is the data consistent with normally distributed network speeds?
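For part 1, one standard approach is a likelihood ratio test that compares a single normal distribution for the whole series with two normals split at a candidate time $\tau$, keeping the largest statistic over all $\tau$. The sketch below is written in Python with only the standard library (the same steps carry over to R) and is a minimal illustration under assumptions: the function names are hypothetical, each segment must hold at least `min_seg = 3` observations so the variance estimates stay positive, and because $\tau$ is chosen by maximization the statistic does not follow the usual $\chi^2$ distribution, so the p-value comes from a parametric bootstrap under the fitted no-change model.

```python
import math
import random
import statistics


def seg_loglik(seg):
    """Maximized normal log-likelihood of one segment (MLE mean and variance)."""
    m = len(seg)
    mu = sum(seg) / m
    var = sum((v - mu) ** 2 for v in seg) / m  # MLE divides by m, not m - 1
    return -0.5 * m * (math.log(2 * math.pi * var) + 1)


def change_point_lrt(x, min_seg=3):
    """Return (largest LRT statistic over all splits, split tau achieving it).

    tau is the number of observations before the change, so the two
    segments are x[:tau] and x[tau:]; both must have at least min_seg points.
    """
    n = len(x)
    ll0 = seg_loglik(x)  # H0: one mean and one variance for the whole series
    best_stat, best_tau = -math.inf, None
    for tau in range(min_seg, n - min_seg + 1):
        ll1 = seg_loglik(x[:tau]) + seg_loglik(x[tau:])  # H1 with split at tau
        stat = 2 * (ll1 - ll0)
        if stat > best_stat:
            best_stat, best_tau = stat, tau
    return best_stat, best_tau


def bootstrap_pvalue(x, n_boot=199, min_seg=3, seed=1):
    """Parametric-bootstrap p-value: simulate series from the fitted H0 model."""
    rng = random.Random(seed)
    observed, _ = change_point_lrt(x, min_seg)
    mu, sd = statistics.fmean(x), statistics.pstdev(x)  # MLEs under H0
    exceed = 0
    for _ in range(n_boot):
        sim = [rng.gauss(mu, sd) for _ in range(len(x))]
        if change_point_lrt(sim, min_seg)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    # Simulated stand-in for your data: the mean shifts after observation 40.
    demo = random.Random(0)
    x = [demo.gauss(0, 1) for _ in range(40)] + [demo.gauss(2, 1) for _ in range(40)]
    stat, tau = change_point_lrt(x)
    print(f"LRT = {stat:.2f} at tau = {tau}, p = {bootstrap_pvalue(x):.3f}")
```

To address which parameters change, one possible follow-up is to refit at the selected split with a common variance (mean change only) or a common mean (variance change only) and compare those fits with the unrestricted one.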

Suppose I find a change point in problem 1. Do I have to do the normality test separately on the two groups of sample data, one from the beginning to the change point and the other from the change point + 1 to the end? Or do we only have to do the normality test on the whole dataset? Thanks

If you found a change point and you tested the full dataset, it should look non-normal because the sample is not iid from the same normal distribution.  Yes, you need to consider the two samples separately. [KSD]
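A minimal sketch of testing the two segments separately, assuming a split `tau` has already been estimated (observations `x[:tau]` before the change, `x[tau:]` after). It uses the Jarque-Bera statistic only because it needs nothing beyond the standard library; this is a substitute chosen for illustration, and in R applying `shapiro.test()` to each segment is a common alternative. The $\chi^2_2$ p-value is an asymptotic approximation that can be unreliable for short segments, and because the split itself is estimated, any p-value here is approximate. The function names are hypothetical.

```python
import math


def jarque_bera(x):
    """Jarque-Bera normality statistic and its asymptotic chi-square(2) p-value."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # central sample moments
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # the population kurtosis of a normal distribution is 3
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # chi-square(2) survival function is exp(-x/2)
    return jb, p_value


def segment_normality(x, tau):
    """Apply the test to each side of the split separately."""
    return {"before": jarque_bera(x[:tau]), "after": jarque_bera(x[tau:])}
```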

3. In this question and the next, perform the analysis regardless of whether you found evidence of a change point. Let the random variable $T$ be the time of the change point. Assuming normality (regardless of your response to (2)), obtain a maximum likelihood estimate of $T$. (Hint: Look at the R tutorial additional exercises.)
4. Suppose that the mean of the normal distributions is determined by the number of users connected to the network, so that more users translates to slower network speeds. You learn that an online, collaborative meeting was initiated or ended sometime during the time you were collecting data. During this meeting, all participants were logged on to the network, but they were logged off before and after the meeting. Write down a likelihood for the observed data $X_i, i = 1, \ldots, n$ if the numbers of users connected before and after the change point $T$ are Poisson distributed with means $\lambda_1$ and $\lambda_2$, respectively. If network speeds are assumed inversely proportional to the number of users, can you estimate what proportion of total users attended the meeting?
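For part 3, a sketch of the standard profile-likelihood argument, assuming both the mean and the variance may change and writing $\tau$ for a candidate value of $T$ (the last time point before the change, so the segments are $1, \ldots, \tau$ and $\tau + 1, \ldots, n$). For fixed $\tau$ the segment means and variances have the usual closed-form MLEs; substituting them back leaves a function of $\tau$ alone:

$$
\hat\mu_1(\tau) = \frac{1}{\tau}\sum_{i=1}^{\tau} x_i, \qquad
\hat\sigma_1^2(\tau) = \frac{1}{\tau}\sum_{i=1}^{\tau} \bigl(x_i - \hat\mu_1(\tau)\bigr)^2,
$$

$$
\hat\mu_2(\tau) = \frac{1}{n-\tau}\sum_{i=\tau+1}^{n} x_i, \qquad
\hat\sigma_2^2(\tau) = \frac{1}{n-\tau}\sum_{i=\tau+1}^{n} \bigl(x_i - \hat\mu_2(\tau)\bigr)^2,
$$

$$
\ell_p(\tau) = -\frac{n}{2}\bigl(\log 2\pi + 1\bigr) - \frac{\tau}{2}\log\hat\sigma_1^2(\tau) - \frac{n-\tau}{2}\log\hat\sigma_2^2(\tau),
$$

$$
\hat T = \arg\max_{\tau} \ell_p(\tau) = \arg\min_{\tau} \Bigl[\tau \log\hat\sigma_1^2(\tau) + (n-\tau)\log\hat\sigma_2^2(\tau)\Bigr].
$$

The search has to be restricted so that each segment has at least two observations (for example $2 \le \tau \le n-2$): a one-point segment has $\hat\sigma^2 = 0$, which sends $\ell_p(\tau)$ to $+\infty$. If only the mean is assumed to change, both variances are replaced by the pooled estimate, and $\hat T$ instead minimizes the pooled residual sum of squares.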

Changed "directly proportional" to "inversely proportional" because network speed decreases with the number of users. Also changed the $E[X]$ equation below.

For part 4, please start by defining all parameters and random variables in the problem.  Then, remember your conditional probability rule!

Please assume the number of users on the network changed only once during the sampling period, at the change point.

Assume that the proportionality relationship relating network speeds to number of users is constant, regardless of the number of users, so $E[X] = \alpha / (\text{number of users})$ for some constant $\alpha$.
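For part 4, one possible way to write the likelihood, offered as a sketch under assumptions the problem statement does not fix: a single user count $N_1$ applies for $i \le T$ and $N_2$ for $i > T$ (consistent with the number of users changing only once); $N_1$ and $N_2$ are independent Poisson with means $\lambda_1$ and $\lambda_2$, conditioned to be at least 1 so that $\alpha / N_j$ is defined; and given $N_j = m$, the $X_i$ in segment $j$ are independent $\mathcal{N}(\alpha/m, \sigma_j^2)$. The conditional probability rule enters by summing over the unobserved counts, $f(x) = \sum_m f(x \mid N = m)\,P(N = m)$:

$$
L(T, \alpha, \lambda_1, \lambda_2, \sigma_1^2, \sigma_2^2)
= \prod_{j=1}^{2} \sum_{m=1}^{\infty}
\frac{e^{-\lambda_j}\lambda_j^{m}}{m!\,\bigl(1 - e^{-\lambda_j}\bigr)}
\prod_{i \in S_j} \frac{1}{\sqrt{2\pi\sigma_j^2}}
\exp\!\left\{-\frac{\bigl(x_i - \alpha/m\bigr)^2}{2\sigma_j^2}\right\},
$$

where $S_1 = \{1, \ldots, T\}$ and $S_2 = \{T+1, \ldots, n\}$. Other formulations (for example, a fresh count at every time point, or $N_2 = N_1 + M$ with $M$ the number of attendees) lead to different likelihoods.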

Do not use the likelihood you write down to estimate the proportion of users.  Use the simple estimates from parts 1 through 3.
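A minimal sketch of the simple plug-in estimate, assuming a split `tau` from part 3 (segments `x[:tau]` and `x[tau:]`), a single constant $\alpha$ in $E[X] = \alpha / (\text{number of users})$ for both segments, and that the meeting coincides with the slower (lower-mean) segment, since more users means slower speeds. Each segment mean then estimates $\alpha$ divided by that segment's number of users, so $\alpha$ cancels and the fraction of users online during the meeting who were attendees is estimated by $1 - \hat\mu_{\text{slow}} / \hat\mu_{\text{fast}}$. The function name is hypothetical, and the ratio argument only makes sense if both segment means are positive, which may not hold on the transformed scale of the reported data.

```python
def meeting_proportion(x, tau):
    """Plug-in estimate of the fraction of meeting-time users who attended.

    Assumes E[X] = alpha / (number of users) with one alpha for both segments,
    and that the meeting segment is the slower one (more users, lower mean).
    """
    before, after = x[:tau], x[tau:]
    mu1 = sum(before) / len(before)
    mu2 = sum(after) / len(after)
    if mu1 <= 0 or mu2 <= 0:
        raise ValueError("inverse proportionality needs positive mean speeds")
    slow, fast = min(mu1, mu2), max(mu1, mu2)
    # users are proportional to alpha / mean, so users_other / users_meeting = slow / fast
    return 1 - slow / fast


if __name__ == "__main__":
    # Made-up illustration: the mean speed halves after the split,
    # so about half of the users online afterwards joined for the meeting.
    print(meeting_proportion([10.0, 9.0, 11.0, 5.0, 4.0, 6.0], 3))  # prints 0.5
```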

A biological interpretation: a scenario similar to this problem could arise as follows. Suppose you measure a continuous phenotype on a cell at multiple time points. You wish to test whether the state of the cell changed at some point during the experiment, so that measurements taken before and after the change point are distributed as distinct normal random variables. Suppose the mean phenotype of the cell is determined by the number of receptors (or neural connections, ligands bound, etc.; take your pick). Can you estimate the magnitude of the change in the number of receptors experienced by the cell at the estimated change point?