## STAT430 : Questions

This is an old revision of Questions from 2007-11-07 15:28:02.
##### Questions
• After going through the notes again I am still a little fuzzy on the difference between a confidence interval and a prediction interval.  Is the confidence interval the $1-α$ range for $E[y]$ while the prediction interval is the $1-α$ range for $y$ itself?

Yes.  Confidence intervals localize population parameters (nonrandom quantities that you are trying to estimate).  Prediction intervals localize random variables, about which you presumably know something (e.g. you have estimated their sampling distribution); otherwise the exercise would be pointless.
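
To make the distinction concrete, here is a minimal sketch for simple linear regression with one predictor.  The data, the function name `interval_half_widths`, and the critical value 3.182 (the 0.975 quantile of a t distribution with 3 degrees of freedom) are illustrative assumptions, not taken from the course notes.  The only difference between the two intervals is the extra "1 +" under the square root, which accounts for the randomness of the new observation itself.

```python
import math

def interval_half_widths(x, y, x0, t_crit):
    """Return (fitted value at x0, CI half-width, PI half-width)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # least-squares slope
    b0 = y_bar - b1 * x_bar        # least-squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)             # estimate of the error variance
    h = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = math.sqrt(s2 * h)        # uncertainty in the estimate of E[y | x0]
    se_pred = math.sqrt(s2 * (1 + h))  # adds the variance of the new y itself
    return b0 + b1 * x0, t_crit * se_mean, t_crit * se_pred

# Hypothetical data; t_crit is for alpha = 0.05 and n - 2 = 3 degrees of freedom.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y0_hat, ci_half, pi_half = interval_half_widths(x, y, 3.0, 3.182)
print(f"fit {y0_hat:.2f}, CI +/- {ci_half:.3f}, PI +/- {pi_half:.3f}")
```

The prediction interval is always the wider of the two, and it does not shrink to zero width as the sample size grows.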

• So, a residual is the observed value minus the predicted value, but is this the only definition of residual?  If it is, it then seems like SST should be the sum of squares of residuals.  Is this so?  By the notation we have used I could also justify it being SSR.  I'm just a little turned around about this at the moment...  This could clarify the meaning of MSR if it is then SSR/k.  Then it's the mean squared residual?

SSR is the sum of squared residuals.  SST is the numerator of a standard sample variance, i.e. the sum of squared differences between the observations and their sample mean.  SSR is similar, but the sample mean is replaced with the predicted value, obtained by making assumptions about the relationships of the sample means across levels.
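
As a small numerical sketch of the two sums, the following fits a straight line by least squares and computes both quantities.  The data and the names `sst` and `ss_resid` are illustrative assumptions; `ss_resid` is the sum of squared residuals described above.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
y_bar = sum(y) / len(y)

# Deviations from the sample mean: the numerator of the sample variance.
sst = sum((yi - y_bar) ** 2 for yi in y)

# Deviations from the predicted values: the sum of squared residuals.
ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

print(f"SST = {sst:.3f}, sum of squared residuals = {ss_resid:.3f}")
```

For a least-squares fit with an intercept, the sum of squared residuals can never exceed SST, because predicting every observation by the sample mean is itself one of the candidate fits.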

• In the MLR assumptions we state that cov($ε_i$, $ε_j$) = 0 for all $i$ not equal to $j$ and that this holds for random samples, but not necessarily for time series data or repeated measures on an individual.  However, it seems like the majority of studies are one of the latter two.  Is our work in experimental design teaching us how to satisfy this assumption by blocking and such so that our data is of a proper form for MLR analysis?  Is it that we simply violate this assumption sometimes?  Or am I thinking of studies in the wrong way; possibly the same mistake of language as thinking of a random variable in the same way as a random number generator?

Perhaps a lot of the data in the field you know best is of this form.  Not all fields are so restricted.  Indeed, people often ignore this dependence and continue with the analysis as if it were not an issue.  One should be cautious about this approach.  Experimental design does not solve this problem.

For example, suppose you wanted to test lack of fit to some kind of regression model.  To run the test, you need multiple measures at the same level.  One way is to measure the same individual multiple times, but correlated data could result.  It would be much better to collect data from multiple individuals with the same covariates.  Of course, with continuous covariates, there may be no matching individuals.  Maybe it is time to take a good course on time series analysis to help you model the resulting correlation.

• I have been reviewing the special pmf's and am a little confused by E[X] for Geometric (packet 1, p.29).  $p_X(x)$ makes sense, as do the results of the expectancy and variance, but the sum gets me.  If we went over this I apologize, but what does the leading $i$ represent?  If we expand is it to show that in order to get to $i$ we must have the $i-1, i-2, \ldots, i-z+1$ failures before the $z$th trial succeeds?

Yes, there is a typo in the formula for $E[X]$.  Here is a corrected formula plus a detailed derivation.
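
For reference, one standard derivation runs as follows.  It assumes the geometric pmf counts the trial on which the first success occurs, $p_X(x) = p(1-p)^{x-1}$ for $x = 1, 2, \ldots$ with $0 < p \le 1$; the packet may use a different parametrization.  The leading factor in each term is simply the value of $X$ being averaged, weighted by its probability.

$$
\begin{aligned}
E[X] &= \sum_{x=1}^{\infty} x\,p\,(1-p)^{x-1} \\
     &= p \sum_{x=1}^{\infty} x\,q^{x-1}, \qquad q = 1-p \\
     &= p\,\frac{d}{dq} \sum_{x=0}^{\infty} q^{x} \\
     &= p\,\frac{d}{dq}\,\frac{1}{1-q} \\
     &= \frac{p}{(1-q)^{2}} = \frac{1}{p}.
\end{aligned}
$$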

• There appears to be a typo in the derivation of E[X] for the Poisson distribution (packet 1, p.31).  The x that is factored out and canceled with the leading x in x! to make the denominator (x-1)! reappears in the next line.  It shouldn't.

Agreed.  My notes for the derivation of $E[X]$ read more simply.
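
A standard version of the derivation, assuming the usual parametrization $p_X(x) = e^{-\lambda}\lambda^{x}/x!$ for $x = 0, 1, 2, \ldots$, shows the cancellation in question.  Once the leading $x$ cancels against $x!$, it does not appear again.

$$
\begin{aligned}
E[X] &= \sum_{x=0}^{\infty} x\,\frac{e^{-\lambda}\lambda^{x}}{x!} \\
     &= \sum_{x=1}^{\infty} \frac{e^{-\lambda}\lambda^{x}}{(x-1)!} \\
     &= \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} \\
     &= \lambda e^{-\lambda} e^{\lambda} = \lambda.
\end{aligned}
$$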

• We know that if we have a Poisson distribution E[X] = Var[X].  Is this an iff statement?  That is, if we have that E[X] = Var[X] are we guaranteed that our random variable is described by the Poisson distribution?

It is not an iff statement.  All moments must match (when they exist) those of a known distribution to conclude that a random variable has this distribution.  See moment generating function.  Thus, we would also have to check that higher moments, like $E[X^3]$, match those of a Poisson random variable to conclude that $X ∼$ Poisson.
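
A small counterexample makes this concrete.  The two-point distribution below is an illustrative assumption, chosen so that the mean and variance are both 1, yet its third moment differs from that of a Poisson random variable with $λ = 1$, whose third raw moment is $λ^3 + 3λ^2 + λ$.

```python
from fractions import Fraction

def raw_moment(pmf, k):
    """k-th raw moment E[X^k] of a discrete pmf given as {value: probability}."""
    return sum(p * x ** k for x, p in pmf.items())

# Hypothetical random variable: X = 0 or X = 2, each with probability 1/2.
pmf = {0: Fraction(1, 2), 2: Fraction(1, 2)}

mean = raw_moment(pmf, 1)
variance = raw_moment(pmf, 2) - mean ** 2
third = raw_moment(pmf, 3)

# Third raw moment of a Poisson random variable with the same mean.
lam = mean
poisson_third = lam ** 3 + 3 * lam ** 2 + lam

print(mean, variance)        # equal, as for a Poisson
print(third, poisson_third)  # unequal, so X is not Poisson
```

Here $X$ cannot even take the value 1, so it is clearly not Poisson, although $E[X] = Var[X]$.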

• I am having difficulty with the definition of $Ω$.  Initially we defined $Ω$ as the set of all possible outcomes (for probability).  However, when we defined a random variable we said that a random variable $X$ was defined as a mapping from $Ω$ onto $R$.  That, to me, says that $Ω$ is the domain and $R$ the codomain.  These seem contradictory.  Is this because one is a definition for probabilities and the other for statistics, or is there something I am misinterpreting?  This question was prompted because I have in my notes that $Ω_X$ is the range of $X$.  Assuming that to be true I was then thinking that a transformation of two random variables $X$, $Y$ is like a composition function; it is a mapping from $R$, through $Ω_X$, to $Ω_Y$ that is surjective but not injective.  The correctness of this is obviously dependent on the definition of these $Ω$ sets...

Yes, this is an abuse of notation.  $Ω$ is the sample space consisting of all possible outcomes of a random experiment.  A random variable maps $Ω$ to some subset of $R$.  If we forget about the random experiment and its outcomes, and treat the value of the random variable as the outcome, then we can call this subset of $R$ $Ω_X$.  Proper, careful notation would probably use something other than $Ω$ for this purpose.
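
The two uses of the symbol can be kept apart in a small sketch.  The experiment (two coin tosses, assumed fair) and the names `omega`, `X`, and `omega_X` are illustrative assumptions: `omega` is the sample space, `X` is a function on it, and `omega_X` is the set of values that `X` actually takes.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space of the random experiment: all outcomes of two coin tosses.
omega = list(product("HT", repeat=2))

def X(outcome):
    """Random variable: maps an outcome in omega to the number of heads."""
    return outcome.count("H")

# Range of X, the subset of the reals written Omega_X in the notes.
omega_X = sorted({X(w) for w in omega})

# pmf of X, induced by equally likely outcomes (fair coin assumed).
pmf = defaultdict(Fraction)
for w in omega:
    pmf[X(w)] += Fraction(1, len(omega))

print(omega_X)    # values X can take
print(dict(pmf))  # probability attached to each value
```

Several outcomes in `omega` can map to the same value in `omega_X`, which is why the pmf of `X` is obtained by summing probabilities over outcomes.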

• In the chi-square test of independence, why is the degrees of freedom $(n_r - 1)(n_c - 1)$, where $n_r$ is the number of rows and $n_c$ is the number of columns?

As per our discussion about goodness-of-fit tests, the degrees of freedom should be $m - 1$ less the number of parameters estimated, where $m$ is the number of categories.  In the test of independence, the number of categories is $n_r n_c$.  Under independence, there are $n_r - 1$ parameters to estimate for the marginal pmf on rows: one for each category, minus one for the constraint that the pmf sums to one, $\sum_i p_i = 1$.  Similarly, there are $n_c - 1$ additional parameters to estimate for the pmf on columns.  Therefore, the number of degrees of freedom is $n_r n_c - (n_r - 1) - (n_c - 1) - 1 = (n_r - 1)(n_c - 1)$, in agreement with the rule for tests of independence.  In conclusion, the test of independence can be viewed as a special type of goodness-of-fit test.
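
The counting argument can be checked on a small example.  The contingency table and the function name `independence_test_pieces` below are illustrative assumptions; the sketch computes the Pearson chi-square statistic from the estimated marginals and the degrees of freedom by the goodness-of-fit rule.

```python
def independence_test_pieces(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    n_r = len(table)
    n_c = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected counts under independence use the estimated marginal pmfs.
    stat = 0.0
    for i in range(n_r):
        for j in range(n_c):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # Goodness-of-fit rule: categories - 1 - number of estimated parameters.
    m = n_r * n_c
    estimated = (n_r - 1) + (n_c - 1)
    dof = m - 1 - estimated
    return stat, dof

# Hypothetical 2 x 3 table of counts.
table = [[10, 20, 30],
         [20, 20, 20]]
stat, dof = independence_test_pieces(table)
print(stat, dof)
```

For this 2 x 3 table the rule gives $6 - 1 - 3 = 2$ degrees of freedom, matching $(2-1)(3-1)$.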

• Is there a course schedule that shows what we will learn this semester?  That would be helpful for deciding which group project is suitable.  Thanks.

There is only a vague course schedule.  We will cover rudimentary experimental design, multiple linear regression, general linear models, logistic regression, Poisson regression, stochastic processes (Bernoulli, Poisson, Brownian, discrete-time Markov chain), and simulation, including random number generation, Monte Carlo integration, and MCMC.