This is a question that puzzled me. After several undergraduate courses in statistical inference and a rigorous theoretical treatment in graduate school, I still felt at a loss. Suddenly (or so I thought) I was dealing with population data, not sample data. Should I eschew the use of confidence intervals and levels of significance? Do differences between groups calculated from population data represent 'the' difference between groups?
In a post entitled “How does statistical analysis differ when analyzing the entire population rather than a sample?”, Professor Andrew Gelman states:
“So, one way of framing the problem is to think of your "entire population" as a sample from a larger population, potentially including future cases. Another frame is to think of there being an underlying probability model. If you're trying to understand the factors that predict case outcomes, then the implicit full model includes unobserved factors (related to the notorious "error term") that contribute to the outcome. If you set up a model including a probability distribution for these unobserved outcomes, standard errors will emerge.”
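To make the second framing concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the two-group setup, the normal error term, the group size of 200, and the function names are mine, not Gelman's. It treats an observed "entire population" as one draw from an underlying probability model, computes the usual standard error for the difference in group means, and then compares it with how much that difference actually varies when the population is regenerated from the same model.

```python
import random
import statistics

def simulate_population(n_per_group, true_diff, noise_sd, rng):
    """Draw outcomes for two groups from a hypothetical model:
    outcome = group mean + unobserved error (assumed normal)."""
    group_a = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
    group_b = [rng.gauss(true_diff, noise_sd) for _ in range(n_per_group)]
    return group_a, group_b

def diff_and_se(group_a, group_b):
    """Difference in group means and its usual standard error."""
    diff = statistics.mean(group_b) - statistics.mean(group_a)
    se = (statistics.variance(group_a) / len(group_a)
          + statistics.variance(group_b) / len(group_b)) ** 0.5
    return diff, se

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# One observed "entire population": 200 cases per group, no true difference.
a, b = simulate_population(200, 0.0, 1.0, rng)
observed_diff, model_se = diff_and_se(a, b)

# Regenerate the whole population many times from the same model and
# record the group difference each time.
replicate_diffs = []
for _ in range(1000):
    rep_a, rep_b = simulate_population(200, 0.0, 1.0, rng)
    replicate_diffs.append(diff_and_se(rep_a, rep_b)[0])
spread = statistics.stdev(replicate_diffs)

print(f"observed difference:           {observed_diff:.3f}")
print(f"standard error from the model: {model_se:.3f}")
print(f"spread across replicates:      {spread:.3f}")
```

Under these assumptions the theoretical standard error is sqrt(1/200 + 1/200) = 0.1, and both the formula-based value and the spread across replicates should land close to it. The observed difference is not zero even though the model has no true difference between groups, which is why the calculated difference in population data is not automatically 'the' difference.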