Probability Density Functions
A random variable takes on values that have specific probabilities of occurring. An example of a random variable would be the number of car accidents per year among sixteen year olds. If we know how a random variable is distributed in a population, then we can gauge how rare a given observation is. This information is then useful for making inferences (or drawing conclusions) about the population by using sample data.
Example Random Variable: How often sixteen year olds are involved in auto accidents in a year’s time.
Application: We could look at a sample of data consisting of 1,000 sixteen year olds in the Midwest and make inferences or draw conclusions about the population consisting of all sixteen year olds in the Midwest.
In summary, it is important to be able to specify how a random variable is distributed. It enables us to gauge how rare an observation (or sample) is and gives us grounds to make predictions, or inferences, about the population.
Random variables can be discrete, that is, observed in whole units as with the counting numbers 1, 2, 3, 4, etc. Random variables may also be continuous; in this case they can take on an infinite number of values. An example would be crop yields, which can be measured in bushels down to a fraction or decimal. The distributions of discrete random variables can be presented in tabular form or with histograms. Probability is represented by the area of a 'rectangle' in a histogram.
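As a concrete sketch of the tabular form, consider a hypothetical discrete distribution (the values and probabilities below are illustrative, chosen to be a binomial distribution with n = 3 and p = 0.5, not taken from the text). With histogram bars of width 1, the probability of an event is the combined area of the corresponding rectangles:

```python
# Hypothetical discrete distribution in tabular form:
# each value maps to its probability (the height of its histogram bar).
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# A valid distribution's probabilities must sum to 1.
total = sum(pmf.values())
print(total)  # 1.0

# With bars of width 1, probability = rectangle area,
# so P(X <= 1) is the combined area of the first two bars.
p_at_most_1 = sum(p for x, p in pmf.items() if x <= 1)
print(p_at_most_1)  # 0.5
```

The probabilities here are multiples of 1/8 and therefore exactly representable as floats, so the sums come out exact.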
Distributions for continuous random variables cannot be represented in tabular format because they take on an infinite number of values. They are better represented by a smooth curve defined by a function. This function is referred to as a probability density function or p.d.f.
The p.d.f. f(x) describes the relative likelihood of the values of a random variable X; the probability that X falls in a narrow interval of width dx is approximately f(x) dx. The probability that X falls in a given range is the area under the p.d.f. curve over that range. This area can be described by the cumulative distribution function, or c.d.f., which is obtained through integration of the p.d.f.
Let f(x) be a p.d.f.
P( a <= X <= b ) = ∫ from a to b of f(x) dx = F(b) - F(a)

This can be interpreted to mean that the probability that X lies between the values a and b is given by the integral of the p.d.f. from a to b. This value can be computed from the c.d.f. F(x) as the difference F(b) - F(a). Those familiar with calculus will recognize F(x) as an antiderivative of f(x).
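A minimal numerical sketch of this idea, using a hypothetical p.d.f. f(x) = 2x on [0, 1] chosen for illustration: approximating the integral from a to b with the trapezoid rule recovers P(a <= X <= b), and the result can be checked against the exact c.d.f., which here is F(x) = x^2:

```python
def f(x):
    # Hypothetical p.d.f.: f(x) = 2x on [0, 1], zero elsewhere.
    return 2.0 * x

def prob(a, b, n=10_000):
    # Approximate P(a <= X <= b), the area under f from a to b,
    # with the composite trapezoid rule on n subintervals.
    h = (b - a) / n
    area = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        area += f(a + i * h)
    return area * h

# Exact answer from the c.d.f. F(x) = x^2:
# P(0.25 <= X <= 0.75) = F(0.75) - F(0.25) = 0.5625 - 0.0625 = 0.5
print(prob(0.25, 0.75))
```

Because f is linear, the trapezoid rule is exact here up to floating-point rounding; for a general p.d.f. the approximation improves as n grows.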
One of the most important p.d.f.'s is that of the normal distribution:

f(x) = ( 1 / (σ √(2π)) ) e^( -(x - μ)² / (2σ²) )

where X ~ N( μ, σ² ), with mean μ and variance σ².
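As a sketch, the normal p.d.f. can be evaluated directly from this formula, and its c.d.f. can be written in closed form using the standard library's error function. The well-known fact that about 68.3% of the probability lies within one standard deviation of the mean serves as a check (μ = 0, σ = 1 are chosen here for illustration):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)**2 / (2 * sigma**2))
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def normal_cdf(x, mu=0.0, sigma=1.0):
    # The normal c.d.f. F(x), expressed via the error function erf.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# P(mu - sigma <= X <= mu + sigma) is about 0.6827 for any normal.
p = normal_cdf(1.0) - normal_cdf(-1.0)
print(round(p, 4))  # 0.6827
```

This is the computation behind the normal tables in the back of statistics textbooks: the tables simply list precomputed values of F(x) for the standard normal.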
At the beginning of this section I stated that it was important to be able to specify how a random variable is distributed. Through experience, statisticians have found that they are justified in modeling many random variables with these p.d.f.'s. Therefore, in many cases one can be justified in using one of these p.d.f.'s to determine how rare a sample observation is, and then to make inferences about the population. This is in fact what takes place when you look up values from the tables in the back of statistics textbooks.
The expected value E(X) of a discrete random variable can be determined by summing the products of each possible value Xi and the probability of observing that Xi. The expected value is the mean of the distribution of observations. It can be thought of conceptually as an average, or weighted mean.
Example: given the p.d.f. for the discrete random variable X in tabular format:

Xi           1     2     3
P(X = xi)   .25   .50   .25

E(X) = Σ Xi P(X = xi) = 1(.25) + 2(.50) + 3(.25) = 2.0
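The table above can be computed directly; a minimal sketch in Python:

```python
# Tabular p.d.f. from the example: values paired with their probabilities.
values = [1, 2, 3]
probs = [0.25, 0.50, 0.25]

# E(X) = sum of Xi * P(X = xi)
expected = sum(x * p for x, p in zip(values, probs))
print(expected)  # 2.0
```

The probabilities .25, .50, and .25 are exactly representable as floats, so the weighted sum comes out to exactly 2.0.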
...to be continued