__Ordinary Least Squares:__

OLS: y = Xβ + e

Minimizes the sum of squared residuals e'e, where e = (y – Xβ)

R² = 1 – SSE/SST

OLS with a Dichotomous Dependent Variable: y = {0 or 1}

__Dichotomous Variables, Expected Value, & Probability:__

Linear Regression: E[y|X] = Xβ, the 'conditional mean of y given X'

If y = {0 or 1}

E[y|X] = p_i, a probability interpretation

Expected Value: the sum of products of each possible value a variable can take * the probability of that value occurring.

If P(y=1) = p_i and P(y=0) = (1 – p_i), then E[y] = 1*p_i + 0*(1 – p_i) = p_i → the probability that y = 1

Problems with OLS:

1) Estimated probabilities outside (0,1)

2) e is binomially distributed: var(e_i) = p_i(1 – p_i), which varies across observations, violating the assumption of uniform variance (homoskedasticity) → unreliable inferences
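Both problems are easy to see in a small simulation. The sketch below (hypothetical data, numpy only) fits OLS to a simulated 0/1 outcome and shows the fitted 'probabilities' escaping the unit interval:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a binary outcome driven by a single predictor x (hypothetical data)
n = 200
x = rng.uniform(-4, 4, n)
p = 1 / (1 + np.exp(-2 * x))              # true P(y=1|x), logistic shape
y = (rng.uniform(size=n) < p).astype(float)

# Fit OLS: y = b0 + b1*x + e (the 'linear probability model')
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta

# Fitted 'probabilities' fall outside (0, 1) at extreme values of x
print(yhat.min(), yhat.max())
```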

__Logit Model:__

ln(D_i / (1 – D_i)) = Xβ

D_i = probability that y = 1 = e^{Xβ} / (1 + e^{Xβ})

Where: D_i / (1 – D_i) = the 'odds'

E[y|X] = Prob(y = 1|X) = p = e^{Xβ} / (1 + e^{Xβ})
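The logistic link and the log-odds are inverses of each other; a minimal sketch (numpy, hypothetical values):

```python
import numpy as np

def logistic(z):
    """Map the linear index Xβ to a probability: e^z / (1 + e^z)."""
    return 1 / (1 + np.exp(-z))

z = 0.5                    # a hypothetical linear index Xβ
p = logistic(z)            # probability on (0, 1)
odds = p / (1 - p)         # the 'odds'
log_odds = np.log(odds)    # taking ln of the odds recovers the linear index
print(p, odds, log_odds)
```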

__Maximum Likelihood Estimation__

L(θ) = ∏ f(y, θ), the product of the marginal densities

Take the ln of both sides, choose θ to maximize, → θ* (the MLE)

Choose θ's to maximize the likelihood of the sample being observed. This maximizes the likelihood that the data come from a 'real world' characterized by one set of θ's vs. another.
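As a concrete illustration, the sketch below finds the MLE of a Bernoulli parameter θ by a grid search over the log-likelihood (the sample is hypothetical; for this model the analytic MLE is just the sample mean):

```python
import numpy as np

# Toy MLE: estimate the Bernoulli parameter θ = P(y = 1) from a sample.
# ln L(θ) = Σ [ y*ln(θ) + (1-y)*ln(1-θ) ]
y = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1], dtype=float)

thetas = np.linspace(0.01, 0.99, 99)
loglik = np.array([np.sum(y * np.log(t) + (1 - y) * np.log(1 - t))
                   for t in thetas])
theta_star = thetas[np.argmax(loglik)]   # the θ maximizing ln L(θ)

print(theta_star, y.mean())   # grid maximizer sits at the sample mean
```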

__Estimating a Logit Model Using Maximum Likelihood:__

L(β) = ∏ f(y, β) = ∏_{y=1} [e^{Xβ} / (1 + e^{Xβ})] * ∏_{y=0} [1 / (1 + e^{Xβ})]

Choose β to maximize ln(L(β)) to get the MLE estimator β*.

To get P(y = 1), apply the formula Prob(y = 1|X) = p = e^{Xβ*} / (1 + e^{Xβ*}), utilizing the MLE estimator β* to 'score' the data X.
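A minimal numpy sketch of logit estimation on simulated (hypothetical) data, maximizing the log-likelihood by Newton-Raphson and then scoring the data with β*:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate data from a known logit model (hypothetical β_true)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one x
beta_true = np.array([-0.5, 1.5])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

# Maximize ln L(β) by Newton-Raphson iteration
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    step = np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]), X.T @ (y - p))
    beta = beta + step

# 'Score' the data: p = e^{Xβ*} / (1 + e^{Xβ*})
p_hat = 1 / (1 + np.exp(-X @ beta))
print(beta)   # close to beta_true
```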

__Deriving Odds Ratios:__

Exponentiating a coefficient (e^β) gives the odds ratio: a one-unit increase in the corresponding x multiplies the odds of y = 1 by e^β.
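For example, with a hypothetical coefficient β = 0.693, e^β ≈ 2, so each one-unit increase in x doubles the odds:

```python
import numpy as np

beta = 0.693                 # hypothetical fitted logit coefficient

odds_ratio = np.exp(beta)    # e^β ≈ 2.0

# A one-unit increase in x multiplies the odds of y = 1 by e^β:
p0 = 0.25                    # baseline P(y=1)
odds0 = p0 / (1 - p0)        # baseline odds = 1/3
odds1 = odds0 * odds_ratio   # odds after x increases by one unit
p1 = odds1 / (1 + odds1)     # convert back to a probability
print(odds_ratio, p1)
```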

**Variance:** When we undertake MLE we typically maximize the log of the likelihood function, as follows:

Max log(L(β)), or LL, the 'log likelihood'; or solve:

∂log(L(β))/∂β = 0, the 'score vector' u(β)

-∂u(β)/∂β, the 'information matrix' I(β)

I^{-1}(β), the 'variance-covariance matrix' = the Cramér-Rao lower bound
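The chain score → information → variance can be traced numerically. This sketch (hypothetical simulated data) evaluates u(β) and I(β) along the Newton-Raphson path, then takes I⁻¹ at the MLE for the standard errors, plus a Wald χ² for each coefficient against β_0 = 0:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical simulated data: intercept + one predictor
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.uniform(size=n)
     < 1 / (1 + np.exp(-X @ np.array([0.2, 1.0])))).astype(float)

beta = np.zeros(2)
for _ in range(25):                            # Newton-Raphson: β ← β + I⁻¹u
    p = 1 / (1 + np.exp(-X @ beta))
    u = X.T @ (y - p)                          # score u(β) = ∂LL/∂β
    I = X.T @ (X * (p * (1 - p))[:, None])     # information I(β) = -∂u/∂β
    beta = beta + np.linalg.solve(I, u)

V = np.linalg.inv(I)                           # variance-covariance = I⁻¹(β)
se = np.sqrt(np.diag(V))                       # standard errors of the MLEs
wald = beta**2 / np.diag(V)                    # Wald χ² for H0: β_j = 0
print(se, wald)
```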

**Inference:** Wald χ² = (β_MLE – β_0)' Var^{-1} (β_MLE – β_0)

__Assessing Model Fit and Predictive Ability:__

Not minimizing sums of squares: R² = 1 – SSE/SST, or SSR/SST. With MLE no sums of squares are produced and no direct measure of R² is possible. Other measures must be used to assess model performance:

**Deviance:** -2LL, where LL = log-likelihood (smaller is better)

**-2[LL_0 – LL_1]:** L_0 = likelihood of the incomplete model, L_1 = likelihood of the more complete model

AIC and SC are variants of -2LL, and penalize the LL by the # of predictors in the model

**Null Deviance:** D_N = -2[LL_N – LL_p], where L_N = the intercept-only model and L_p = the perfect model; analogous to SST

**Model Deviance:** D_K = -2[LL_K – LL_p], where L_K = the model with K predictors and L_p = the perfect model; analogous to SSE

**Model χ²:** D_N – D_K. For a good-fitting model, model deviance will be smaller than null deviance, giving a larger χ² and a higher level of significance.

**Pseudo-r-square:** (D_N – D_K) / D_N. A smaller (better-fitting) D_K gives a larger ratio. Not on (0,1)

**Cox & Snell Pseudo R-square:** adjusts for parameters and sample size; not on (0,1)

**Nagelkerke (Max-rescaled R-square):** a transformation such that R² → (0,1)
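These deviance-based measures can be computed by hand for a small example. The sketch below uses a single binary predictor (hypothetical counts); with one dummy, the MLE fitted probabilities are simply the group means of y, so no iterative fitting is needed, and since the perfect model has LL = 0, each deviance reduces to -2LL:

```python
import numpy as np

# Hypothetical data: 100 observations, one binary predictor x
y = np.array([1]*30 + [0]*10 + [1]*15 + [0]*45, dtype=float)
x = np.array([1]*40 + [0]*60, dtype=float)

def ll(p, y):
    """Bernoulli log-likelihood of y under fitted probabilities p."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

p_null = np.full_like(y, y.mean())                  # intercept-only fit
p_model = np.where(x == 1, y[x == 1].mean(),        # group means = MLE fit
                   y[x == 0].mean())

D_N = -2 * ll(p_null, y)        # null deviance (analogous to SST)
D_K = -2 * ll(p_model, y)       # model deviance (analogous to SSE)
model_chi2 = D_N - D_K          # model χ², df = 1 here
pseudo_r2 = (D_N - D_K) / D_N
print(D_N, D_K, model_chi2, pseudo_r2)
```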

**Other:**

**Percentage of Correct Predictions**

**Area under the ROC curve:**

Area = a measure of the model's ability to correctly distinguish cases where y = 1 from those where y = 0, based on the explanatory variables.

y-axis: sensitivity, the prediction that y = 1 when y = 1 (true positive rate)

x-axis: 1 – specificity, the prediction that y = 1 when y = 0 (false positive rate)
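A minimal sketch of the ranking interpretation of the area: AUC equals the probability that a randomly chosen y = 1 case receives a higher predicted score than a randomly chosen y = 0 case (the scores and labels below are hypothetical):

```python
import numpy as np

# Hypothetical predicted scores and true labels
score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])
y     = np.array([1,   1,   0,   1,   0,    1,   0,   0,   1,   0  ])

pos = score[y == 1]
neg = score[y == 0]

# Compare every positive against every negative; ties count one half
auc = (np.mean(pos[:, None] > neg[None, :])
       + 0.5 * np.mean(pos[:, None] == neg[None, :]))
print(auc)
```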

References:

Menard, S. (2002). Applied Logistic Regression Analysis, 2nd Edition. Sage.
