Wednesday, March 16, 2011

Predictive Modeling and Custom Reporting with R

Previously I made a post that looked at different customer/patron/donor segments and how they differed over time in terms of predicted risk, based on a predictive model that I created. Below I give a brief introductory example of one such model implemented in R. The aim of the project is to predict admission status and to create a report (one that could be implemented in an enterprise-wide system) that ranks each individual's probability of admission with a simple color code: 'red' = low probability of admission, 'yellow' = marginal probability of admission, 'green' = high probability of admission. This particular example isn't the most practical, but the same approach could easily be applied to any predicted outcome: customer purchase decisions, retention, success, etc.

The data in this example consist of graduate school application data provided by UCLA, with the variables rank (the rank of the school the applicant applied to), gre (the applicant's GRE score), and gpa (the applicant's undergraduate GPA). Additionally, I added a unique ID (row number) for each applicant, which is used later to build the report. The variable 'admit' is the binary outcome (0,1) that we are trying to predict.

One thing I do differently than the UCLA example, for demo purposes, is divide the data into training data used to build the model and validation data, i.e., the students we score with the model. (In practice, validation data are used to calibrate and evaluate models prior to deployment, and a final 'score' data set is used for predicting new people.)

The model used in this example is a logistic regression model. After running the model and exponentiating the coefficients to get odds ratios, we get the following interesting result:

GPA : 2.41991974

This implies that for every 1 unit increase in GPA, the odds of being admitted increase by a factor of about 2.4 (for more on interpreting logistic regression coefficients and odds ratios, see my post here). Odds ratios are useful for measuring the impact of various variables (which could represent customer segments, interventions, marketing campaigns, etc.) on the odds, and hence the probability, of any outcome of interest.
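
As a quick illustration of what an odds ratio of about 2.4 means on the probability scale, here is a small sketch (the 30% baseline admission probability is just an assumed number for illustration):

# hypothetical baseline probability of admission (assumed purely for illustration)
p0 <- 0.30
odds0 <- p0 / (1 - p0)      # baseline odds, about 0.43
odds1 <- odds0 * 2.42       # odds after a one unit increase in GPA
odds1 / (1 + odds1)         # back to a probability: about 0.51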

By using the R function 'predict', new data can be read in and predictions made from the fitted model. This gives a probability of admission for each applicant, which can then be turned into an easily interpreted, actionable report that can be used directly, refined in another program like Excel, delivered via the web, or incorporated into an enterprise-wide reporting system. The data can be merged by ID with other data sources, and various customized reports can be created from the analytics provided by the model.

Example:


The R code used for this demonstration is below:

# *------------------------------------------------------------------
# | PROGRAM NAME: ex_logit_analytics_R
# | DATE: 3/14/11
# | CREATED BY: Matt Bogard  
# | PROJECT FILE:Desktop/R Programs            
# *----------------------------------------------------------------
# | PURPOSE: example of predictive model and reporting               
# |
# *------------------------------------------------------------------
# | COMMENTS:               
# |
# |  1: Reference: R Data Analysis Logistic Regression 
# |     http://www.ats.ucla.edu/stat/r/dae/logit.htm
# |  2: 
# |  3: 
# |*------------------------------------------------------------------
# | DATA USED: data downloaded from reference above             
# |
# |
# |*------------------------------------------------------------------
# | CONTENTS:               
# |
# |  PART 1: data partition 
# |  PART 2: build model
# |  PART 3: predictions/scoring
# |     PART 4: traffic lighting report
# *-----------------------------------------------------------------
# | UPDATES:               
# |
# |
# *------------------------------------------------------------------
 
# get data 
 
apps <- read.csv(url("http://www.ats.ucla.edu/stat/r/dae/binary.csv")) # read data 
 
 
names(apps) # list variables in this data set
dim(apps) # number of observations
print(apps) # view
 
# *------------------------------------------------------------------
# |                
# |    data partition
# |  
# |  
# *-----------------------------------------------------------------
 
 
# store total number of observations in your data
N <- 400 
print(N)
 
# Number of training observations
Ntrain <- N * 0.5
print(Ntrain)
 
# add an explicit row number variable for tracking
 
id <- seq(1,400)
 
apps2 <- cbind(apps,id)
 
# Randomly arrange the data and divide it into a training
# and test set.
 
dat <- apps2[sample(1:N),]
train <- dat[1:Ntrain,]
validate <- dat[(Ntrain+1):N,]
 
dim(dat)
dim(train)
dim(validate)
 
# sort and look at data sets to see that they are different
 
sort_train <- train[order(train$id),]
print(sort_train)
 
sort_val <- validate[order(validate$id),]
print(sort_val)
 
# *------------------------------------------------------------------
# |                
# |    build model
# |  
# |  
# *-----------------------------------------------------------------
 
# logit model 
 
admit_model <- glm(admit ~ gre + gpa + as.factor(rank), data=train, family=binomial(link="logit"), na.action=na.pass) # specify data=train (rather than train$ prefixes in the formula) so predict() can score new data
 
# model results
 
summary(admit_model)
 
# odds ratios
 
exp(admit_model$coefficients)
 
# *------------------------------------------------------------------
# |                
# |   predictions/scoring data
# |  
# |  
# *-----------------------------------------------------------------
 
train$score <-predict(admit_model,type="response") # add predictions to training data 
 
sort_train_score <- train[order(train$id),] # sort by observation
print(sort_train_score) # view
 
validate$score <-predict(admit_model,newdata=validate,type="response") # add predictions to validation data 
sort_val_score <- validate[order(validate$id),] # sort by observation
print(sort_val_score) # view
 
 
# *------------------------------------------------------------------
# |                
# |    create a 'traffic light report' based on predicted probabilities
# |  
# |  
# *----------------------------------------------------------------- 
 
summary(validate$score) # look at probability ranges
 
 
green <- validate[validate$score >=.6,] # subset most likely to be admitted group
dim(green)
green$colorcode <- "green" # add color code variable for this group
 
yellow <- validate[(validate$score < .6 & validate$score >.5),] # subset intermediate group
dim(yellow)
yellow$colorcode <-"yellow"  # add color code
 
red <- validate[validate$score <=.5,]  # subset least likely to be admitted group
dim(red)
red$colorcode <- "red" # add color code
 
# create distribution list/report
 
applicants_by_risk<- rbind(red,yellow, green)
dim(applicants_by_risk)
report<-applicants_by_risk[order(applicants_by_risk$id),] # sort by applicant id
print(report[c("id","colorcode", "score")]) # basic unformatted action report can be saved as a data set, and exported for other reports and formatting
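
As a side note (not part of the program above), the same color coding could be produced more compactly with R's cut() function; a sketch is below. The only behavioral difference is that a score of exactly .6 would fall in 'yellow' here rather than 'green'.

# sketch: bin the predicted probabilities into color codes in one step
validate$colorcode <- cut(validate$score,
                          breaks = c(-Inf, .5, .6, Inf),
                          labels = c("red", "yellow", "green"))
report2 <- validate[order(validate$id), c("id", "colorcode", "score")]
print(report2)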

Tuesday, March 15, 2011

Applied Analytics with R and Venn Diagrams


For a particular client I developed a predictive model that scored a set of patrons or donors at different points of time, providing the predicted probability that they would stop making contributions. At each point in time, they were more and more experienced with the service and more data about the patron was collected. As a result the model’s predictive accuracy improved with time. The client wanted to know, looking at the same cohort of customers over time, how often were the same customers predicted to stop making donations. In other words, at t=1, when the model is weakest, how many customers predicted to stop contributions were also on the ‘list’ at say t=3 when the model is much more accurate?

To do this I used the 'limma' package from the Bioconductor repository (see the reference below and the R code that follows).

Before attempting to construct the Venn diagram, I had to take the scored donor data set and subset it based on all those patrons ever indicated to be 'high risk.' Then I created a data set with one row per patron and a binary indicator tracking their movement from 'novice' to 'intermediate' to 'experienced.' (t=1,2,3 respectively)



The format for the data set is similar to the layout below:

ID   NOVICE   INTERMEDIATE   EXPERIENCED
 1        1              1             1
 2        1              1             0
 3        1              0             0
 .        .              .             .
etc.
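
A rough sketch of how a layout like this could be built from a long scored data set (one row per patron per period) is below; the data set name 'scored', its columns ID, PERIOD, and SCORE, and the .5 risk cutoff are hypothetical stand-ins, not the actual client data.

# sketch: 'scored' is a hypothetical long data set with one row per patron per period,
# with columns ID, PERIOD ('NOVICE','INTERMEDIATE','EXPERIENCED'), and SCORE
high <- scored[scored$SCORE >= .5, ]          # hypothetical 'high risk' cutoff
ids  <- sort(unique(high$ID))                 # patrons ever flagged high risk
flag <- function(period) as.numeric(ids %in% high$ID[high$PERIOD == period])
wide <- data.frame(ID = ids,
                   NOVICE       = flag("NOVICE"),
                   INTERMEDIATE = flag("INTERMEDIATE"),
                   EXPERIENCED  = flag("EXPERIENCED"))
l3 <- as.matrix(wide[c("NOVICE", "INTERMEDIATE", "EXPERIENCED")])  # ready for vennCounts()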

The resulting Venn Diagram is below:



Note, there were 50 patrons in this example data set, and only 12 of those were predicted to be 'high risk' every time as they moved across each experience category or time period. 15 were high risk 'novice' patrons but never became part of the 'intermediate' or 'experienced' segments.

References:

How can I generate a Venn diagram in R?
http://www.ats.ucla.edu/stat/r/faq/venn.htm

R code:

# *------------------------------------------------------------------
# | PROGRAM NAME: R_Venn
# | DATE: 3/15/11
# | CREATED BY: MATT BOGARD  
# | PROJECT FILE: stats blog        
# *----------------------------------------------------------------
# | PURPOSE: CREATE VENN DIAGRAMS FOR MEMBERSHIP IN MULTIPLE GROUPS              
# |
# *------------------------------------------------------------------
# | COMMENTS:               
# |
# |  1: REFERENCES: How can I generate a Venn diagram in R? 
# |     http://www.ats.ucla.edu/stat/r/faq/venn.htm
# | 
# |  2: 
# |  3: 
# |*------------------------------------------------------------------
# | DATA USED: data scored by predictive model  
# |
# |*------------------------------------------------------------------
# | CONTENTS:               
# |
# |  PART 1: Run UCLA example code for practice 
# |  PART 2: My data
# |  PART 3: 
# *-----------------------------------------------------------------
# | UPDATES:               
# |
# |
# *------------------------------------------------------------------
 
 
 
 
 rm(list=ls()) # get rid of any existing data 
 ls() # view open data sets
 
 
 
# for 1st time use- get source code for bioconductor limma library
 
 
source("http://www.bioconductor.org/biocLite.R")
 
 
biocLite("limma")
 
ls() # see what data is there
 
library(limma) # load package
 
# *------------------------------------------------------------------
# | Part 1: Run UCLA example code            
# *-----------------------------------------------------------------
 
 
# read data
 
hsb2<-read.table("http://www.ats.ucla.edu/stat/R/notes/hsb2.csv", sep=',', header=T)
 
fix(hsb2) # view data set 
 
# create column vectors to represent the data sets
 
hw<-(hsb2$write>=60)
hm<-(hsb2$math >=60)
hr<-(hsb2$read >=60)
c3<-cbind(hw, hm, hr)
 
# create the matrix that will be used to plot the venn diagram
a <- vennCounts(c3)
a
 
vennDiagram(a) # plot venn diagram
 
# *------------------------------------------------------------------
# | Part 2: My Data           
# *-----------------------------------------------------------------
 
 
setwd('/Users/wkuuser/Desktop/R Data Sets') # set working directory
 
 
list<- read.csv("CUSTOMER_LOYALTY.csv", na.strings=c(".", "NA", "", "?"), encoding="UTF-8") # read data
 
fix(list) # view data set
 
names(list) # get variable names (for cutting and pasting below)
 
# look at summary statistics for each data group
 
library(Hmisc) # for describe function
 
novice <-list[(list$NOVICE==1),] #subset novice segment
describe(novice) # n =37
 
intermediate <- list[(list$INTERMEDIATE==1),] # subset intermediate segment
describe(intermediate) # n=25
 
experienced <- list[(list$EXPERIENCED==1),] # subset experienced segment
describe(experienced)  # n= 25
 
# format data for use in venn diagram function below
 
l <- list[c("NOVICE","INTERMEDIATE","EXPERIENCED"  )] # keep only indicator variables
 
l3 <- as.matrix(l) # convert to a matrix
 
a <- vennCounts(l3) # create counts for venn diagram
a
 
# plot venn diagram 
vennDiagram(a, include = "both", names = c("Novice (n =37)", "Intermediate (n=25)", "Experienced (n=25)"), cex = 1, counts.col = "blue")
title("Donors Likely to Stop Contributions by Experience")

Friday, March 11, 2011

Plotting Indifference Curves with R Contour Function

The following post, Constructing Indifference Curves - Part 3 from economics.about.com, provides a discussion of indifference curves (although I actually think they are isoquants) and how to construct them. I think I have a grasp on how to do this in R if you define the utility function as z = √(x*y). For the x and y data I used modified data from the article mentioned above, and then plotted some simulated data.

The data I used for x and y (which I modified-see documentation in the R-code) is listed below:

x: 1,2,3,4,5,6,7,8
y: 10,10,10,15,15,30,60,90

The R code below reads in the data and plots level sets, or indifference curves, for both the data above and the simulated data. This is very basic; for full documentation (and options for x and y limits, levels, etc.) of the contour function in R, see here.
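
For comparison, a more conventional way to draw level sets of u(x, y) = √(x*y), not used in the program below, is to evaluate the function on a full grid with outer() and pass the resulting matrix to contour(); a minimal sketch:

# evaluate u(x, y) = sqrt(x * y) on a full grid and draw its level sets
x <- seq(0, 20, by = 0.25)
y <- seq(0, 100, by = 1)
z <- outer(x, y, function(x, y) sqrt(x * y))  # matrix of utility values over the grid
contour(x, y, z, levels = c(10, 20, 30),      # a few indifference curves
        xlab = "x", ylab = "y")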

Edit: for the second part of the program, if I switch the values of x and y I get indifference curves that are convex to the origin, as they should be. The correction can also be represented much more compactly as:

contour(
xyT<-t(as.matrix(cbind(x<-seq(0,20,by=2),y<-seq(0,100,by=10)))),
z<-as.matrix(sqrt(x*y))
)



See program below:

# *------------------------------------------------------------------
# | PROGRAM NAME: INDIFFERENCE_CURVES_R
# | DATE: 3/11/11
# | CREATED BY:  MATT BOGARD
# | PROJECT FILE:  P:\R  Code References            
# *----------------------------------------------------------------
# | PURPOSE: MORE TO PLOT LEVEL SETS USING THE CONTOUR FUNCTION THAN              
# |          TO ACTUALLY PLOT INDIFFERENCE CURVES 
# *------------------------------------------------------------------
# | COMMENTS:               
# |
# |  1: references: http://stat.ethz.ch/R-manual/R-devel/library/graphics/html/contour.html 
# |  2: 
# |  3: 
# |*------------------------------------------------------------------
# | DATA USED: http://economics.about.com/od/indifferencecurves/a/constructing3.htm              
# |
# |
# |*------------------------------------------------------------------
# | CONTENTS:               
# |
# |  PART 1: explicit data 
# |  PART 2: simulated data  (sort of) - not really indifference curves
# |  PART 3: 
# *-----------------------------------------------------------------
# | UPDATES:               
# |
# |
# *------------------------------------------------------------------
 
# *------------------------------------------------------------------
# | PART 1:   explicit data            
# *-----------------------------------------------------------------
 
 
 
# raw data 
 
x <- c(1,2,3,4,5,6,7,8)
y <- c(10,10,10,15,15,30,60,90)
 
# note: for the contour function to work below, x and y 
# must be in ascending order (per the contour documentation),
# which is the opposite of how the data was presented at
# about.com 
 
# put x and y in a matrix
 
xy <- as.matrix(cbind(x,y))
 
# transpose xy
 
xyT <- t(xy) 
 
# define function z as a matrix
 
z <- as.matrix(sqrt(x*y))
 
# plot the contour plot / specified level sets
 
contour(xyT,z) # all levels
 
contour(xyT,z, levels =c(10,20,50)) # specified levels for z
 
# *------------------------------------------------------------------
# | PART 2:   simulated data  (sort of) - not really indifference curves            
# *-----------------------------------------------------------------
 
 
rm(list=ls()) # get rid of any existing data 
 
ls() # display active data -should be null 
 
# define x and y
 
x <- seq(0,100,by=10)
y <- seq(0,20,by=2)
 
# put x and y in a matrix
 
xy <- as.matrix(cbind(x,y))
 
# transpose xy
 
xyT <- t(xy) 
 
# define function z as a matrix
 
z <- as.matrix(sqrt(x*y))
 
contour(xyT,z)

Thursday, March 10, 2011

Copula Functions, R, and the Financial Crisis

From: In defense of the Gaussian copula, The Economist

"The Gaussian copula provided a convenient way to describe a relationship that held under particular conditions. But it was fed data that reflected a period when housing prices were not correlated to the extent that they turned out to be when the housing bubble popped."

Decisions about risk, leverage, and asset prices would very likely become more correlated in an environment of centrally planned interest rates than under 'normal' conditions.

Simulations using copulas can be implemented in R. I'm not an expert in this, but thanks to the reference Enjoy the Joy of Copulas: With a Package copula I have at least gained a better understanding of copulas.

A copula can be defined as a multivariate distribution with marginals that are uniform over the unit interval (0,1).  Copula functions can be used to simulate a dependence structure independently from the marginal distributions.

Based on Sklar's theorem, the multivariate distribution F can be represented by a copula C applied to its marginal distribution functions:

F(x1, ..., xp) = C{ F1(x1), ..., Fp(xp) }

where each Fi is a marginal distribution function, so that each Fi(xi) is uniformly distributed over (0,1).
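
As a small sketch of this idea (using the newer copula package naming, rCopula(), in place of the rcopula() call used in the program below): simulate dependent uniforms from a Gaussian copula, check that each margin is roughly uniform, and then impose whatever margins you like through quantile functions.

library(copula)
set.seed(1)
u <- rCopula(1000, normalCopula(0.7, dim = 2))  # dependent uniforms on (0,1)
hist(u[, 1])                                    # each margin is roughly uniform
# impose margins separately from the dependence structure,
# e.g. a normal margin for X1 and an exponential margin for X2
x1 <- qnorm(u[, 1])
x2 <- qexp(u[, 2])
plot(x1, x2)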

Various functional forms for copula functions exist, each based on different assumptions about the dependence structure of the underlying variables. Using R, I have simulated data based on 3 different copula formulations (for 2 variable cases) and produced scatter plots for each.




As mentioned in the Economist article at the beginning of this post, the Gaussian copula was used widely before the housing crisis to simulate the dependence between housing prices in various geographic areas of the country. Looking at the plot for the Gaussian copula above, it can be seen that extreme events (very high values of both X1 and X2, or very low values of both) appear only weakly related; the dependence between X1 and X2 in the tails is very weak. With the Gumbel copula, extreme events (very high values of g1 and g2) are more strongly dependent, while with the Clayton copula, extreme events (very low values of y1 and y2) are more strongly dependent.
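
A rough way to see this in simulated data is to compare how often both components are extreme at the same time under each copula; the sketch below (again written with the newer rCopula() naming, and with arbitrarily chosen parameter values) does exactly that.

library(copula)
set.seed(1)
n <- 100000
un <- rCopula(n, normalCopula(0.7, dim = 2))   # Gaussian
ug <- rCopula(n, gumbelCopula(2, dim = 2))     # Gumbel (upper tail dependence)
uc <- rCopula(n, claytonCopula(2, dim = 2))    # Clayton (lower tail dependence)
# joint upper-tail exceedances: both components above the 99th percentile
mean(un[, 1] > 0.99 & un[, 2] > 0.99)
mean(ug[, 1] > 0.99 & ug[, 2] > 0.99)   # noticeably larger than the Gaussian case
# joint lower-tail exceedances: both components below the 1st percentile
mean(uc[, 1] < 0.01 & uc[, 2] < 0.01)   # larger than the Gaussian analogue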

How might this relate to the financial crisis specifically?  In The Role of Copulas in the Housing Crisis, Zimmer states:

"Due to its simplicity and familiarity, the Gaussian copula is popular in the calculation of risk in collaterized debt obligations. But the Gaussian copula imposes asymptotic independence such that extreme events appear to be unrelated. This restriction might be innocuous in normal times, but during extreme events, such as the housing crisis, the Gaussian copula might be inappropriate" 

In the paper a mixture of Clayton and Gumbel copulas is investigated as an alternative to using the Gaussian copula.

The R code that I used to create the plots above, in addition to some additional plots, can be found below. I also highly recommend JD Long's blog post Stochastic Simulation with Copula Functions in R.

References:

The Role of Copulas in the Housing Crisis. The Review of Economics and Statistics. Accepted for publication. Posted Online December 8, 2010. David M. Zimmer. Western Kentucky University.

Enjoy the Joy of Copulas: With a Package copula. Journal of Statistical Software Oct 2007, vol 21 Issue 1.

R-CODE

# *------------------------------------------------------------------
# | PROGRAM NAME: R_COPULA_BASIC
# | DATE: 1/25/11
# | CREATED BY: Matt Bogard 
# | PROJECT FILE: P:\R  Code References\SIMULATION             
# *----------------------------------------------------------------
# | PURPOSE:    copula graphics           
# | 
# *------------------------------------------------------------------
# | COMMENTS:               
# |
# |  1: REFERENCES:  Enjoy the Joy of Copulas: With a Package copula
# |     Journal of Statistical Software Oct 2007, vol 21 Issue 1
# |      http://www.jstatsoft.org/v21/i04/paper 
# |  2: 
# |  3: 
# |*------------------------------------------------------------------
# | DATA USED:               
# |
# |
# |*------------------------------------------------------------------
# | CONTENTS:               
# |
# |  PART 1:  
# |  PART 2: 
# |  PART 3: 
# *-----------------------------------------------------------------
# | UPDATES:               
# |
# |
# *------------------------------------------------------------------
 
 
library("copula")
set.seed(1)
 
# *------------------------------------------------------------------
# |                
# |scatterplots
# |  
# |  
# *-----------------------------------------------------------------
 
 
# normal (Gaussian) copula
 
norm.cop <- normalCopula(0.7, dim = 2) # correlation parameter must lie in [-1,1]; dim = 2 for the bivariate case discussed above
norm.cop
x <- rcopula(norm.cop, 500)
plot(x)
title("Gaussian Copula")
 
# Clayton Copula
 
clayton.cop <- claytonCopula(2, dim = 2)
clayton.cop
y <- rcopula(clayton.cop,500)
plot(y)
title("Clayton Copula")
 
# Frank Copula
 
frank.cop <- frankCopula(2, dim = 2)
frank.cop
f <- rcopula(frank.cop,500)
plot(f)
title("Frank Copula")
 
 
# Gumbel Copula
 
gumbel.cop <- gumbelCopula(2, dim = 2)
gumbel.cop
g <- rcopula(gumbel.cop,500) 
plot(g)
title('Gumbel Copula')
 
# *------------------------------------------------------------------
# |                
# |  contour plots 
# |  
# |  
# *-----------------------------------------------------------------
 
 
# clayton copula contour
myMvd1 <- mvdc(copula = archmCopula(family = "clayton", param = 2),
margins = c("norm", "norm"), paramMargins = list(list(mean = 0,
sd = 1), list(mean = 0, sd = 1)))
 
contour(myMvd1, dmvdc, xlim = c(-3, 3), ylim = c(-3, 3))
title("Clayton Copula")
 
# frank copula contour
myMvd2 <- mvdc(copula = archmCopula(family = "frank", param = 5.736),
margins = c("norm", "norm"), paramMargins = list(list(mean = 0,
sd = 1), list(mean = 0, sd = 1)))
 
contour(myMvd2, dmvdc, xlim = c(-3, 3), ylim = c(-3, 3))
title("Frank Copula")
 
# gumbel copula 
myMvd3 <- mvdc(copula = archmCopula(family = "gumbel", param = 2),
margins = c("norm", "norm"), paramMargins = list(list(mean = 0,
sd = 1), list(mean = 0, sd = 1)))
 
contour(myMvd3, dmvdc, xlim = c(-3, 3), ylim = c(-3, 3))
title("Gumbel Copula")

The Calculation and Interpretation of Odds Ratios


(see also: more on deriving odds ratios)


DATA

Assume some event with outcomes Y or N, two groups (males vs. females), and the following historical data:

 
PROBABILITIES

Based on the data above, we can calculate the probabilities for Males and Females to have each outcome Y and N.

 


We can see that the probability that Males have event Y is greater than the probability for Females.

ODDS

From the probabilities we can calculate the odds of event Y for each group. 


ODDS RATIOS

From the odds we can calculate the odds ratios for event Y, which give a comparison of the relative odds for each group M and F. 


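A small R sketch of these calculations, using illustrative probabilities chosen to be consistent with the odds quoted in the interpretation below (the original table of counts is not reproduced here):

# illustrative probabilities of event Y for males and females
p_m <- 0.25
p_f <- 0.10
odds_m <- p_m / (1 - p_m)   # 0.33
odds_f <- p_f / (1 - p_f)   # 0.11
odds_m / odds_f             # OR, M vs F = 3
odds_f / odds_m             # OR, F vs M = 0.33
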
  INTERPRETATION

From the basic probabilities above, we know that the probability of event Y is greater for males than females. The odds of event Y are also greater for males than females. These relationships are also reflected in the odds ratios. The odds of event Y for males are 3 times the odds for females. The odds of event Y for females are only .33 times the odds for males. In other words, the odds of event Y are greater for males and less for females.

This can also be seen from the formula for odds ratios. If the OR M vs F  = odds(M)/odds(F), we can see that if the odds (M) > odds(F), the odds ratio will be greater than 1. Alternatively, for OR  F vs M = odds(F)/odds(M), we can see that if the odds(F) < odds(M) then the ratio will be less than 1.  If the odds for both groups are equal, the odds ratio will be 1 exactly.

Three basic guidelines for interpreting odds ratios follow:

OR = 1: the odds are the same for both groups.
OR > 1: the odds are greater for the group in the numerator.
OR < 1: the odds are less for the group in the numerator.

Interpretation of the odds ratios above tells us that the odds of Y for females are less than the odds for males. It might be more informative if we could say how much less. To calculate this we need to go back to the raw odds calculations above. In terms of percentages, the odds(Y) for females are [(.11 - .33)/.33]*100 = -67%, or 67% less than the odds(Y) for males. This can also be calculated more easily and directly from the odds ratio of F vs. M.

The odds(Y) for females are [OR(F vs. M) - 1]*100 = [.33 - 1]*100 = -67%, i.e., 67% less than the odds for males. A positive (vs. negative) result would imply an increase in the odds.

RELATION TO LOGISTIC REGRESSION

 
Odds ratios can be obtained from logistic regression by exponentiating the coefficient (beta) for a given explanatory variable. For categorical variables, the odds ratios are interpreted as above. For continuous variables, the odds ratio gives the multiplicative change in the odds resulting from a one-unit change in the variable.

Saturday, March 5, 2011

Student's t, Normality, & the Slutsky Theorems

Often in an analysis we are using s² instead of σ² (as the true population variance is usually unknown), yet we want to construct confidence intervals or, equivalently, conduct hypothesis tests. The t distribution arises as the ratio of a standard normal variable to the square root of an independent chi-square variable divided by its degrees of freedom (DeGroot, 2002). If our data are distributed exactly normal, we can rely on the t-table for constructing confidence intervals. These are exact results: the t-ratio is exactly t-distributed given that the underlying data are exactly normal.

But what if we don't know the distribution of the data we are working with, or don't feel comfortable making assumptions of normality? We still usually have to estimate σ² with s². Of course, if n is large enough, reliance on the t-table or the standard normal table will give similar results. But Goldberger offers the following remark:

“There is no good reason to rely routinely on a t-table rather than a normal table unless Y itself is normally distributed” (Goldberger, 1991).

So, how do you justify this? In this case there are some powerful theorems regarding the asymptotic properties of sample statistics, known as the Slutsky theorems, which are outlined in Goldberger (1991). The following sequence of steps using these theorems is based on Goldberger and my lecture notes from ECO 603, Research Methods for Economics, which was actually a mathematical statistics course taught by Dr. Christopher Bollinger at the University of Kentucky. Any errors or mistakes are completely my own.

Given that Θ^ is an estimator for the population parameter Θ, and letting →p denote convergence in probability and →d denote convergence in distribution:

S1: If Θ^ →p Θ, then for any continuous function h, h(Θ^) →p h(Θ).

S2: If Θ1^ and Θ2^ converge in probability to (Θ1, Θ2), then h(Θ1^, Θ2^) →p h(Θ1, Θ2).

S3: If Θ^ →p Θ and Zn →d N(0,1), then Θ^ + Zn →d N(Θ, 1).

S4: If Θ^ →p Θ and Zn →d N(0,1), then Θ^ Zn →d N(0, Θ²).

S5: If √n (Θ^ - Θ) ~A N(0, Σ²), then for continuous functions h of Θ, √n (h(Θ^) - h(Θ)) ~A N(0, h'(Θ)² Σ²).

(Goldberger, 1991)

So now if we want to use s² to estimate σ², we form the statistic

Z^ = (Xbar - μ) / (s²/n)^(1/2) = √n (Xbar - μ)/s    (1)

This looks like the t-statistic, but if we can't make the assumption of normality, the exact results of the t-distribution do not apply. In this case we rely on results of both the CLT and the Slutsky theorems.
Given the traditional standardized normal variable formulation:

Z = √n (Xbar - μ)/σ

Algebraic manipulation shows that

(σ/s) √n (Xbar - μ)/σ = √n (Xbar - μ)/s

so (σ/s) Z = Z^, where Z^ is defined in (1) above.

By the CLT, Z →d N(0,1).

It can be shown that s² →p σ².

If we define Θ^ as s, then we can view (σ/s) as a continuous function h(Θ^).

Then by S1, h(Θ^) →p h(Θ), which implies that (σ/s) →p (σ/σ) = 1.

Given this, S4 yields Θ^ Zn →d N(0, Θ²), which implies that
(σ/s) Z →d N(0, (σ/σ)²) = N(0,1).

And therefore Z^ →d N(0,1).


Therefore, by the central limit theorem and the Slutsky theorems (S1 and S4), one can use the asymptotic properties of the statistic Z^ = √n (Xbar - μ)/s to form confidence intervals based on the standard normal distribution, without making any assumptions about the distribution of the sample data and while using s² to estimate σ².

How large does n have to be before asymptotic properties apply?  From Kennedy, A Guide to Econometrics 5th Edition:

 "How large does the sample size have to be for estimators to display their asymptotic properties? The answer to this crucial question depends on the characteristics of the problem at hand. Goldfeld and Quandt (1972, p.277) report an example in which a sample size of 30 is sufficiently large and an example in which a sample of 200 is required."

An important note to remember: it is often said that 'as n becomes large, the normal distribution approximates the t-distribution,' but in fact, as shown above, as n becomes large the statistic above (Z^) approximates the normal distribution (again, based on the CLT and the Slutsky theorems).
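
A quick simulation sketch along these lines: draw repeated samples from a clearly non-normal distribution (an exponential with mean 1), compute Z^ = √n (Xbar - μ)/s for each sample, and check how close its distribution is to N(0,1) at a few sample sizes.

set.seed(1)
zhat <- function(n, reps = 10000) {
  replicate(reps, {
    x <- rexp(n)                      # skewed data with true mean 1
    sqrt(n) * (mean(x) - 1) / sd(x)   # the statistic Z^ from (1)
  })
}
# tail probabilities should approach .025 on each side as n grows
for (n in c(10, 30, 200)) {
  z <- zhat(n)
  cat("n =", n, ":", mean(z < qnorm(.025)), mean(z > qnorm(.975)), "\n")
}
qqnorm(zhat(200)); abline(0, 1)       # roughly the 45 degree line for larger n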

References:

A Guide to Econometrics, Kennedy 2003.
A Course in Econometrics, Goldberger 1991.
Economics 603 Research Methods and Procedures in Economics Course Notes. University of Kentucky. Taught by Dr. Christopher Bollinger (2002).