For a particular client I developed a predictive model that scored a set of patrons or donors at different points in time, giving each one a predicted probability of ceasing contributions. At each successive point the patrons had more experience with the service and more data about them had been collected, so the model's predictive accuracy improved over time. The client wanted to know, following the same cohort over time, how often the same patrons were predicted to stop donating. In other words, at t=1, when the model is weakest, how many of the patrons predicted to stop contributing were also on the 'list' at, say, t=3, when the model is much more accurate?
To do this I used the 'limma' package from the Bioconductor repository (see the reference and the R code below).
Before attempting to construct the Venn diagram, I had to subset the scored donor data set to all patrons ever flagged as 'high risk.' Then I created a data set with one row per patron and a binary indicator for each stage as they moved from 'novice' to 'intermediate' to 'experienced' (t=1, 2, 3 respectively).
The format for the data set is similar to the layout below:
ID NOVICE INTERMEDIATE EXPERIENCED
1 1 1 1
2 1 1 0
3 1 0 0
. . . .
etc.
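As a rough sketch of how a layout like this might be built, the base R snippet below starts from a hypothetical long-format table of scores (one row per patron per period). The `scores` data frame and its `PERIOD` and `HIGH_RISK` columns are assumptions made up for illustration; they are not the client data or the column names actually used.

```r
# Hypothetical long-format scores: one row per patron per time period.
# HIGH_RISK = 1 means the model flagged the patron as likely to stop donating.
scores <- data.frame(
  ID        = rep(1:4, each = 3),
  PERIOD    = rep(c("NOVICE", "INTERMEDIATE", "EXPERIENCED"), times = 4),
  HIGH_RISK = c(1, 1, 1,   1, 1, 0,   1, 0, 0,   0, 0, 0)
)

# Keep only patrons flagged as high risk at least once
ever_flagged <- unique(scores$ID[scores$HIGH_RISK == 1])
flagged <- scores[scores$ID %in% ever_flagged, ]

# Fix the column order so it follows t = 1, 2, 3
flagged$PERIOD <- factor(flagged$PERIOD,
                         levels = c("NOVICE", "INTERMEDIATE", "EXPERIENCED"))

# Pivot to one row per patron with a 0/1 indicator per period
ind <- tapply(flagged$HIGH_RISK, list(flagged$ID, flagged$PERIOD), max)
wide <- data.frame(ID = as.numeric(rownames(ind)), ind)

print(wide)
```

Patron 4 is never flagged in this toy data, so it drops out before the pivot and `wide` keeps only the three patrons shown in the layout above.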
The resulting Venn Diagram is below:
Note, there were 50 patrons in this example data set, and only 12 of those were predicted to be 'high risk' every time as they moved across each experience category or time period. 15 were high risk 'novice' patrons but never became part of the 'intermediate' or 'experienced' segments.
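The same kinds of counts can be cross-checked without the diagram. The sketch below uses base R on a small made-up indicator table (the `flags` data frame is hypothetical, not the 50-patron data set) to count patrons flagged in every period, patrons flagged only as novices, and the share of novice flags that persist to the experienced stage.

```r
# Hypothetical indicator data: one row per patron, 1 = flagged high risk
flags <- data.frame(
  NOVICE       = c(1, 1, 1, 0, 1),
  INTERMEDIATE = c(1, 1, 0, 1, 0),
  EXPERIENCED  = c(1, 0, 0, 1, 0)
)

# Patrons flagged in all three periods (the center of the Venn diagram)
all_three <- sum(rowSums(flags) == 3)

# Patrons flagged as novices but never again
novice_only <- sum(flags$NOVICE == 1 &
                   flags$INTERMEDIATE == 0 &
                   flags$EXPERIENCED == 0)

# Of those flagged at t=1, the share still flagged at t=3
retained <- sum(flags$NOVICE == 1 & flags$EXPERIENCED == 1) /
            sum(flags$NOVICE == 1)

# Full cross-tabulation of every flag pattern
pattern_counts <- table(flags)

print(c(all_three = all_three, novice_only = novice_only, retained = retained))
```

The `retained` ratio is one way to put a single number on the client's question; the Venn diagram shows the same overlaps visually.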
References:
How can I generate a Venn diagram in R?
http://www.ats.ucla.edu/stat/r/faq/venn.htm
R code:
# *------------------------------------------------------------------
# | PROGRAM NAME: R_Venn
# | DATE: 3/15/11
# | CREATED BY: MATT BOGARD
# | PROJECT FILE: stats blog
# *----------------------------------------------------------------
# | PURPOSE: CREATE VENN DIAGRAMS FOR MEMBERSHIP IN MULTIPLE GROUPS
# |
# *------------------------------------------------------------------
# | COMMENTS:
# |
# | 1: REFERENCES: How can I generate a Venn diagram in R?
# |    http://www.ats.ucla.edu/stat/r/faq/venn.htm
# |
# | 2:
# | 3:
# |*------------------------------------------------------------------
# | DATA USED: data scored by predictive model
# |
# |*------------------------------------------------------------------
# | CONTENTS:
# |
# | PART 1: Run UCLA example code for practice
# | PART 2: My data
# | PART 3:
# *-----------------------------------------------------------------
# | UPDATES:
# |
# |
# *------------------------------------------------------------------

rm(list=ls()) # get rid of any existing data
ls() # view open data sets

# for 1st time use- install the limma package from Bioconductor
# (newer Bioconductor releases replace biocLite() with
#  BiocManager::install("limma"))
source("http://www.bioconductor.org/biocLite.R")
biocLite("limma")

ls() # see what data is there
library(limma) # load package

# *------------------------------------------------------------------
# | Part 1: Run UCLA example code
# *-----------------------------------------------------------------

# read data
hsb2<-read.table("http://www.ats.ucla.edu/stat/R/notes/hsb2.csv", sep=',', header=T)
fix(hsb2) # view data set

# create column vectors to represent the data sets
hw<-(hsb2$write>=60)
hm<-(hsb2$math >=60)
hr<-(hsb2$read >=60)
c3<-cbind(hw, hm, hr)

# create the matrix that will be used to plot the venn diagram
a <- vennCounts(c3)
a
vennDiagram(a) # plot venn diagram

# *------------------------------------------------------------------
# | Part 2: My Data
# *-----------------------------------------------------------------

setwd('/Users/wkuuser/Desktop/R Data Sets') # set working directory

list<- read.csv("CUSTOMER_LOYALTY.csv", na.strings=c(".", "NA", "", "?"), encoding="UTF-8") # read data
fix(list) # view data set
names(list) # get variable names (for cutting and pasting below)

# look at summary statistics for each data group
library(Hmisc) # for describe function

novice <-list[(list$NOVICE==1),] # subset novice segment
describe(novice) # n =37

intermediate <- list[(list$INTERMEDIATE==1),] # subset intermediate segment
describe(intermediate) # n=25

experienced <- list[(list$EXPERIENCED==1),] # subset experienced segment
describe(experienced) # n= 25

# format data for use in venn diagram function below
l <- list[c("NOVICE","INTERMEDIATE","EXPERIENCED" )] # keep only indicator variables
l3 <- as.matrix(l) # convert to a matrix

# save as matrix
a <- vennCounts(l3) # create counts for venn diagram
a

# plot venn diagram
vennDiagram(a, include = "both",
            names = c("Novice (n =37)", "Intermediate (n=25)", "Experienced (n=25)"),
            cex = 1, counts.col = "blue")
title("Donors Likely to Stop Contributions by Experience")