Friday, February 22, 2013

Why does IFELSE logic work differently on what appear to be the same values?

Embarrassingly, I'm stumped on this...

I have an R program for looking at grade distributions in my class, and I recently found something weird in my 'ifelse' processing: the program seemed to be overcounting Cs and undercounting Bs.

I'm not sure what's going on. It happened when I was adding extra credit, so I guess it has something to do with the addition operation and variable assignment. When I process untransformed data, I get the correct number of Bs and Cs. But my goal in a program like this was flexibility: enter the data in Excel and process it. I'd prefer to apply extra credit, curves, corrections, etc. in R rather than doing it all in Excel.

A very short, modified version (without reading in the data from a CSV file) follows. Am I actually making a simple error in my math, or does R see these values differently than they appear? I'm admittedly not a regular user of 'ifelse' in R. But even if my logic is wrong, it should give me the same wrong answer when applied to what appear to be the same values! If I run the summary function against all the variables, I get matching results for grades$grades2 and grades$ec (the modified grade).
grades2 <- c(0.72,0.56,0.84,0.84,1.04,0.48,0.96,
0.8,0.68,0.92,0.72,0.6,0.92,0.72,0.88,0.88,0.76,
0.96,0.76,0.52,1,0.88,0.88,0.88,0.64)
 
grades1 <-c(0.64,0.48,0.76,0.76,0.96,0.4,0.88,
0.72,0.6,0.84,0.64,0.52,0.84,0.64,0.8,0.8,0.68,
0.88,0.68,0.44,0.92,0.8,0.8,0.8,0.56)
 
grades <- data.frame(cbind(grades1,grades2))
 
# format grades
 
grades$letter1 <- ifelse(grades$grades1 >= .90, "A",
                  ifelse(grades$grades1 >= .80 & grades$grades1 < .90, "B",
                  ifelse(grades$grades1 >= .70 & grades$grades1 < .80, "C",
                  ifelse(grades$grades1 >= .60 & grades$grades1 < .70, "D", "F"))))
 
# letter grade distribution
table(grades$letter1) 
 
grades$letter2 <- ifelse(grades$grades2 >= .90, "A",
                  ifelse(grades$grades2 >= .80 & grades$grades2 < .90, "B",
                  ifelse(grades$grades2 >= .70 & grades$grades2 < .80, "C",
                  ifelse(grades$grades2 >= .60 & grades$grades2 < .70, "D", "F"))))
 
# letter grade distribution
table(grades$letter2) 
 
# bonus: add .08 to grades1 so that it equals grades2
 
grades$ec <- grades1 + .08
 
# why does this misclassify an .80 as a C in this case, but not in
# any of the previous grades1 and grades2 instances?
 
grades$letter.ec <- ifelse(grades$ec >= .90, "A",
                    ifelse(grades$ec >= .80 & grades$ec < .90, "B",
                    ifelse(grades$ec >= .70 & grades$ec < .80, "C",
                    ifelse(grades$ec >= .60 & grades$ec < .70, "D", "F"))))
 
# letter grade distribution
table(grades$letter.ec) 
 
summary(grades)
 
 
# if you print grades, you can see that it misclassifies an .80 as a C
# for obs 8 of the calculated ec grade, but correctly classifies
# an .80 as a B in the grades1 and grades2 variables
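For what it's worth, a likely explanation (assuming R's usual IEEE 754 double-precision numbers) is floating-point representation. Neither 0.72 nor 0.08 can be stored exactly in binary, and their computed sum lands just below the stored value of 0.80. The `>= .80` test is therefore FALSE for observation 8, and the nested `ifelse()` falls through to "C". The values typed directly into grades2 don't pick up that error, which is why they classify correctly. The sketch below demonstrates this and then shows one possible workaround: a hypothetical helper, `letter_grade()` (not part of the original program), that rounds scores to 6 decimal places before binning them with base R's `cut()`. The 6 digits are an arbitrary assumption, chosen to be far finer than any real grade.

```r
# Observation 8: 0.72 plus 0.08 extra credit, which "should" be 0.80.
x <- 0.72 + 0.08

print(x)                    # displays 0.8 at R's default 7 significant digits
print(x, digits = 17)       # reveals the stored value is slightly below 0.8
x >= 0.80                   # FALSE, so the nested ifelse() returns "C"
isTRUE(all.equal(x, 0.80))  # TRUE: equal within all.equal()'s default tolerance

# Hypothetical helper (an assumption, not the original code): round away
# floating-point noise, then bin with the same cutoffs used in the post.
# right = FALSE makes each interval closed on the left, e.g. [0.80, 0.90),
# matching the >= / < comparisons in the nested ifelse() calls.
letter_grade <- function(score, digits = 6) {
  cut(round(score, digits),
      breaks = c(-Inf, 0.60, 0.70, 0.80, 0.90, Inf),
      labels = c("F", "D", "C", "B", "A"),
      right  = FALSE)
}

letter_grade(x)             # B
```

Comparing with `all.equal()` is handy for checking a single value, while rounding first keeps the whole vectorized classification in one step. With the post's data, `table(letter_grade(grades$ec))` should give the same counts as `table(grades$letter2)`, although the factor lists its levels in F-to-A order rather than alphabetically.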

Saturday, February 16, 2013

Applied Analytics and Data Science Meet Precision Ag

Machine learning + statistics + agronomics + economics = a new line of business in applied data science. Monsanto is featured here, but the market is wide open. The most valuable data is being generated in real time with every trip across the field. It's up to you to harness it and get the most value out of it. Privacy issues aside, data isn't worth anything if you keep it all to yourself. Don't just give it away, but don't stuff your data under a mattress either. Welcome to the world of big data in big Ag.

"After they sign up, customers start by selecting their fields from Google Earth maps. Back-end programming then pulls up a wealth of information – everything from soil type to yield potential. As farmers enter in additional information about their farm, such as crop rotation, traits used, etc., the ACRES algorithm spits out recommendations, which users can accept or tweak as needed."

From: Farm Journal: Unlock Your Farm Data

http://www.agweb.com/article/unlock_your_farm_data/

Sounds like new opportunities for data scientists in the agriculture field.

Friday, February 8, 2013

That Modeling Feeling - a different take

I found the following post, entitled "SAS loves stats," and really like this quote:

What advice would you give to students studying statistics today?

"Think about your favorite things in life – and what interests you the most.  You will likely find a role for statistics within those choices.  Once you start playing with the data related to your interests, you’ll learn quickly.  Data is everywhere.  Just be sure to look beyond the obvious."
 
And of course, a corollary to this is that if you love stats, then you might be able to find something interesting about almost anything.