Saturday, May 31, 2014

Big Data: Causality and Local Expertise Are Key in Agronomic Applications

In a previous post, "Big Data: Don't throw the baby out with the bathwater," I made the case that in many instances we aren't concerned with issues related to causality.

"If a 'big data' ap tells me that someone is spending 14 hours each week on the treadmill, that might be a useful predictor for their health status. If all I care about is identifying people based on health status I think hrs of physical activity would provide useful info.  I might care less if the relationship is causal as long as it is stable....correlations or 'flags' from big data might not 'identify' causal effects, but they are useful for prediction and might point us in directions where we can more rigorously investigate causal relationships"

But sometimes we are interested in causal effects. If that is the case, the article that I reference in the previous post makes a salient point:

"But a theory-free analysis of mere correlations is inevitably fragile. If you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down."

"“Big data” has arrived, but big insights have not. The challenge now is to solve new problems and gain new answers – without making the same old statistical mistakes on a grander scale than ever."

I think that may be the case in many agronomic applications of big data. I've written previously about the convergence of big data, genomics, and agriculture. In those cases, when I think about applications like ACRES or Field Scripts, I have algorithmic approaches (finding patterns and correlations) in mind, not necessarily causation.

But in a Corn and Soybean Digest article, "Data Decisions: Meaningful data analysis involves agronomic common sense, local expertise," Dan Frieberg points out some very important things to think about when using agronomic data.

He gives an example where data indicates better yields are associated with faster planting speeds, but something else is really going on:

"Sometimes, a data layer is actually a “surrogate” for another layer that you may not have captured. Planting speed was a surrogate for the condition of the planting bed.  High soil pH as a surrogate for cyst nematode. Correlation to slope could be a surrogate for an eroded area within a soil type or the best part of the field because excess water escaped in a wet year."

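The surrogate problem in the planting speed example can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the variable names, units, and effect sizes below are hypothetical and are not taken from Frieberg's data. Seedbed condition drives both planting speed and yield, and speed is given no direct effect on yield at all. The raw correlation between speed and yield still comes out strongly positive, and it largely disappears once seedbed condition is accounted for with a partial correlation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_fields(n=5000, seed=42):
    """Simulate hypothetical fields where seedbed condition drives everything.

    All coefficients are invented for illustration only.
    """
    rng = random.Random(seed)
    speed, yields, seedbed = [], [], []
    for _ in range(n):
        bed = rng.gauss(0, 1)  # seedbed condition (standardized score)
        # Operators plant faster when the seedbed is in good condition.
        mph = 5.0 + 0.8 * bed + rng.gauss(0, 0.5)
        # Yield depends on the seedbed only; speed has NO direct effect here.
        bu = 180.0 + 10.0 * bed + rng.gauss(0, 5)
        seedbed.append(bed)
        speed.append(mph)
        yields.append(bu)
    return speed, yields, seedbed


speed, yields, seedbed = simulate_fields()
raw_r = pearson(speed, yields)
adjusted_r = partial_corr(speed, yields, seedbed)

print(f"raw correlation, speed vs. yield:       {raw_r:.2f}")
print(f"partial correlation, given seedbed:     {adjusted_r:.2f}")
```

A data mining algorithm that only sees speed and yield would flag planting speed as a strong predictor. The adjustment is only possible here because seedbed condition was "captured" in the simulation. In a real field it often is not, which is where someone with local context has to recognize the surrogate.
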
He concludes:

"big data analytics is not the crystal ball that removes local context. Rather, the power of big data analytics is handing the crystal ball to advisors that have local context"

This is definitely a case where we might want to look more rigorously at relationships identified by data mining algorithms that may not capture this kind of local context. It may or may not apply to the seed selection algorithms coming to market these days, but as we think about all the data that can potentially be captured through the internet of things (seed choice, planting speed, depth, temperature, moisture, etc.), this could become especially important. This might call for a much more personal service, including data-savvy reps who help agronomists and growers get the most from these big data apps and from the data that new devices and software tools can collect and aggregate.

Data-savvy agronomists will need to understand the assumptions behind any predictions or analysis, and the nature of the data these devices and apps capture, to know whether surrogate factors like the ones Dan mentions have been appropriately considered. And agronomists, data savvy or not, will be key in identifying these kinds of issues. Is there an app for that? I don't think there is an automated replacement for this kind of expertise, but as economist Tyler Cowen says, the ability to interface well with technology and use it to augment human expertise and judgement is the key to success in the new digital age of big data and automation.


Big Data…Big Deal? Maybe, if Used with Caution.

See also: Analytics vs. Causal Inference
