Wednesday, November 4, 2015

'Big' Data vs. 'Clean' Data

I've previously written about the importance of data cleaning. Recently I was reading a post on FarmLink's blog, "Data Science Can Transform Agriculture, If We Get It Right," and I was impressed by the following:

"We believe it's a transformational time for this industry – call it Ag 3.0 – when the combination of human know-how and insight, coupled with robust data science and analytics will change the productivity, profitability and sustainability of agriculture."

This reminds me, as I have discussed before in relation to big data in agriculture, of economist Tyler Cowen's comments on an EconTalk podcast: "the ability to interface well with technology and use it to augment human expertise and judgement is the key to success in the new digital age of big data and automation."

But in relation to data cleaning, I thought this was really impressive:

"...we disqualified more than two-thirds of the data collected during our first year. Now, we inspect each combine before use following a 50 point check list to identify any problem that could affect accuracy of collection, have developed a world class Quality Assurance process to test the data, and created IP addressable access to our combines to be able to identify and compensate for operator error. As a result, last year over 95% of collected data met our standard for being actionable. Admittedly, our first year data was “big.” But we chose to view it as largely worthless to our customers, just as much of the data being collected through farmer exchanges, open APIs, or memory sticks for example will be. It simply lacks the rigor to justify use in such important undertakings."

It sometimes takes patience and discipline to make the necessary sacrifices and put the necessary resources into data quality, and it looks like this company gets it. Data cleaning isn't just academic. It's serious. Maybe it's time to replace #BigData with #CleanData.
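To make the "share of data that is actionable" idea concrete, here is a minimal sketch of how such a pass rate might be computed. Everything in it is an assumption for illustration: the field names, the sample records, and the thresholds are hypothetical and are not FarmLink's actual checks or data.

```python
# Hypothetical yield-monitor records. Field names, values, and thresholds
# are illustrative assumptions, not FarmLink's actual quality checks.
records = [
    {"yield_bu_ac": 182.0, "moisture_pct": 15.2, "speed_mph": 4.1},
    {"yield_bu_ac": -5.0, "moisture_pct": 14.8, "speed_mph": 4.0},   # negative yield
    {"yield_bu_ac": 175.5, "moisture_pct": None, "speed_mph": 3.8},  # missing moisture
    {"yield_bu_ac": 190.3, "moisture_pct": 16.0, "speed_mph": 0.0},  # combine stopped
]

REQUIRED_FIELDS = ("yield_bu_ac", "moisture_pct", "speed_mph")


def is_actionable(rec):
    """Return True if a record passes every (assumed) quality check."""
    # Reject records with any missing required field.
    if any(rec.get(field) is None for field in REQUIRED_FIELDS):
        return False
    # Reject physically implausible readings (assumed plausibility ranges).
    return (
        0 < rec["yield_bu_ac"] <= 400
        and 5 <= rec["moisture_pct"] <= 40
        and rec["speed_mph"] > 0.5
    )


clean = [rec for rec in records if is_actionable(rec)]
share = len(clean) / len(records)
print(f"{len(clean)} of {len(records)} records pass ({share:.0%})")
# prints: 1 of 4 records pass (25%)
```

The point of a sketch like this is less the specific rules than the habit: report how much data survives the checks, and treat what fails as unusable rather than as extra volume.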
 
Related:
Big Data
Data Cleaning
Got Data? Probably not like your econometrics textbook!
Big Ag Meets Big Data (Part 1 & Part 2)
Big Data- Causality and Local Expertise are Key in Agronomic Applications
Big Ag and Big Data-Marc Bellemare
Big Data, IoT, Ag Finance, and Causal Inference
In God we trust, all others show me your code.
Data Science, 10% inspiration, 90% perspiration
