Five Symptoms of Data Prep Deficiency
by Jon Pilkington
The role of “data scientist” was recently dubbed the best job in America by Glassdoor. That distinction was based in part on the number of job openings, salary and career opportunities associated with the position. While data scientists may have bragging rights for holding the hottest job around, the role isn’t as glamorous as it may sound. According to the 2016 Data Science Report released by CrowdFlower, 76 percent of data scientists view data preparation (prep) as the least enjoyable part of their work. That is a tough reality to endure, considering research reveals that data prep accounts for up to 80 percent [1] of their time. But it doesn’t have to.
Self-service data prep technology can tip the scales back in data scientists’ favor by helping them drastically reduce the time they spend on data prep and instead devote those precious hours to analysis that speeds decision making and delivers business value.
Below are five symptoms that indicate it may be time to consider a self-service data prep solution.
- You spend more time collecting and preparing data than you do analyzing it.
Anyone can easily connect to relational data, CSV and other standard, structured data. But the data that provides the most analytical value has traditionally been locked away in multi-structured and unstructured documents, such as text reports, web pages, PDFs, JSON and log files. Data scientists, data analysts and even everyday business users mistakenly believe that it’s impossible to use this information without manually rekeying the data. Complicating matters, much of this data is not in an analysis-ready format. Given the manual retrieval and prep processes performed by many analysts today, it’s not surprising that up to 80 percent of their time is devoted to these routine tasks.
Self-service data prep solutions enable data analysts and business users to prep less and analyze more by easily and rapidly acquiring, manipulating, blending and preparing data from virtually any source. Data can be prepared for analysis in a fraction of the time that it takes using spreadsheets and other manually intensive methods. That means more time can be spent on analysis that yields actionable business insights.
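To make the idea concrete, here is a minimal sketch of what “unlocking” a log file without rekeying can look like when done by hand in Python. It is an illustration only, not how any particular data prep product works: the log layout, the `LOG_PATTERN` expression and the helper names `parse_log_lines` and `rows_to_csv` are all assumptions made for this example.

```python
import csv
import io
import re

# Hypothetical log layout, assumed for illustration:
# "<YYYY-MM-DD HH:MM:SS> <LEVEL> <free-text message>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)"
)


def parse_log_lines(lines):
    """Turn log lines that match the pattern into dicts; skip the rest."""
    rows = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def rows_to_csv(rows, fieldnames):
    """Write the parsed rows as CSV text that a spreadsheet or BI tool can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample lines, including one that does not fit the layout.
sample = [
    "2016-05-01 09:15:00 INFO order 1001 shipped",
    "garbled line without a timestamp",
    "2016-05-01 09:16:30 ERROR payment failed for order 1002",
]

rows = parse_log_lines(sample)
csv_text = rows_to_csv(rows, ["timestamp", "level", "message"])
print(csv_text)
```

Even this small script has to be written, tested and maintained for every new file layout, which is the kind of routine effort that self-service tools aim to take off the analyst’s plate.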
- You’re making business decisions based on outdated or incomplete data.
Because many data analysts and business users believe information in multi-structured and unstructured repositories is completely inaccessible, or not accessible in an acceptable timeframe, they write off or simply forget about these data sources altogether. As a result, business decisions are being made on incomplete information. In fact, it’s estimated that only 12 percent [2] of enterprise data is used today to make decisions. That’s frightening. What’s even scarier? By the time analysts gather data from a variety of sources and file types via manual retrieval and prep processes, the information is outdated.
With self-service data prep, organizations gain access not only to the right data but to all of the data, which is crucial to getting a holistic view of the business. Users are able to make faster, more strategic business decisions based on accurate and up-to-date information.
- Errors from manually rekeying data are becoming a little too frequent.
We’ve talked a lot about the consequences of implementing manual data retrieval and prep processes, but let’s add one more to the list: a high frequency of error. Humans make mistakes. Regardless of the preventative measures put in place, manual rekeying of data will result in a margin of error. And business decisions made on incorrect data can have a negative impact on the bottom line.
Data prep tools eliminate the need for rekeying data. They also enable users to gather information from a wide variety of sources and to consolidate and append diverse data with ease, and they provide transparency as well as a clear audit trail.
- You have to rely on the IT department to access the data you need.
Much of the information that data analysts and business users require is housed in repositories governed by IT or within sources that require IT intervention. And because IT professionals are charged with data protection, they make information available on an as-needed basis. Consequently, business users are left with no choice but to rely on IT to get the data they need for analysis.
Many organizations are grappling with how to reconcile the data divide between business users and IT departments. The good news is that self-service data prep solutions bridge the gap between the ease-of-use and agility that business users demand and the automation, scalability and governance required by IT. Data prep solutions address governance risks by securely storing, managing and controlling access to source content, prepared data, reusable extraction and prep models, and created visualizations and dashboards – without impeding self-service analytics processes. Many also offer governance capabilities such as data retention, data masking, data lineage, role-based access and auditing functions to further ease IT’s apprehensions.
- The cost of a data prep solution is less than the combined salaries of the team currently managing the task.
A recent study from Blue Hill Research found that the typical data analyst spends, on average, two hours a day on data prep activities, which equates to roughly $22,000 per year. This figure does not include the cost of actual analysis. If an organization employs multiple data analysts, that cost quickly adds up.
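As a back-of-the-envelope illustration of how that cost adds up, the sketch below scales the per-analyst figure to a team. Only the two-hours-a-day and $22,000 figures come from the study cited above. The 250 working days per year is an assumption made here, so the implied hourly cost is only a rough inference, and the function names are hypothetical.

```python
# Figures quoted from the study cited above.
HOURS_PER_DAY_ON_PREP = 2
ANNUAL_PREP_COST_PER_ANALYST = 22_000  # US dollars

# Assumption for illustration only; the study's own inputs are not given here.
WORKING_DAYS_PER_YEAR = 250


def implied_hourly_cost(annual_cost, hours_per_day, working_days):
    """Hourly cost implied by the annual figure under the assumed working days."""
    return annual_cost / (hours_per_day * working_days)


def team_prep_cost(analysts, annual_cost_per_analyst=ANNUAL_PREP_COST_PER_ANALYST):
    """Annual data prep cost for a team, assuming every analyst matches the average."""
    return analysts * annual_cost_per_analyst


hourly = implied_hourly_cost(
    ANNUAL_PREP_COST_PER_ANALYST, HOURS_PER_DAY_ON_PREP, WORKING_DAYS_PER_YEAR
)
print(f"Implied hourly cost: ${hourly:,.2f}")
print(f"Team of 10 analysts: ${team_prep_cost(10):,} per year")
```

Under these assumptions, a team of 10 analysts would spend about $220,000 a year on data prep alone, before any analysis is done.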
While they require an upfront investment, data prep tools save money in the long run by greatly reducing the time spent on data prep activities and empowering ordinary business users to transform data into actionable intelligence. And that’s true power to the people.
Jon Pilkington is the chief product officer at Datawatch.
[1] Forrester Blog, 3 Ways Data Preparation Tools Help You Get Ahead Of Big Data, February 2015: http://blogs.forrester.com/michele_goetz/15-02-17-3_ways_data_preparation_tools_help_you_get_ahead_of_big_data
[2] The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014: http://info.mapr.com/rs/mapr/images/The_Forrester_Wave_Big_Data_Hadoop_Q12014.pdf