By Gagan Mehra
In the last few years, big data has generated a lot of buzz, and several successful big data products have launched. The big data ecosystem has now reached a tipping point where the basic infrastructure for supporting big data challenges and opportunities is easily available. We are now entering what I would call the next generation of big data, big data 2.0, where the focus is on three key areas: speed, data quality, and applications.
1. Speed
Data is growing at an exponential rate, and the ability to analyze it quickly is more important than ever. Almost every big data vendor is coming out with product offerings, such as in-memory processing, to process data faster. Hadoop's new release, Hadoop 2.0 with YARN, can support processing in near real time. Another big data technology gaining traction is Apache Spark, which its developers say can run some in-memory workloads up to 100 times faster than Hadoop MapReduce. Leading Silicon Valley venture capital firm Andreessen Horowitz led a $14 million investment in Databricks, a company built around Apache Spark.
Even analytics providers are realizing the importance of speed and have built products that can analyze terabytes of data within seconds. This aligns well with the growing presence of sensors and the Internet of Things in the consumer and industrial worlds. Sensors can generate millions of events per second, and analyzing them in real time is not trivial. One of our customers faced this challenge recently when their sensor data ballooned to 5 TB a day, and they quickly realized the importance of speed when handling such large data volumes.
Data storage costs have come down over the years, but storage is still expensive. Most businesses prefer to analyze streaming data in real time and filter out the noise rather than spend money storing the complete data stream.
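To make the filter-before-storing idea concrete, here is a minimal Python sketch. It assumes "noise" means readings that stay close to an expected baseline; the function names, the baseline and the tolerance are all hypothetical, and a real deployment would run inside a stream processing engine rather than over an in-memory list.

```python
from typing import Iterable, Iterator


def filter_noise(readings: Iterable[float],
                 baseline: float,
                 tolerance: float) -> Iterator[float]:
    """Yield only readings that deviate from the baseline by more than
    the tolerance. Everything else is treated as noise and dropped
    without ever being stored."""
    for value in readings:
        if abs(value - baseline) > tolerance:
            yield value


def simulated_sensor() -> Iterator[float]:
    """Stand-in for a live sensor feed (hypothetical values)."""
    yield from [20.0, 20.1, 19.9, 35.2, 20.0, 4.7, 20.05]


# Only the readings worth keeping reach storage.
kept = list(filter_noise(simulated_sensor(), baseline=20.0, tolerance=1.0))
print(kept)
```

Because the filter is a generator, each reading is examined once as it arrives and then discarded, so memory use does not grow with the length of the stream.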
2. Data Quality
Data quality has never been sexy, but it becomes more important as data grows at an exponential rate. The speed at which decisions are made has already reached a point where the human brain can’t keep up. This means that data is cleansed and processed, and decisions are made, based on defined rules and without any human intervention. In such environments, a single stream of bad data can act like a virus and result in incorrect decisions or heavy financial loss. A good example is algorithmic trading, where algorithms rather than humans analyze stock market trends and place trades every few milliseconds.
Data quality has become a key part of service level agreements (SLAs) in evolving digital enterprises. Poor-quality data can result in severe financial penalties or in the data provider or supplier being blacklisted. B2B environments are the early adopters, as they rely heavily on data quality to ensure smooth business operations. Some enterprises are moving toward real-time alerts for data quality issues. An alert can be routed to a designated person based on the type of issue and can also recommend how to fix it.
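A rule-based alert of this kind might look like the following Python sketch. The rule names, record fields, email addresses and recommendations are illustrative assumptions, not taken from any particular product.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    is_violated: Callable[[dict], bool]  # returns True when the record is bad
    owner: str                           # hypothetical person or team to notify
    recommendation: str                  # suggested fix sent with the alert


@dataclass
class Alert:
    rule: str
    owner: str
    recommendation: str
    record: dict


# Hypothetical rules for an incoming order feed.
RULES = [
    Rule("missing_price",
         lambda r: r.get("price") is None,
         "pricing-team@example.com",
         "Ask the supplier to resend the record with a price."),
    Rule("negative_quantity",
         lambda r: isinstance(r.get("quantity"), (int, float)) and r["quantity"] < 0,
         "inventory-team@example.com",
         "Check the sign convention the supplier uses for returns."),
]


def check_record(record: dict, rules: List[Rule] = RULES) -> List[Alert]:
    """Return one alert per violated rule, addressed to that rule's owner."""
    return [Alert(rule.name, rule.owner, rule.recommendation, record)
            for rule in rules if rule.is_violated(record)]


alerts = check_record({"sku": "A-1", "price": None, "quantity": -3})
for alert in alerts:
    print(f"{alert.owner}: {alert.rule} -> {alert.recommendation}")
```

Keeping the owner and the recommendation on the rule itself means that adding a new check also defines who hears about it and what they are told to do.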
Machine learning is another technique that is being used to improve data quality. It has made it easier to conduct pattern analysis to identify new data quality issues. Machine learning systems can be deployed in a closed loop environment where the data quality rules are refined as new quality issues are identified via pattern analysis and other techniques.
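As a rough illustration of such a closed loop, the Python sketch below refines a data quality rule from the data it accepts. It deliberately uses a simple running mean and standard deviation as a stand-in for a real machine learning model; the class name, threshold and warm-up length are assumptions made for the example.

```python
import math


class AdaptiveRangeRule:
    """Flags values far from what has been seen so far, and refines its
    notion of 'normal' from every value it accepts (the closed loop)."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5):
        self.threshold = threshold  # allowed distance, in standard deviations
        self.warmup = warmup        # values to accept before judging anything
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0              # running sum of squared deviations

    def _std(self) -> float:
        return math.sqrt(self._m2 / (self.count - 1)) if self.count > 1 else 0.0

    def is_suspect(self, value: float) -> bool:
        if self.count < self.warmup:
            return False
        std = self._std()
        if std == 0.0:
            return value != self.mean
        return abs(value - self.mean) / std > self.threshold

    def learn(self, value: float) -> None:
        """Welford's online update of the mean and variance."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def process(self, value: float) -> bool:
        """Return True if the value is suspect; otherwise learn from it."""
        suspect = self.is_suspect(value)
        if not suspect:
            self.learn(value)
        return suspect


rule = AdaptiveRangeRule()
stream = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 55.0, 10.3]
flags = [rule.process(v) for v in stream]
print(flags)
```

Suspect values are kept out of the statistics here so that one bad reading does not widen the accepted range. A production system would more likely send them to a human or to a separate model for review before deciding whether the rule itself should change.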
3. Applications
Big data has created so much excitement that everyone wants to use it, but technical challenges prevent greater adoption. Applications help overcome these challenges by making it easy for everyone to benefit from big data. Over the next few years, we will see thousands of specialized applications launched for various industry verticals to solve big data challenges (Editor’s note: For example, VentureBeat is hosting its next DataBeat event in May to focus exclusively on big data applications that help companies achieve better financial growth). We have already seen some big data applications become hugely successful, such as those from eHarmony, Roambi, and Climate Corporation. In the near future, even a small business will be able to benefit from analyzing big data without requiring special infrastructure or hiring data scientists.
These applications will correlate customer data from multiple channels to build a better understanding of customers, and they will make more money for businesses by targeting the right customers with the right products. And, hopefully, some of these applications will make our lives better through personalized services for health care, diet and food, entertainment, and more.
You might have seen other trends that indicate that we are entering the next generation of big data, and we would love to hear about them. Please comment if you would like to share.
Article originally posted on VentureBeat
Gagan Mehra is Software AG’s Chief Evangelist. He has over 15 years of experience creating and implementing digital strategies for leading technology companies on six continents. His areas of expertise include big data, e-commerce, business and IT transformation, and enterprise architecture. Prior to joining Terracotta (which Software AG acquired in 2011), Gagan was a leader within Deloitte Consulting’s Digital practice, where he led web-related projects for clients like Adobe, Agilent, Expedia, HP, McAfee, Seagate and Walmart, among others. Gagan also worked for software services providers Zensar and Network Programs in India for five years before moving to the US with IT consultancy Techspan.