Big Data is clearly a disruptive technology, but using it successfully is as much art as it is science. The key is integrating Big Data with traditional BI to create a data ecosystem that allows you to generate new insights while executing on what you already know.
Large enterprises, especially financial services firms, are adopting Big Data at a more rapid pace than expected, according to a recently released study by Big Data consulting firm NewVantage Partners. But how are organizations using Big Data to make better business decisions? Success, it turns out, is as much an art form as it is a technology solution.
Last summer, NewVantage Partners held discussions with C-level executives (chief data officers, chief information officers, chief technology officers, chief analytics officers, chief information architects) as well as line-of-business heads and senior function heads at more than 50 large companies, most of them with more than 30,000 employees. Executives from financial services firms represented about 50 percent of the respondents, but NewVantage Partners also interviewed executives from insurance, government and other businesses.
“Most of the companies spend over $1 billion a year in technology,” says Paul Barth, founder and managing partner of NewVantage Partners. “They aren’t usually quick to respond, but when they do they have a lot more resources to put behind their initiative.”
Use Big Data to Make Better Decisions
And these large enterprises see value in Big Data and are bringing their resources to bear. Barth says 85 percent of respondents reported they already have Big Data initiatives under way.
“Over 75 percent were investing over $1 million a year already,” Barth says. “Twenty-five percent were investing over $10 million a year. There’s a real commitment to using this technology for one program or a series of programs. They made it absolutely clear they could not do the programs they were doing without Big Data.”
Real Value of Big Data Is Accelerating Time-to-Answer
Respondents gave a number of reasons for their investments in Big Data, from reducing risk to creating higher-quality products and services. But two reasons were clear leaders: achieving better, fact-based decision-making and improving the customer experience. Of course, these are leading reasons for investments in traditional business intelligence (BI) analytics, too. Barth says that when NewVantage dug deeper, it found that the real “quantum leap” Big Data offers companies is accelerating the speed at which they can get to a decision, or time-to-answer (TTA).
“If your time-to-answer is 30 minutes in one case and 30 seconds in another, it really changes your business processes,” Barth says. “It makes you much more effective as a business analyst.”
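To put rough numbers on Barth's comparison, here is a back-of-the-envelope sketch. The eight-hour workday and the idea that an analyst asks questions one after another are assumptions made purely for illustration; they do not come from the NewVantage study.

```python
# Assumption for illustration only: one analyst, one eight-hour day,
# asking questions strictly one after another.
WORKDAY_SECONDS = 8 * 60 * 60


def questions_per_day(time_to_answer_seconds):
    """Return how many full question/answer cycles fit in the workday."""
    return WORKDAY_SECONDS // time_to_answer_seconds


slow = questions_per_day(30 * 60)  # 30-minute time-to-answer
fast = questions_per_day(30)       # 30-second time-to-answer

print(slow, fast)  # 16 960
```

Under those assumptions, the same analyst goes from 16 iterations a day to 960, which is the kind of gap that turns a batch exercise into an interactive one.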
By using new Big Data technologies, organizations can answer questions in seconds rather than days and in days rather than months, Barth says. This acceleration, in turn, allows businesses to answer questions that have resisted analysis, develop test-and-learn processes that quickly adapt to the market and automate complex workflows.
However, reaping the benefits of accelerated TTA requires following a careful process based upon a clearly defined and governed relationship between Big Data and traditional analytics solutions.
Big Data, Analytics and Organizational Alignment
“Historically, there has been much talk about the difference between traditional analytics and Big Data, and organizational responsibility for each within an enterprise,” the report says. “The survey, however, shows the two are becoming closely intertwined and must work together to deliver the promised results of Big Data. Further, breaking down organizational boundaries and creating close integration between IT organizations and the business units is a critical step for any organization hoping to build a winning strategy for Big Data.”
“Data management and analytics have often resided in different parts of the organization,” the report adds. “IT departments usually controlled the data and analytics was conducted in either a special group or within a business unit. This is contrary to the entire principle of Big Data and the survey confirms that organizations understand close integration is necessary. Sixty-five percent say, ‘Big Data is an integral part of Data Management,’ and 68 percent further felt that ‘Big Data is part of the Advanced Analytics toolbox.’”
Making this leap—integrating traditional analytics and Big Data while tearing down boundaries between IT and business units—is a critical early step in creating organizational initiatives that leverage Big Data to affect the business, NewVantage concludes.
“Integrating real-time, full analytic capabilities into the business and operating units will enable the type of quick reactions to key business questions and challenges that can build competitive advantage and improve performance,” the report says.
“Think about your data and data quality as having different stages that we call bronze, silver and gold,” Barth adds. “Data in your data warehouse is gold. When you go to that gold source, you know you’re getting data that has been really worked through. But what if data is also available in a raw form and you can get it to me in a week or a month, if you can dump all the data in one place and organize it just a little bit? The data is useful before it’s perfect.”
Unlike traditional relational databases, Big Data platforms allow analysts to organize, clean and integrate data selectively, ignoring records and fields that are not the current focus of analysis. This is a significant departure from data warehouses, where a great deal of effort is spent on data engineering to make sure it’s production-worthy before it’s released to users. NewVantage notes that by deferring full data engineering, Big Data platforms accelerate TTA during discovery-oriented analysis and eliminate the engineering effort on data that doesn’t deliver value.
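As a rough illustration of what cleaning data selectively can look like, here is a minimal Python sketch. The sample records, the field names and the `selective_clean` helper are all hypothetical, invented for this example; the sketch does not represent any particular Big Data platform or NewVantage's own method.

```python
import csv
import io

# Hypothetical raw export: messy and only partially filled in.
RAW = """customer_id,channel,amount,notes
101,web, 25.50 ,ok
102,branch,,missing amount
,web,10.00,no customer id
103,WEB,7.25,
"""


def selective_clean(raw_text, fields, converters):
    """Yield only the requested fields, skipping records that lack them.

    Fields outside `fields` are never inspected, so problems in them
    (such as a missing customer_id) do not block the analysis.
    """
    for row in csv.DictReader(io.StringIO(raw_text)):
        cleaned = {}
        for name in fields:
            value = (row.get(name) or "").strip()
            if not value:
                break  # record is unusable for this question; skip it
            try:
                cleaned[name] = converters.get(name, str)(value)
            except ValueError:
                break  # value could not be converted; skip the record
        else:
            yield cleaned


# The question at hand only needs channel and amount.
rows = list(
    selective_clean(
        RAW,
        fields=["channel", "amount"],
        converters={"channel": str.lower, "amount": float},
    )
)

totals = {}
for r in rows:
    totals[r["channel"]] = totals.get(r["channel"], 0.0) + r["amount"]

print(totals)  # {'web': 42.75}
```

The record with no customer ID still contributes to the total because that field is not part of the current question, while the record with no amount is simply ignored rather than repaired.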
The idea then, Barth says, is that Big Data platforms become one piece—albeit an important one—of a data ecosystem that is designed to constantly look for new insights into customers, markets, products and risks, while at the same time building upon what is already known. In other words, pursue the “new” while operating on the “known,” a healthy, continuous improvement model.
Creating a Big Data Ecosystem
Think of it this way: Whether it comes from Big Data or traditional analytics, the important thing is to provide valuable answers. The value of an answer, Barth says, is based on its accuracy and the speed with which it can be delivered. To get an accurate, speedy answer, it’s important to ask the right questions. And that’s where Big Data comes in: It’s about pursuing the “new.”
“The art part of Big Data is associated with discovery and explanation,” Barth says. “You’re looking for something you can’t quite articulate. There’s a phase of analytics that’s exploration and discovery, in which you’re generating hypotheses. Then comes modeling and application. In my view, traditional BI tends to be really far down the road after you understand the underlying analytics and correlations that relate to an issue you’re seeing. The phase in which you don’t understand it, the discovery phase, is where Big Data is useful.”
There are seven distinct steps to answering a complex business question, NewVantage says:
- Clarify the question and the type of answer needed; build the business case
- Identify the data required and analysis approach
- Source the data
- Cleanse, normalize and integrate the data
- Analyze the data
- Validate the results
- Present or apply the answer; measure the results
Traditionally, companies have spent 80 percent or more of their time on the third and fourth steps, NewVantage says. But Big Data solutions offer up new ways of approaching these steps.
First and foremost, because of the relatively low cost and high capacity of Big Data platforms, organizations can load all of the data from their source systems rather than choosing particular data for the question at hand.
“While this may seem wasteful, it eliminates two important delays: writing programs to select just the data needed, and going back to the source systems multiple times as new insights generate new questions that need new data,” NewVantage says. “Building traditional data marts and data warehouses is extraordinarily complex and costly. The broad range of open source offerings coupled with flexible, scalable grid systems create an environment that not only drives down costs, but also offers the potential of decreasing query times exponentially.”
For instance, Barth points to a large financial services firm that wanted to perform multi-channel pathing analysis of its customers to understand which elements led to a sale and which led to attrition. To do so, the firm needed to integrate six months of session data with other channel data. The first attempt, using traditional relational databases, took tens of thousands of lines of SQL code, and the firm soon realized that it could only afford to access six days’ worth of data rather than six months. The firm abandoned the attempt after calculating that the effort would take weeks.
“In a Big Data environment, they were able to write and execute it in under 100 lines of code,” Barth says. “They executed it in less than 24 hours, processing hundreds of terabytes of session data. The analysis was on data they already had; they had it inside the bank. They just wanted to know what their own customers were doing on their own channels. This really unlocked that visibility into the way their bank was running.”
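The article does not show the firm's code, but the general shape of a pathing analysis can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the event tuples, the channel names and the `paths_by_outcome` helper are hypothetical, and a real job over hundreds of terabytes would run on a distributed platform rather than in memory.

```python
from collections import Counter, defaultdict

# Hypothetical events: (customer_id, timestamp, channel, event)
EVENTS = [
    ("c1", 1, "web", "browse"),
    ("c1", 2, "call_center", "inquiry"),
    ("c1", 3, "branch", "sale"),
    ("c2", 1, "web", "browse"),
    ("c2", 2, "web", "attrition"),
    ("c3", 2, "call_center", "inquiry"),
    ("c3", 1, "web", "browse"),
    ("c3", 3, "branch", "sale"),
]

OUTCOMES = {"sale", "attrition"}


def paths_by_outcome(events):
    """Count the channel paths that end in each outcome."""
    # Group every touch by customer.
    by_customer = defaultdict(list)
    for customer, ts, channel, event in events:
        by_customer[customer].append((ts, channel, event))

    counts = defaultdict(Counter)
    for touches in by_customer.values():
        touches.sort()  # order each customer's touches by timestamp
        path = []
        for _, channel, event in touches:
            if event in OUTCOMES:
                # Record the full path, including the closing channel.
                counts[event][" > ".join(path + [channel])] += 1
                break
            path.append(channel)
    return counts


counts = paths_by_outcome(EVENTS)
for outcome, paths in sorted(counts.items()):
    for path, n in paths.most_common():
        print(outcome, path, n)
```

With the sample data, two customers reach a sale through the path web, call center, branch, and one customer leaves after two web touches.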
But the key, Barth says, is to take it one more step. Once you understand what you’re seeing, you can develop a model that explains it and metrics to measure your execution in improving your business against that model. That’s where traditional BI comes in.
“The ‘new’ and the ‘known’ are not islands; they must be symbiotic systems connected to and feeding each other,” NewVantage says. “‘New’ analyses require rapid access to all the ‘known’ data representing the reality of today’s business. Conversely, there must be a disciplined approach to promoting new insights, data and models to evolve the ‘known.’ Without this linkage, the systems diverge into incoherence that does not reconcile or scale.”
“Emerging technologies and methodologies—including Hadoop, Cloudera, database appliances, accelerators and self-learning and genetic algorithms—can dramatically reduce TTA,” NewVantage adds. “The key to streamlining the time from a corporate question to game-changing business insight is to right-size the approach to analysis: rapid iterations during discovery and rigorous engineering into production. Strong governance and oversight of known data capabilities must coexist with agile data analysis that paves the way for new data discoveries. Enterprises with the capability to create an ebb and flow between dynamic discovery and scalable execution position themselves for sustainable success and dominance over their competitors.”
Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers for CIO.com.