Big data: Bringing together BI and predictive analytics

For as long as anyone can remember, the world of predictive analytics has been the exclusive realm of ivory-tower statisticians and data scientists who sit far away from the everyday line-of-business decision maker. Big data is about to change that.

As more data streams come online and are integrated into existing BI, CRM, ERP and other mission-critical business systems, the ever-elusive (and oh so profitable) single view of the customer may finally come into focus. While most customer service and field sales representatives have yet to feel the impact, companies such as IBM and MicroStrategy are working to see that they do soon.

Big data moves analytics beyond pencil-pushers
Imagine a world in which a CSR sitting at her console can make an independent decision on whether a problem customer is worth keeping or upgrading. Imagine, too, that a field salesman can change a retailer's wine rack on the fly based on the preferences that partiers attending the jazz festival next weekend have contributed on Facebook and Twitter.

Big data is pushing a tool more commonly used for cohort and regression analysis into the hands of line-level managers, who can then use non-transactional data to make strategic, long-term business decisions about, for example, what to put on store shelves and when to put it there.

However, big data is not about to supplant traditional BI tools, says Rita Sallam, Gartner's BI analyst. If anything, big data will make BI more valuable and useful to the business. "We're always going to need to look at the past...and when you have big data, you are going to need to do that even more. BI doesn't go away. It gets enhanced by big data."

How else will you know if what you are seeing in the initial phases of discovery will indeed bear out over time? For example, do red purses really sell better than blue ones in the Midwest? An initial pass through the data may suggest so: more red purses sold last quarter than ever before; therefore, red purses sell better.

But this is a correlation, not a cause. If you look more closely, using historical transaction data gleaned from your BI tools, you may find, say, that it is actually your latest merchandise-positioning campaign that's paying dividends because the retailers are now putting red purses at eye level.
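
The red-purse check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field layout and the helper name `mean_units` are invented for the example and do not come from any vendor tool. It compares average sales by color alone, then by color within each shelf position.

```python
from collections import defaultdict

# Invented toy records (color, shelf position, units sold); not real data.
SALES = [
    ("red", "eye_level", 120),
    ("red", "eye_level", 110),
    ("red", "low_shelf", 40),
    ("blue", "eye_level", 115),
    ("blue", "low_shelf", 42),
    ("blue", "low_shelf", 38),
]

def mean_units(records, key):
    """Average units sold, grouped by whatever key(record) returns."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of units, record count]
    for record in records:
        bucket = totals[key(record)]
        bucket[0] += record[2]
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}

# First pass: color alone makes red look like the better seller.
by_color = mean_units(SALES, key=lambda r: r[0])

# Closer look: within each shelf position, the colors sell about the same.
by_color_and_shelf = mean_units(SALES, key=lambda r: (r[0], r[1]))
```

In this made-up data set, red outsells blue overall only because more red purses happen to sit at eye level; once shelf position is held constant, the gap disappears.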

That's why IBM's Director of Emerging Technologies, David Barnes, is actually more inclined to refer to the resulting output from big data technologies such as Hadoop, MapReduce and R as "insights." You wouldn't want to make mission-critical business decisions based on sentiment analysis of a Twitter stream, for example.

Reviewing unstructured data in social media reaps immediate rewards
There is value in social media, though. What if you learn, as the buyer for a retailer, that Justin Bieber fans really loved the jacket he was wearing at the concert last night and, oh, by the way, someone tweeted he got it from one of your stores? You could then make a snap decision to stock up on that jacket just in that city since you know it's about to become a very hot item, albeit for a very limited time.

Without a predictive analytics (PA) package looking for patterns in the Twittersphere that correlate your brand with geographic location and factors such as the number of mentions, you could miss out on a great but small window of opportunity to move merchandise.

"In the past, we would have based [our decisions] on historical data, and by the time we did it, that trend may have already passed us," says Barnes. "So that's PA on steroids, at warp speed."
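
As a rough sketch of the kind of pattern such a package looks for, the Python below counts brand mentions per city and flags cities where today's count is well above the usual level. Everything here is an assumption made for illustration: the `(city, text)` post format, the placeholder brand "Acme", the baseline figures and the thresholds are hypothetical, and a real package would do far more than match keywords.

```python
from collections import Counter

def mentions_by_city(posts, brand):
    """Count posts per city whose text mentions the brand (case-insensitive)."""
    brand = brand.lower()
    return Counter(city for city, text in posts if brand in text.lower())

def spiking_cities(today, baseline, ratio=3.0, min_mentions=3):
    """Cities with at least min_mentions today and at least ratio times their baseline."""
    return sorted(
        city
        for city, count in today.items()
        if count >= min_mentions and count >= ratio * baseline.get(city, 1.0)
    )

# Invented example posts as (city, text) pairs; "Acme" is a placeholder brand.
POSTS = [
    ("Chicago", "Loved that jacket, someone said it's from Acme"),
    ("Chicago", "Need the ACME jacket from last night"),
    ("Chicago", "acme store downtown has the jacket?"),
    ("Chicago", "Great concert"),
    ("Denver", "Acme has a sale on"),
]
BASELINE = {"Chicago": 1.0, "Denver": 1.0}  # hypothetical usual daily mentions

today = mentions_by_city(POSTS, "Acme")
hot = spiking_cities(today, BASELINE)
```

With this toy input, only Chicago is flagged, which is the cue a buyer would need to stock up in that one city while the window is open.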

This is made possible by a marriage of open source technologies (the source of most big data platforms these days), Moore's Law, commodity hardware, the cloud and the ability to capture and store huge volumes of non-transactional data that was once discarded because no one knew what to do with it.

Unstructured data such as video and email, often cited as a driving force behind big data, barely plays a part in this. Scour blog posts and user forums, though, then correlate that information with geographic data, couple it with flat files of your existing structured customer data and bring in streams from new sources such as the MicroStrategy Wisdom engine, which tracks what some 14 million Facebook users are saying about your brand, and now you've got a new and powerful tool.

R.K. Paleru, director of industry marketing for BI vendor MicroStrategy, says two things have happened with big data. "You're able to bring in more variety of data from different sources, but [you] can also take all that data and...micro-optimize. [For example,] how can you transform behavior using tools like the iPad or smartphones at the point where this tactical business decision has to be made?"

Shortening "Time to Answer" key to big data analytics
One big advantage of this type of analytics is the shortening of the "time to answer" (TTA), according to Paul Barth, founder and managing partner of New Vantage Partners, a boutique information management and analytics consulting firm. The queries, or models, that used to take data scientists months to build in order to answer forward-looking business questions about supply chain or production schedules can now be built, in some instances, in hours, and in bulk.

This happens because big data technologies allow information to be worked with before it is optimized or rationalized or relational-ized. This, coupled with advanced analytics, lets line-of-business managers ask and answer questions in very short cycles. (It's not plug-and-play quite yet, though, so IT workers and data modelers will have to lend a hand.)

"These folks are using Big Data to automate machine-learning, turn-the-crank processes," Barth says. Doing so can generate upwards of 20,000 data models for each product line, in each market around the world, letting users look up to 18 months forward. "That's a big change. The reason they can do that is because Big Data technology can automate a lot of the modeling steps and execute it in a lights-out fashion."

Not long ago, this would have been nearly impossible. It took statistical analysts weeks or even months to build a single model. If you sold 100 products, you really couldn't move beyond 1,000 models for your entire product line, which means the information these models returned wasn't nearly as accurate, or as timely, as the big data models available today.
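
To make the contrast concrete, here is a minimal Python sketch of "turn-the-crank" modeling: one simple trend model is fitted automatically for every product and market, then projected forward. It is an illustration only, not the method Barth's clients use. The straight-line model, the function names and the sales figures are all assumptions chosen to keep the example small, and each series needs at least two observations.

```python
def fit_trend(series):
    """Least-squares line through (0, y0), (1, y1), ...; returns (slope, intercept)."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_all(history):
    """Fit one trend model per (product, market) key with no manual steps."""
    return {key: fit_trend(series) for key, series in history.items()}

def forecast(model, n_observed, months_ahead):
    """Project the fitted line months_ahead months past the last observation."""
    slope, intercept = model
    return intercept + slope * (n_observed - 1 + months_ahead)

# Invented monthly unit sales keyed by (product, market).
HISTORY = {
    ("red purse", "Midwest"): [10, 12, 14, 16],
    ("blue purse", "Midwest"): [20, 20, 20, 20],
}

models = fit_all(HISTORY)
red_in_18_months = forecast(models[("red purse", "Midwest")], 4, 18)
```

The loop inside `fit_all` is the point: once fitting is a function call rather than weeks of an analyst's time, the number of models is limited by compute rather than by staff.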

"Big Data is as much about big analytics as it is about big data," Barth says. "This is what data scientists love. They can iterate and iterate and iterate while they are learning the data and getting some initial insights during discovery."

Gigabits of I/O and the ability to work with data in business analytics sandboxes outside of production environments power these thought exercises in what Barth calls a kind of "Agile analytics" approach to asking questions and solving problems.

Big data analytics not ready for prime time
While all of this is promising and exciting for business users (if they even know about it, which they don't), hooking big data analytics into a natural language processing engine and a Siri-like Q&A interface is some ways off. Hadoop, while powerful, is still by all accounts a "primitive" tool for tackling massive data sets.

Think very carefully about the usefulness of these insights, too. Are 100 million opinions really worth more than 100,000, Barth asks, or even a highly qualified and influential 1,000?

"There's a lot of repetition out there," Barth says, and "you still need really smart analysts" if you want your analyses done right. Fortunately, he adds, big data gives them "very powerful tools" to do so.

CIO.com


