The data-mining industry has long been driven as much by hype as by the potential for return on investment. It pays to remember this whenever you come across a reference to the retail chain that mined its transactional data for profitable nuggets of actionable information and reportedly hit the mother lode by finding a golden relationship between beer and diapers.
According to this infamous industry legend, an early retail adopter of data mining found that fathers making emergency late-night runs to pick up baby bum wraps also frequently felt the need to purchase a six-pack or two. This discovery led the merchant in question (whose identity varies in the telling of this tale) to try to stimulate impulse buying by placing these items side by side in the store. Sales reportedly soared.
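For readers curious about what such a "relationship" looks like in practice, analysts typically express it with three measures from market-basket analysis: support, confidence and lift. The following is only a minimal sketch under stated assumptions. The baskets are invented for illustration, the helper names (`support`, `confidence`, `lift`) are hypothetical, and nothing here reflects what any retailer in the legend actually ran.

```python
# Hypothetical shopping baskets, invented purely for illustration.
# They are NOT real retail data and do not come from the legend.
baskets = [
    {"diapers", "beer", "wipes"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "chips"},
    {"bread", "chips"},
    {"milk", "wipes"},
]


def support(items, baskets):
    """Share of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence relative to how common `consequent` is overall.

    A value above 1 means the items appear together more often than
    they would if purchases were independent.
    """
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


print("support(diapers, beer):", support({"diapers", "beer"}, baskets))
print("confidence(diapers -> beer):", confidence({"diapers"}, {"beer"}, baskets))
print("lift(diapers -> beer):", lift({"diapers"}, {"beer"}, baskets))
```

With these toy baskets, diapers and beer appear together in three of eight baskets, giving a confidence of 0.75 and a lift of 1.5. A lift above 1 only indicates that two items co-occur more often than chance would suggest; it does not show that shelving them side by side will raise sales.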
Unfortunately, as recently pointed out in Ivey’s Impact, the diaper-beer story, which helped convince many organizations to jump on the data warehousing bandwagon in the 1990s, was misleading to say the very least. As Forbes magazine noted in 1998 (in a story featuring quotes from me as an investigative reporter and editor with Computing Canada), more than a few companies “lost their shirts because of beer and diapers.”
Indeed, many early multi-million-dollar data warehousing projects were abandoned due to problems ranging from the significant amounts of dirty data contained in IT systems to the length of time and expense required to move all transactional data into one system that could be mined. As Forbes reported, data warehouses “are expensive and typically take two years to build. That’s enough time for project champions to move or be dethroned, or for the business environment to change. The warehouses also cut across turfs and functional areas. The fighting can be vicious.”
Back then, the estimated failure rate for early adopters of data-mining projects was 70 per cent. Since the dot-bomb era, of course, the potential payoff of data mining has soared, along with the growth of digital data. And today, thanks to the massive amounts of user-generated content (UGC) online, you don’t need to build a data warehouse to get in on the data-mining action.
As a result, mining social media sites for actionable intelligence should be on the radar of consumer product businesses, says Xin (Shane) Wang, an Assistant Professor of Marketing at the Ivey Business School. “Social media can tell companies how to improve their product offerings,” he told Impact. “All businesses need to do is figure out how to listen effectively. And that is what my research aims to help them do.”
As an academic researcher, Wang, who worked as a statistician before pursuing a PhD in marketing, is focused on helping companies successfully mine UGC, which is challenging because the content is unstructured and available online in very large volumes. His research has helped develop an innovative method of mining online product reviews, which has been deployed to analyze tablet computer reviews from Amazon.com. As Impact noted, the new mining method provides high validity compared with existing market structure analysis methods. The resulting data set, built through data collection, cleaning, filtering and integration processes, contains information on product characteristics and market dynamics gleaned from more than 40,000 product reviews spanning a period of 24 weeks, and it is available online to benefit the practice of marketing.
While the data set developed relates to tablets, the new data collection method itself can be used to obtain information about different product categories. “Unlike transaction records collected from legacy systems, consumer-generated product reviews contain rich insights into behavioural information and product attributes that matter most to consumers,” Wang said. “In other words, although researchers must take steps to avoid bogus reviews, firms can now actually learn what customers consider the strengths and weaknesses of their products and competitor products — right from the horse’s mouth — without conducting costly surveys or focus groups.”