How To Predict Sales Using Social Media Data


Social data analytics is both informing and transforming many existing practices in politics, marketing, investing, product development, entertainment, and news media. Since social media can be construed as a form of collective wisdom, it presents an interesting opportunity to harness data for specific predictions of real-world outcomes. The underlying assumption is that social media actions such as tweeting, liking, and commenting are proxies for consumers’ attention to a particular product.

Predicting sales and brand equity... really?

Can we predict sales performance using social media data? And by doing so, can we avoid running costly and tedious brand trackers? These were the questions that resonated through marketing departments during the early days of the social media listening era (2009-2010). Initial results were quite promising, such as the use of social media channels to predict Hollywood movie revenues, Apple iPhone sales, seasonal moods, and epidemic outbreaks (see table below, courtesy of Vatrapu, La cour). However, the peak of inflated expectations eventually passed and disillusionment quickly took over. The widespread belief was that social media data was simply too noisy and too biased to correlate accurately with sales data.


Yes.. brand equity and sales can be predicted with stunning accuracy

I recently stumbled across this white paper published by Kantar TNS and was surprised to discover that not only did Kantar TNS manage to create a measure of brand health by using Twitter data (“Brand Health Index” - BHI), but they also created a model to predict on-premise and off-premise volume.

The most interesting part is in the results section. Kantar TNS claims that the model can:

  • explain 85% of the variance in brand equity revealed in brand equity surveys.

  • predict changes in brand equity up to 12 weeks before they appear in survey results.

  • achieve a Mean Absolute Percentage Error (MAPE) of 3-4% when predicting on-premise and off-premise sales volume in specific categories like beer.
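To make these two metrics concrete, here is a minimal sketch of how the share of variance explained (R²) and MAPE are typically computed. The function names and the sample volumes are made up for illustration; they are not taken from the Kantar paper.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (the ratio would be undefined).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical weekly sales volumes, purely illustrative.
actual_volume = [100.0, 200.0, 400.0, 500.0]
predicted_volume = [96.0, 208.0, 388.0, 515.0]

print(f"MAPE: {mape(actual_volume, predicted_volume):.1f}%")
print(f"R^2:  {r_squared(actual_volume, predicted_volume):.3f}")
```

A MAPE of 3-4% means the weekly forecasts miss actual volume by about 3-4% on average, regardless of the direction of the error.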

Not only is this approach simpler, quicker, and more cost-effective, it also provides insights by looking at how brand activity on Twitter directly influences equity and sales.

Prediction results are quite stunning (see below), both in their level of granularity and in their consistency.

Brand on-premise volume graph

Not every tweet is a good tweet..

The key point of Kantar’s approach is that counting the volume of tweets is not enough to build a reliable predictive model. Instead, only certain categories of tweets should be taken into account. Tweets on themes such as "heritage", "preference", "loyalty", "promotion" or "price" have predictive power, whereas generic tweets would only degrade the model.

This intuition is also confirmed in a case study published by the University of Eindhoven, which states: “...while for books and movies, simply counting the number of Tweets provides enough information to predict sales, this is not true for all products. Instead, for products that generate less Twitter activity, only certain classes of Tweets relate to company sales.”
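As a rough illustration of counting only theme-relevant tweets, the sketch below tags tweets with a simple keyword lookup. This is a deliberately naive stand-in: the keyword lists, function names, and sample tweets are all hypothetical, and the Kantar workflow described further down relies on human coding combined with machine learning, not on keyword matching.

```python
import re
from collections import Counter

# Hypothetical keyword lists per theme; a real project would need a far
# richer taxonomy than this.
THEME_KEYWORDS = {
    "heritage": {"tradition", "original", "since"},
    "preference": {"favorite", "prefer", "best"},
    "loyalty": {"loyal", "always", "forever"},
    "promotion": {"deal", "coupon", "giveaway"},
    "price": {"price", "cheap", "expensive", "pricey"},
}


def tag_themes(text: str) -> set:
    """Return the set of themes whose keywords appear in the tweet."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return {theme for theme, words in THEME_KEYWORDS.items() if tokens & words}


def count_themes(tweets: list) -> Counter:
    """Count tweets per theme; tweets matching no theme are ignored."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tag_themes(tweet))
    return counts


# Made-up tweets, purely illustrative.
sample_tweets = [
    "My favorite lager, brewed the original way",
    "Great deal on a six-pack this weekend",
    "Stuck in traffic again",
    "Too expensive for what it is",
]

theme_counts = count_themes(sample_tweets)
print(theme_counts)
```

The generic third tweet matches no theme and so never reaches the model, which is the filtering idea described above.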

..but use a rich historical data-set

Another important aspect of the Kantar work is that it relies heavily on time-series analysis to make the prediction work. In particular, two years of Twitter data, aggregated weekly, are used to train the model.
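Weekly aggregation can be sketched as follows, along with the kind of lag alignment a leading indicator needs (pairing this week's tweet signal with an outcome several weeks later). The helper names and sample data are assumptions made for illustration; the white paper's actual pipeline is not reproduced here.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_counts(tweet_dates: list) -> list:
    """Aggregate tweet dates into (week_start, count) pairs.

    Weeks with no tweets are included with a count of 0, so the
    series has no gaps.
    """
    counts = Counter(week_start(d) for d in tweet_dates)
    if not counts:
        return []
    series = []
    current, last = min(counts), max(counts)
    while current <= last:
        series.append((current, counts.get(current, 0)))
        current += timedelta(weeks=1)
    return series


def lagged_pairs(signal: list, target: list, lag_weeks: int) -> list:
    """Pair signal[t] with target[t + lag_weeks]."""
    if lag_weeks < 0:
        raise ValueError("lag_weeks must be non-negative")
    return list(zip(signal, target[lag_weeks:]))


# Made-up tweet dates: three in the week of 1 Jan 2024, one two weeks later.
dates = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 7), date(2024, 1, 16)]
series = weekly_counts(dates)
for start, count in series:
    print(start.isoformat(), count)

# Pair each week's signal with the outcome two weeks later.
pairs = lagged_pairs([1, 2, 3, 4], [10, 20, 30, 40], lag_weeks=2)
print(pairs)
```

With two years of data this yields roughly 104 weekly points per brand, and a 12-week lead would correspond to `lag_weeks=12`.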

and remember..the devil, as usual, is in the details..

I think the most surprising thing is that Kantar TNS managed to build an "adaptable" workflow based on social media data that could be applied to multiple categories and brands.

This process (see picture below) combines human intervention with machine learning to detect nuances of human language such as sarcasm, idioms and Twitter slang. It enabled them to deliver a clean data set labeled with positive, neutral and negative sentiment and segmented by theme.

The BHI Modeling journey



I think there is great potential in these kinds of approaches, for two reasons. First, the value proposition of these models is clear: replace or complement brand trackers with a 100% passive, data-driven approach. Second, this work provides yet more evidence that the ability of Twitter to predict brand equity is no fluke: it is a natural outcome of data that appears to be more robust and more predictive at every level than traditional surveys.

Find out more

Read the full Kantar white paper by clicking here!