Identifying investment signals via social media analytics


US-headquartered quantitative investment management firm

Business objective

The client wanted to upgrade its in-house quantitative models to incorporate investment signals originating from multiple social media sources

Benefits and outcomes of our engagement

  • Established a significant degree of correlation between the average sentiment score over the past five days and the average closing price over the following five days
  • Expanded the approach to cover other sectors
  • The client integrated our approach into its in-house quant models to enhance its ability to deliver alpha

SGA approach

  • SGA conducted a detailed background analysis to identify sectors and corresponding social media sources that would generate the most meaningful signals for the client’s quant models
  • Based on the background analysis, SGA identified pharmaceuticals as the preferred sector for the analysis. The decision was based on the relatively low volume of spam and irrelevant content across social media and the ability to leverage pharma-specific content from sources such as StockTwits
  • In addition to StockTwits, SGA leveraged other social media sources, including Twitter

Process flow

  • After data cleaning, SGA generated a sentiment score for each tweet and StockTwits post, using a detailed sector-specific lexicon to enhance the accuracy of the model
  • SGA aggregated the scores to the daily level and normalized the values
  • We then combined the sentiment score table with the stock price data. As part of the backtesting process, SGA determined the correlation between the past five days’ sentiment score for a particular pharma stock and the company’s stock price over the next five days
  • We repeated this analysis across multiple time periods and across different pharma stocks
  • We created new variables for attributes such as lagged sentiment scores and lead closing prices, normalized all variables, and ran Pearson’s correlation between the dependent and independent variables, along with regression analysis
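
The process flow above can be illustrated with a minimal Python sketch. It is not SGA's actual implementation: the tiny lexicon, the function names, the z-score normalization and the simple trailing/leading five-day averages are all assumptions made for illustration, since the case study does not specify them. The regression step is omitted.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev

# Hypothetical mini-lexicon; a real sector-specific lexicon would be far larger.
LEXICON = {"approval": 1.0, "breakthrough": 1.0, "beat": 0.5,
           "recall": -1.0, "lawsuit": -1.0, "miss": -0.5}

def score_post(text):
    """Average lexicon weight of the matched words in one post (0.0 if none match)."""
    words = [w.strip(".,!?$#").lower() for w in text.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return mean(hits) if hits else 0.0

def daily_sentiment(posts):
    """Aggregate (day, text) pairs to one mean score per day, then z-score normalize.

    Z-scoring is an assumed choice; the source only says values were normalized.
    """
    by_day = defaultdict(list)
    for day, text in posts:
        by_day[day].append(score_post(text))
    days = sorted(by_day)
    raw = [mean(by_day[d]) for d in days]
    mu, sigma = mean(raw), pstdev(raw)
    return {d: ((r - mu) / sigma if sigma else 0.0) for d, r in zip(days, raw)}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

def lag_lead_correlation(sentiment, close, window=5):
    """Correlate trailing-window mean sentiment with leading-window mean close.

    sentiment and close are equal-length lists aligned by trading day.
    """
    lag, lead = [], []
    for t in range(window - 1, len(close) - window):
        lag.append(mean(sentiment[t - window + 1 : t + 1]))  # days t-4 .. t
        lead.append(mean(close[t + 1 : t + 1 + window]))     # days t+1 .. t+5
    return pearson(lag, lead)
```

In this sketch, `lag_lead_correlation` would be called once per stock and time period, with the normalized daily scores from `daily_sentiment` aligned to that stock's closing prices, mirroring the repeated backtests described above.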