Donaldson Company, a filter manufacturer (gasoline, air, chemical, etc.), approached ASU to see if students could gather social media data and use it to anticipate where the engine industry was heading.

Donaldson saw that fossil-fuel-burning internal combustion engines were likely being phased out, and wanted to know whether social media posts by major engine manufacturers could be used to predict which future technologies those manufacturers were pursuing.

They were looking for a few groups of students to scrape data from one of at least three platforms: YouTube, Facebook, and Twitter. Our group chose Twitter.

Donaldson Company asked us to find two things:

  • Mentions of specific alternative powertrains, such as natural gas, hydrogen combustion, hydrogen fuel cell, or electric battery.
  • What they called the "sentiment analysis" of the Tweet: a scoring of the Tweet's tone, broken down into positive, neutral, negative, and "compound" scores. That description was all they provided, along with a general suggestion that a tool already existed for the task.
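
The first requirement, spotting powertrain mentions, can be sketched as a simple keyword search. This is a minimal illustration only: the pattern table and the function name `find_powertrain_mentions` are hypothetical, and the actual term list used in the project is not given here.

```python
import re

# Hypothetical keyword patterns, one per powertrain category named above.
# The real project may have used a different or longer term list.
POWERTRAIN_PATTERNS = {
    "natural gas": re.compile(r"\b(natural gas|cng|lng)\b", re.IGNORECASE),
    "hydrogen combustion": re.compile(
        r"\bhydrogen (combustion|engines?)\b", re.IGNORECASE
    ),
    "hydrogen fuel cell": re.compile(r"\bfuel[- ]cells?\b", re.IGNORECASE),
    "electric battery": re.compile(
        r"\b(battery[- ]electric|electric (vehicles?|trucks?|batter(y|ies))|evs?)\b",
        re.IGNORECASE,
    ),
}


def find_powertrain_mentions(text):
    """Return the categories whose pattern appears anywhere in the Tweet text."""
    return [
        name
        for name, pattern in POWERTRAIN_PATTERNS.items()
        if pattern.search(text)
    ]
```

Plain keyword matching like this will miss paraphrases and can over-match short abbreviations, so any real term list would need review against sample Tweets.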

It turns out that "sentiment analysis" is a much broader concept: it can also mean classifying a sentence as objective or subjective, or almost any other form of categorization, not just positive or negative tone. However, one tool, VADER (Valence Aware Dictionary and sEntiment Reasoner), was created by Drs. C.J. Hutto and E.E. Gilbert in 2014 specifically for social media content.
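
As a rough sketch of how the four scores might be obtained, the `vaderSentiment` package exposes `SentimentIntensityAnalyzer.polarity_scores`, which returns `neg`, `neu`, `pos`, and `compound` values for a piece of text. The helper names `label_from_compound` and `score_tweet` below are hypothetical, and the ±0.05 cutoff is the conventional threshold suggested in the VADER documentation, not necessarily the one this project used.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score (-1 to 1) to a coarse tone label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_tweet(text):
    """Score one Tweet with VADER and attach a coarse label (assumed workflow)."""
    # Imported lazily so label_from_compound works without the package installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    scores = SentimentIntensityAnalyzer().polarity_scores(text)
    scores["label"] = label_from_compound(scores["compound"])
    return scores
```

In a real pipeline the analyzer would likely be created once and reused across Tweets rather than rebuilt on every call.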

A Twitter Scraping Tool for Donaldson Company, by three ASU students.

What I integrated or developed:

  • Twitter access via Tweepy
  • A MySQL database (via a student account on Azure cloud services)
  • Singleton-based MySQL connection management
  • Optimizations at the Python level to speed up database updates and insertions
  • Tone-based analysis of Tweet contents via VADER, by C.J. Hutto and E.E. Gilbert
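
The singleton connection and batched writes listed above might look roughly like the sketch below. It is not the project's code: the standard-library `sqlite3` module stands in for MySQL so the example runs anywhere, and the `Database` class, `save_tweets` function, and table layout are all hypothetical.

```python
import sqlite3


class Database:
    """Holds one shared connection for the whole process (singleton)."""

    _instance = None

    def __new__(cls):
        # Create the connection only on first use; later calls reuse it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            # Assumption: an in-memory SQLite database replaces the MySQL server.
            cls._instance.conn = sqlite3.connect(":memory:")
        return cls._instance


def save_tweets(rows):
    """Insert or update many Tweets in one batch and return the row count."""
    conn = Database().conn
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, company TEXT, text TEXT, compound REAL)"
    )
    # One executemany call and one commit avoid a round trip per Tweet.
    conn.executemany("INSERT OR REPLACE INTO tweets VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

With a MySQL driver the same shape should carry over, although the placeholder style (commonly `%s` rather than `?`) and the upsert syntax would differ.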

The project collected and processed over 90,000 Tweets from 118 companies.