Social Bot Detection on Twitch

Twitch is the world's leading live-streaming video platform for the gaming community. It is a very popular social networking site with close to 100 million unique monthly users. Bots are prominent on the network because of the financial benefits the platform offers its users. The main objective of our project is to detect social bots on Twitch using techniques such as metadata analysis, sentiment analysis of chats on a channel, and classification using machine learning.

We started by collecting the usernames of 510 channels and comparing the number of chatters and viewers on each channel's live video. We found 51 channels where chatters > viewers. On those channels, we performed temporal analysis over a period of 4 weeks. Alongside this, we collected their metadata, such as followers, followings, status, partner, and total views. We calculated a score from these features: the higher the score, the higher the probability that an account is a bot.

Twitch exposes its chat through an official IRC interface, from which we collected the chats on a channel using a tool called Chatty. These chats were collected for channels with higher scores, and sentiment analysis was performed on the extracted messages. From the IRC chat, we could also see which users were being banned from chats by Twitch itself. We treated these users as ground truth for our next technique.

Using the score from the metadata features and the ground truth from the chat analysis, we labelled our data set. We used it as our machine learning training set, together with a randomly collected test set, to measure the accuracy of our classification.

Summary of Methodology and Analysis

Data Collection and Filtering: We used Kraken, Twitch's official API, to collect 510 random users and their chatter and viewer counts. Of these users, 51 accounts showed bot behaviour, i.e. chatters > viewers.
We limited our analysis to these 51 users, from which we extracted the list of chatters. For those users (chatters), we collected data from various endpoints, such as followers, followings, status, partner value and views, which were then used to perform temporal analysis, meta-data analysis and machine learning classification.
For sentiment analysis, we extracted the chats of the most suspicious accounts using a tool known as Chatty, which pulled the live chats from Twitch's IRC chat.
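
The filtering rule above (chatters > viewers) can be sketched in a few lines. This is a minimal illustration, not the project's actual script: it assumes the counts have already been fetched into dictionaries with the hypothetical keys `name`, `chatters` and `viewers`, and it makes no API calls.

```python
def flag_suspicious_channels(channels):
    """Return the names of channels whose chatter count exceeds their viewer count."""
    return [c["name"] for c in channels if c["chatters"] > c["viewers"]]


# Made-up sample records, for illustration only (not project data).
sample = [
    {"name": "channel_a", "chatters": 120, "viewers": 80},
    {"name": "channel_b", "chatters": 40, "viewers": 300},
    {"name": "channel_c", "chatters": 15, "viewers": 15},
]

suspicious = flag_suspicious_channels(sample)
print(suspicious)
```

Only `channel_a` is flagged here; a channel with equal counts is not treated as suspicious because the rule is a strict inequality.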

Temporal Analysis: Our project involved live streaming data, so temporal analysis was an integral part of our data collection. We performed temporal analysis on the viewer and chatter counts over a period of 4 weeks to finalize our data set for further analysis. We had initially found 51 of 510 users that showed bot behavior, i.e. chatter count > viewer count. This analysis helped us keep track of accounts that showed bot activity regularly.
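
One way to track which channels show bot behavior regularly is to record repeated snapshots and compute how often each channel has more chatters than viewers. The sketch below assumes snapshots stored as `(channel, chatters, viewers)` tuples; the `min_rate` threshold of 0.5 is a hypothetical choice, not a value from the project.

```python
from collections import defaultdict


def bot_behaviour_rate(snapshots):
    """Map each channel to the fraction of its snapshots with chatters > viewers."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for channel, chatters, viewers in snapshots:
        total[channel] += 1
        if chatters > viewers:
            flagged[channel] += 1
    return {channel: flagged[channel] / total[channel] for channel in total}


def regularly_suspicious(snapshots, min_rate=0.5):
    """Return channels flagged in at least `min_rate` of their snapshots (hypothetical cutoff)."""
    rates = bot_behaviour_rate(snapshots)
    return sorted(channel for channel, rate in rates.items() if rate >= min_rate)


# Made-up snapshots taken at different times, for illustration only.
snapshots = [
    ("channel_a", 120, 80),
    ("channel_a", 95, 90),
    ("channel_a", 60, 70),
    ("channel_b", 40, 300),
    ("channel_b", 310, 300),
    ("channel_b", 20, 250),
    ("channel_b", 10, 200),
]

rates = bot_behaviour_rate(snapshots)
print(regularly_suspicious(snapshots))
```

Using a rate rather than a single comparison avoids flagging a channel because of one unusual snapshot.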

Meta-data Analysis: Using the metrics collected, such as followers, followings, status, partner and views, we created a Botscore formula to estimate the probability of an account being a bot. The Botscore was formulated on the basis of the prominence of the various metrics used. The score was given out of 5: the higher the score, the more likely the account is suspicious. We then took the users with higher Botscores and performed sentiment analysis on their chats. This data was also used to train Machine Learning models and classify accounts.
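
The post does not state the Botscore formula itself, so the following is only a hypothetical illustration of how a score out of 5 could be built from the five metrics: one point per suspicious signal. Every rule and threshold below is invented for the example and should not be read as the formula we used.

```python
def botscore(account):
    """Hypothetical score from 0 to 5: one point per suspicious metadata signal.

    The thresholds are illustrative assumptions, not the project's formula.
    """
    signals = [
        account["followers"] < 5,       # almost nobody follows the account
        account["followings"] > 1000,   # follows an unusually large number of channels
        not account["status"],          # no channel status text set
        not account["partner"],         # not a Twitch partner
        account["views"] < 10,          # the channel has almost no views
    ]
    return sum(signals)  # each True counts as 1


# Made-up accounts, for illustration only.
likely_bot = {"followers": 0, "followings": 2500, "status": "", "partner": False, "views": 3}
likely_human = {"followers": 5400, "followings": 120, "status": "Playing ranked", "partner": True, "views": 982000}

print(botscore(likely_bot), botscore(likely_human))
```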

Sentiment Analysis: For sentiment analysis, we used TextBlob, a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

We collected the chats of possible bots from the IRC chat channel. Then, using TextBlob, we obtained the polarity and subjectivity of each message. For each neutral message containing possible spam words, we incremented the spam count of the user ID that sent it. This gave us data on how frequently each user ID sent a spam message. We further analyzed the banned users' profiles and found that 70% of the accounts had already been suspended by Twitch.
Through this method we were able to achieve 75% detection accuracy.
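
The per-user spam count described above can be sketched as follows. This is a minimal sketch under stated assumptions: the spam vocabulary and the `neutral_band` cutoff are hypothetical (the post does not list the values we used), and the polarity function is passed in as a parameter so the counting logic can be tried without TextBlob installed. The default uses TextBlob's `sentiment.polarity`, a value between -1.0 and 1.0; `sentiment.subjectivity` is available in the same way.

```python
import re
from collections import Counter

# Hypothetical spam vocabulary; the post does not list the words actually used.
SPAM_WORDS = {"free", "followers", "viewers", "giveaway", "click"}


def textblob_polarity(text):
    """Polarity from TextBlob, imported lazily (requires the textblob package)."""
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def count_spam_messages(messages, polarity_fn=textblob_polarity, neutral_band=0.1):
    """Count, per user, the messages that are near-neutral and contain a spam word.

    `messages` is an iterable of (user_id, text) pairs.
    `neutral_band` is an assumed cutoff: |polarity| <= neutral_band counts as neutral.
    """
    counts = Counter()
    for user, text in messages:
        words = set(re.findall(r"[a-z']+", text.lower()))
        is_neutral = abs(polarity_fn(text)) <= neutral_band
        if is_neutral and words & SPAM_WORDS:
            counts[user] += 1
    return counts
```

Calling `count_spam_messages(messages)` with no other arguments scores each message with TextBlob; users with high counts are the ones sending spam-like messages most often.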

Classification using Machine Learning: The manually annotated data and the ground truth we collected were used to train Machine Learning models. We trained the following five models.

1. Gaussian Naive Bayes

2. Logistic Regression

3. Support Vector Machines

4. Decision Tree Classifier

5. Neural Networks
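
Training and comparing these five model families could look like the sketch below, which assumes scikit-learn. The data is synthetic (generated with `make_classification` as a stand-in for five metadata features), and the hyperparameters are defaults or arbitrary choices, so the printed accuracies say nothing about our results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for five metadata features; NOT the project's data.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)  # mean accuracy on held-out rows

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}")
```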

Following are the obtained accuracies for the different models:

Next, to check the validity of the Botscore we created, we made a small data set similar to the one we used for training. The only difference between the training data set and this data set was that in this data set, ground truth was replaced by our calculated Botscore.

We re-ran our trained models on this new test data set and found that the accuracies were comparable to the original ones. This confirmed that our Botscore formula was consistent with the trained models.
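
The consistency check can be illustrated by turning each Botscore into a binary label and measuring how often a model's predictions agree with it. The cutoff of 3 out of 5 is a hypothetical choice for this sketch, and the numbers are made up.

```python
def botscore_labels(scores, threshold=3):
    """Convert Botscores (0 to 5) into 0/1 labels; the threshold is an assumed cutoff."""
    return [1 if score >= threshold else 0 for score in scores]


def agreement(predictions, labels):
    """Fraction of positions where the model prediction matches the label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    matches = sum(p == l for p, l in zip(predictions, labels))
    return matches / len(labels)


# Made-up values, for illustration only.
scores = [5, 4, 1, 0, 3, 2]
model_predictions = [1, 1, 0, 0, 0, 0]

labels = botscore_labels(scores)
print(agreement(model_predictions, labels))
```

An agreement close to the accuracy measured against the ground truth suggests that the score and the trained models rank accounts in a similar way.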

Conclusion: Twitch is a growing platform, and such fake accounts only hurt the profits of the company and of the video streamers who actually deserve them. The need to identify these bot accounts is therefore increasing in proportion to the number of bot accounts being created. The above analysis was successful in identifying 70% of the actual fake accounts banned or suspended by Twitch itself. In future work, we plan to enlarge the data set, use more features for machine learning, and perform content analysis on the extracted chats.

Link to our video - https://youtu.be/AXLK9H_Uuls

The Team

L-R: Akhil Goel, Shreyash Arya, Tushita Rathore, Sarthika Dhawan, Mayank Bhoria

CSE648: Privacy and Security in Online Social Media Projects
