
Social Bot Detection on Twitch

Twitch is the world's leading live-streaming video platform for the gaming community. It is a very popular networking site, with close to 100 million unique monthly users. Bots are prominent on the network because of the various financial benefits the platform offers its users. The main objective of our project is to detect social bots on Twitch using several techniques: metadata analysis, sentiment analysis of the chat on a channel, and classification using machine learning.

We started by collecting the usernames of 510 channels and comparing the number of chatters and viewers on each channel's live video. We found 51 channels that had more chatters than viewers. On those channels, we performed temporal analysis over a period of 4 weeks. Alongside this, we collected their metadata, such as followers, followings, status, partner status, and total views. We calculated a score from these features; the higher the score, the higher the probability that an account is a bot.
Twitch exposes its chat over an official IRC interface, from which we collected the chat on a channel using a separate tool, Chatty. These chats were collected for the channels with higher scores, and sentiment analysis was performed on the extracted messages. Through IRC we could also see which users were being banned from chats by Twitch itself. We treated these users as ground truth for our next technique.
Using the score from the metadata features and the ground truth from the chat analysis, we labeled our data set, which served as our machine-learning training set; a randomly collected test set was used to measure the accuracy of our classifiers.

Summary of Methodology and Analysis

Data Collection and Filtering: We used Twitch's official API, Kraken, to collect 510 random users along with their chatter and viewer counts. Of these users, 51 accounts showed bot behaviour, i.e. chatters > viewers.
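The filtering step itself is a simple comparison. The sketch below assumes the counts have already been fetched (the Kraken API we used has since been retired, so no HTTP calls are shown); `ChannelSnapshot` and `suspicious_channels` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChannelSnapshot:
    """One observation of a live channel (hypothetical record type)."""
    channel: str
    viewers: int
    chatters: int


def suspicious_channels(snapshots: List[ChannelSnapshot]) -> List[str]:
    """Return channels whose chatter count exceeds their viewer count."""
    return [s.channel for s in snapshots if s.chatters > s.viewers]


snaps = [
    ChannelSnapshot("alpha", viewers=100, chatters=40),
    ChannelSnapshot("beta", viewers=12, chatters=30),
    ChannelSnapshot("gamma", viewers=50, chatters=50),
]
print(suspicious_channels(snaps))  # ['beta']
```

Ties (chatters equal to viewers) are not flagged, matching the strict "chatters > viewers" rule above.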
We limited our analysis to these 51 users, from whose channels we extracted the list of chatters. For those chatters, we collected data from various endpoints, such as followers, followings, status, partner value, and views, which were then used for temporal analysis, metadata analysis, and machine-learning classification.
For sentiment analysis, we extracted the chats of the most suspicious accounts using a tool called Chatty, which captures live chat from Twitch's IRC interface.
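Chatty handles the connection for us, but the underlying data is plain IRC. As a rough sketch, assuming raw Twitch IRC lines rather than Chatty's own log format, the messages per user and the banned users (our ground truth) could be pulled out like this. Twitch sends chat as `PRIVMSG` and bans/timeouts as `CLEARCHAT`; a `ban-duration` tag marks a timeout rather than a permanent ban. The function name `parse_irc_lines` is hypothetical.

```python
import re
from collections import defaultdict

# Optional IRCv3 tag prefix ("@key=value;... "), then the standard IRC prefix.
PRIVMSG_RE = re.compile(
    r"^(?:@(?P<tags>\S+) )?:(?P<nick>[^!\s]+)!\S+ PRIVMSG #(?P<chan>\S+) :(?P<text>.*)$"
)
CLEARCHAT_RE = re.compile(
    r"^(?:@(?P<tags>\S+) )?:tmi\.twitch\.tv CLEARCHAT #(?P<chan>\S+) :(?P<user>\S+)$"
)


def parse_irc_lines(lines):
    """Return ({user: [messages]}, {permanently banned users})."""
    messages = defaultdict(list)
    banned = set()
    for raw in lines:
        line = raw.rstrip("\r\n")
        m = PRIVMSG_RE.match(line)
        if m:
            messages[m.group("nick").lower()].append(m.group("text"))
            continue
        m = CLEARCHAT_RE.match(line)
        if m:
            tags = m.group("tags") or ""
            # No ban-duration tag means a permanent ban rather than a timeout.
            if "ban-duration=" not in tags:
                banned.add(m.group("user").lower())
    return dict(messages), banned


log = [
    ":alice!alice@alice.tmi.twitch.tv PRIVMSG #chan :hello world\r\n",
    ":alice!alice@alice.tmi.twitch.tv PRIVMSG #chan :follow me",
    "@ban-duration=600 :tmi.twitch.tv CLEARCHAT #chan :carol",
    ":tmi.twitch.tv CLEARCHAT #chan :bob",
]
msgs, bans = parse_irc_lines(log)
print(msgs)  # {'alice': ['hello world', 'follow me']}
print(bans)  # {'bob'}
```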
Temporal Analysis: Because our project dealt with live-streaming data, temporal analysis was an integral part of our data collection. We tracked viewer and chatter counts over a period of 4 weeks to finalize our data set for further analysis. We had initially found 51 of 510 users that showed bot behavior, i.e. chatter count > viewer count. This analysis helped us keep track of the accounts that showed bot activity regularly.
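One way to express "showed bot activity regularly" is the fraction of samples in which a channel had more chatters than viewers. The sketch below assumes observations are `(channel, viewers, chatters)` tuples collected over the 4 weeks; the `min_fraction` cut-off of 0.5 is a hypothetical choice, not a value from our study.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def persistent_bot_channels(
    observations: Iterable[Tuple[str, int, int]], min_fraction: float = 0.5
) -> List[str]:
    """Channels where chatters > viewers in at least `min_fraction` of samples."""
    flagged = defaultdict(int)  # samples showing bot behaviour, per channel
    total = defaultdict(int)    # all samples, per channel
    for channel, viewers, chatters in observations:
        total[channel] += 1
        if chatters > viewers:
            flagged[channel] += 1
    return sorted(c for c in total if flagged[c] / total[c] >= min_fraction)


obs = [
    ("a", 10, 20), ("a", 10, 5), ("a", 10, 30),
    ("b", 10, 20), ("b", 10, 5), ("b", 10, 5),
    ("c", 5, 1),
]
print(persistent_bot_channels(obs))                    # ['a']
print(persistent_bot_channels(obs, min_fraction=0.3))  # ['a', 'b']
```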

Meta-data Analysis: Using the collected metrics (followers, followings, status, partner, and views), we created a Botscore formula to estimate the probability that an account is a bot. The Botscore weights each metric according to how prominent it is as a bot indicator. The score is out of 5: the higher the score, the more likely the account is suspicious. We then took the users with higher Botscores and performed sentiment analysis on their chats. This data was also used to train and classify accounts with machine-learning models.
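The exact formula is not given here, so the following is only a sketch of a weighted score out of 5 in the same spirit. Every threshold and weight below is a hypothetical placeholder chosen for illustration; each metric contributes its weight when it looks bot-like.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountMeta:
    followers: int
    followings: int
    status: Optional[str]
    partner: bool
    views: int


# Hypothetical weights summing to 5; not the formula used in this study.
WEIGHTS = {
    "few_followers": 1.5,
    "follows_many": 1.0,
    "no_status": 1.0,
    "not_partner": 0.5,
    "few_views": 1.0,
}


def botscore(a: AccountMeta) -> float:
    """Sum the weights of the bot-like indicators an account triggers (0 to 5)."""
    indicators = {
        "few_followers": a.followers < 10,                           # hypothetical cut-off
        "follows_many": a.followings > 10 * max(a.followers, 1),     # hypothetical ratio
        "no_status": not a.status,
        "not_partner": not a.partner,
        "few_views": a.views < 50,                                   # hypothetical cut-off
    }
    return sum(WEIGHTS[name] for name, hit in indicators.items() if hit)


print(botscore(AccountMeta(0, 200, None, False, 0)))               # 5.0
print(botscore(AccountMeta(3, 5, "hi", False, 10)))                # 3.0
print(botscore(AccountMeta(5000, 100, "Playing", True, 100000)))   # 0
```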

Sentiment Analysis: For sentiment analysis, we used TextBlob, a Python library for processing textual data. It provides a simple API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun-phrase extraction, sentiment analysis, classification, translation, and more.
We collected the chats of likely bot accounts from the IRC chat channel. Then, using TextBlob, we computed the polarity and subjectivity of each message. For every message that was neutral and contained likely spam words, we incremented the spam count of the sending user ID, which told us how frequently each user ID sent spam messages. We further analyzed the profiles of the banned users and found that 70% of those accounts had already been suspended by Twitch.
Through this method we achieved 75% detection accuracy.
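The spam-counting rule can be sketched as follows. Polarity comes from TextBlob's `sentiment.polarity` (imported lazily so the scorer can be swapped out); treating a polarity of zero as "neutral" and the contents of `SPAM_WORDS` are assumptions for illustration, not our exact word list.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

# Hypothetical spam vocabulary, for illustration only.
SPAM_WORDS = {"follow", "free", "giveaway", "click", "subscribe", "viewers"}


def textblob_polarity(text: str) -> float:
    """Polarity in [-1, 1] from TextBlob's default analyzer (pip install textblob)."""
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def count_spam(
    messages: Iterable[Tuple[str, str]],
    polarity: Callable[[str], float] = textblob_polarity,
    eps: float = 1e-9,
) -> Counter:
    """Count, per user, messages that are neutral and contain a spam word."""
    spam = Counter()
    for user, text in messages:
        words = {w.strip(".,!?:;").lower() for w in text.split()}
        if abs(polarity(text)) < eps and words & SPAM_WORDS:
            spam[user] += 1
    return spam


# Stub scorer so the example runs without TextBlob installed.
stub = lambda t: 0.5 if "love" in t else 0.0
chat = [
    ("bot1", "follow my channel"),
    ("bot1", "FREE viewers here!"),
    ("fan", "love this stream, follow!"),
    ("fan", "nice play"),
]
print(dict(count_spam(chat, polarity=stub)))  # {'bot1': 2}
```

Passing the scorer as a parameter keeps the counting rule separate from the sentiment model, so TextBlob can be replaced later without touching the logic.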

Classification using Machine Learning: The manually annotated data and the ground truth we collected were used to train machine-learning models. We trained the following five models:
1. Gaussian Naive Bayes
2. Logistic Regression
3. Support Vector Machines
4. Decision Tree Classifier
5. Neural Networks
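All five models are available off the shelf in scikit-learn (`GaussianNB`, `LogisticRegression`, `SVC`, `DecisionTreeClassifier`, `MLPClassifier`). To show what the first one actually computes, here is a minimal from-scratch Gaussian Naive Bayes: it fits a per-class mean and variance for each feature plus a class prior, then picks the class with the highest log-likelihood. The feature columns in the example (Botscore, spam count) and the data are hypothetical.

```python
import math
from collections import defaultdict


class GaussianNB:
    """Minimal Gaussian Naive Bayes (illustrative, not scikit-learn's class)."""

    def __init__(self, var_smoothing: float = 1e-9):
        self.var_smoothing = var_smoothing  # keeps variances strictly positive

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.stats = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + self.var_smoothing
                for c, m in zip(cols, means)
            ]
            log_prior = math.log(len(rows) / len(X))
            self.stats[label] = (log_prior, means, variances)
        return self

    def _log_likelihood(self, row, label):
        log_prior, means, variances = self.stats[label]
        ll = log_prior
        for x, m, v in zip(row, means, variances):
            ll += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return ll

    def predict(self, X):
        return [max(self.stats, key=lambda lbl: self._log_likelihood(row, lbl)) for row in X]


# Hypothetical features: [botscore, spam_count]; label 1 = bot, 0 = human.
X_train = [[4.5, 9], [4.0, 7], [5.0, 8], [0.5, 0], [1.0, 1], [0.0, 0]]
y_train = [1, 1, 1, 0, 0, 0]
model = GaussianNB().fit(X_train, y_train)
print(model.predict([[4.2, 8], [0.3, 0]]))  # [1, 0]
```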
Following are the obtained accuracies for the different models:

Next, to check the validity of the Botscore we created, we built a small data set similar to the one we used for training. The only difference between the training data set and this data set was that here the ground truth was replaced by our calculated Botscore.

We re-ran our trained models on this new test set and found that the accuracies were comparable to the original ones. This confirmed that our Botscore formula was consistent with the trained models.
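The comparison amounts to scoring the same predictions against two label sets. A small sketch, assuming Botscores are turned into binary labels with a threshold (the value 2.5 is a hypothetical choice):

```python
from typing import Sequence


def labels_from_botscore(scores: Sequence[float], threshold: float = 2.5):
    """Turn Botscores (0-5) into binary labels; the threshold is hypothetical."""
    return [1 if s >= threshold else 0 for s in scores]


def accuracy(predicted: Sequence[int], expected: Sequence[int]) -> float:
    """Fraction of positions where prediction and label agree."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


scores = [4.5, 0.5, 3.0, 1.0]
predictions = [1, 0, 0, 0]
print(labels_from_botscore(scores))                         # [1, 0, 1, 0]
print(accuracy(predictions, labels_from_botscore(scores)))  # 0.75
```

Comparing `accuracy(predictions, ground_truth)` with `accuracy(predictions, labels_from_botscore(scores))` gives the two numbers whose closeness the check above relies on.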

Conclusion: Twitch is a growing platform, and such fake accounts only hurt the revenue of the company and of genuinely deserving streamers. The need to identify these bot accounts therefore grows with the number of bot accounts being created. Our analysis successfully identified 70% of the fake accounts that were banned or suspended by Twitch itself. In the future, we plan to enlarge the data set, use more features for machine learning, and perform content analysis on the extracted chats.


The Team

    L-R:  Akhil Goel, Shreyash Arya, Tushita Rathore, Sarthika Dhawan, Mayank Bhoria 
