
BuyOut: Look for an Account Buying Its Success


Introduction


In recent years, social media has become increasingly popular as a business and communication tool. One of the newest social media tools is Instagram, created by Kevin Systrom and Mike Krieger. Instagram is an application and service that lets users capture and share images and videos, either publicly or privately with pre-approved followers. Officially launched in October 2010, Instagram gained 1 million users within its first two months and had 700 million monthly active users by April 2017. It has since grown into one of the most popular online social networks (OSNs), with almost 90% of its user base under the age of 35, a much younger audience than on other OSNs. This makes Instagram unique and supports a higher level of engagement among its users, as previous research has shown higher technology adoption and social media use among younger people.



Businesses are recognizing the importance of social media as a way to engage with consumers on a more personal level while applying marketing techniques and furthering their brand image. Many well-known brands use social media to reach and engage consumers through ‘influencers’, i.e. people who are able to increase the brand’s visibility on the network. On Instagram, a user’s influence is often judged by the number of people following them. Unfortunately, as our project shows, this measure can be misleading because of the practice of buying ‘fake’ followers on Instagram.

Goal




The goal of our project is to detect whether a particular Instagram account is buying fake followers. We defined our own metrics for this purpose and, after checking them against our collected data, used quantitative analysis to reach a verdict.

Our aim is to provide users with a tool to verify the credibility of any other user on the network, and also to showcase the problem of fake followers to Instagram along with a tool to identify them.

Methodology




Despite this being a widespread practice, we found no formal literature or research papers on the issue of fake followers on Instagram. Accordingly, we trawled through a large number of random user accounts to find ones that could be characterized as having bought followers.

We found many accounts that had a large number of followers but a very low level of engagement, i.e. few likes and comments on their posts. This was odd: a huge follower base had opted to follow the user yet was not actively participating in the user’s activity. We found this to be the defining characteristic of users who had bought followers.

Another odd pattern was a large number of likes on these users’ posts but very few comments, which turned out to be characteristic of like buyers. The likes had been bought, just like the follows, but real engagement was still low, as shown by the lack of comments.

Accordingly, we defined three metrics as follows -
  • Like rate - Total number of likes / Total number of followers
  • Comment rate - Total number of comments / Total number of followers
  • Engagement rate - ((Total number of likes + Total number of comments) / Total number of followers) x 100
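The three definitions above can be sketched in Python as follows. This is only an illustration: the `ProfileStats` class and its field names are hypothetical, and which posts the like and comment totals cover is left to the caller. Returning 0.0 for an account with no followers is also an assumption, made to avoid division by zero.

```python
from dataclasses import dataclass


@dataclass
class ProfileStats:
    """Hypothetical container for the scraped counts of one account."""
    followers: int
    total_likes: int
    total_comments: int


def like_rate(p: ProfileStats) -> float:
    # Total number of likes / total number of followers
    return p.total_likes / p.followers if p.followers else 0.0


def comment_rate(p: ProfileStats) -> float:
    # Total number of comments / total number of followers
    return p.total_comments / p.followers if p.followers else 0.0


def engagement_rate(p: ProfileStats) -> float:
    # ((likes + comments) / followers) x 100, i.e. a percentage
    if not p.followers:
        return 0.0
    return (p.total_likes + p.total_comments) / p.followers * 100


stats = ProfileStats(followers=1000, total_likes=50, total_comments=5)
print(like_rate(stats), comment_rate(stats), engagement_rate(stats))
```

As defined, the like and comment rates are plain ratios while the engagement rate is a percentage, so the first two would need to be multiplied by 100 before being compared with values reported in percent.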

We began by collecting data on 10,000 random Instagram users. For this purpose we located a database of user IDs, the unique numerical identifiers assigned to every user on Instagram. We converted these IDs into usernames by web scraping in Python. We employed the open-source instabot for this, using a headless browser to log in to a user account and then build URLs of the form ‘instagram.com/web/friendships/user_id/follow/’. The Instagram server automatically redirects these to the corresponding URLs of the form ‘instagram.com/user_name’. We then scraped the user profiles with the BeautifulSoup library, reading the ‘script’ tags in the HTML to extract the JSON containing the user’s last 10 posts along with their likes and comments. We also scraped the user’s bio and total follower and following counts. We used this data to compute our metrics.
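The step of pulling JSON out of a ‘script’ tag can be sketched as below. To keep the example dependency-free it uses Python’s standard `html.parser` as a stand-in for BeautifulSoup, and it runs on a made-up HTML string rather than a live profile page. The `window._sharedData =` marker and the JSON keys (`followers`, `posts`, `likes`, `comments`) are assumptions for illustration only, not the actual page layout.

```python
import json
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def extract_embedded_json(html, marker="window._sharedData ="):
    """Return the JSON object assigned after `marker`, or None if absent."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    for body in parser.scripts:
        text = body.strip()
        if text.startswith(marker):
            payload = text[len(marker):].strip().rstrip(";")
            return json.loads(payload)
    return None


# Made-up page used only to exercise the function.
SAMPLE_HTML = """<html><body>
<script>window._sharedData = {"followers": 1200, "posts": [{"likes": 30, "comments": 2}, {"likes": 50, "comments": 4}]};</script>
</body></html>"""

profile = extract_embedded_json(SAMPLE_HTML)
total_likes = sum(post["likes"] for post in profile["posts"])
total_comments = sum(post["comments"] for post in profile["posts"])
print(profile["followers"], total_likes, total_comments)
```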

We found that our computed metrics varied greatly depending on the number of followers a user had. However, distinct grouping could be found as follows -

No. of followers    Like rate (%)    Comment rate (%)    Engagement rate (%)
0-100               14.84            0.98                15.83
100-1k              10.45            0.51                10.97
1k-10k              5.47             0.19                5.66
10k-100k            2.47             0.05                2.52
100k-1m             2.27             0.04                2.31
1m-10m              1.55             0.027               1.56
10m-100m            2.74             0.021               2.76
100m+               2.03             0.017               2.05

1k = 1,000; 1m = 1 million. All rates are in %.
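Looking up the expected values for an account’s follower group can be sketched with a sorted list of boundaries. The numbers are copied from the table above; the function name `expected_for` is hypothetical, and because the printed ranges share their endpoints (for example 100 appears in both 0-100 and 100-1k), sending a boundary value to the higher group is an assumption.

```python
from bisect import bisect_right

# Upper bounds of every group except the last, open-ended one.
BOUNDS = [100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000, 100_000_000]

# (group label, like rate %, comment rate %, engagement rate %)
EXPECTED = [
    ("0-100", 14.84, 0.98, 15.83),
    ("100-1k", 10.45, 0.51, 10.97),
    ("1k-10k", 5.47, 0.19, 5.66),
    ("10k-100k", 2.47, 0.05, 2.52),
    ("100k-1m", 2.27, 0.04, 2.31),
    ("1m-10m", 1.55, 0.027, 1.56),
    ("10m-100m", 2.74, 0.021, 2.76),
    ("100m+", 2.03, 0.017, 2.05),
]


def expected_for(followers: int):
    """Return the table row for the group that `followers` falls into."""
    if followers < 0:
        raise ValueError("follower count cannot be negative")
    # bisect_right places a boundary value (e.g. exactly 100) in the higher group.
    return EXPECTED[bisect_right(BOUNDS, followers)]


label, like_pct, comment_pct, engagement_pct = expected_for(50_000)
print(label, like_pct, comment_pct, engagement_pct)
```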

We assigned equal weights to likes and comments on a user’s posts. Since most posts have far more likes than comments, one could argue that comments should carry a higher weight in any computation. However, we did not do this, because a single user can comment on a post multiple times but can like it only once.

After the number crunching, we created a web portal using HTML, CSS and JS for the front-end with the back-end in Django and hosted on AWS.



This provided a user the ability to enter any username that they wished to check for fake followers.



On entering a username and clicking submit, the back-end dynamically scrapes the user profile in real-time and retrieves the data as before. This data is matched to the established user base according to the corresponding group by number of followers, after which the result is provided.

Output from the Website


The results are computed based on the following checks, where ‘ideal’ means the expected value for the user’s follower range -
  • If the searched user’s engagement rate is less than 50% of the ideal in the corresponding range
  • If both the like rate and the comment rate are much lower than the ideal in the corresponding range
  • If the like rate is much higher than the ideal in the corresponding range and the comment rate is low
  • If there is a sudden rise in the like or comment rate across the user’s last 10 posts
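The four checks above can be sketched as a single function. Only the 50% engagement threshold comes from the text. The cut-offs for ‘much lower’ (`low=0.5`), ‘much higher’ (`high=2.0`) and ‘sudden rise’ (`spike=3.0` times the average of the earlier posts) are placeholder assumptions, as are the function and flag names. All rates are expected in percent, and the per-post series is expected oldest first.

```python
from statistics import mean


def has_sudden_rise(series, factor=3.0):
    """True if any value exceeds `factor` times the mean of the values before it."""
    for i in range(1, len(series)):
        baseline = mean(series[:i])
        if baseline > 0 and series[i] > factor * baseline:
            return True
    return False


def check_account(like_rate, comment_rate, engagement_rate,
                  ideal_like, ideal_comment, ideal_engagement,
                  per_post_rates, low=0.5, high=2.0, spike=3.0):
    """Return the names of the checks that the account triggers."""
    flags = []
    # Check 1: engagement below 50% of the ideal (threshold stated in the text).
    if engagement_rate < 0.5 * ideal_engagement:
        flags.append("low_engagement")
    # Check 2: like and comment rates both far below the ideal (assumed cut-off).
    if like_rate < low * ideal_like and comment_rate < low * ideal_comment:
        flags.append("low_likes_and_comments")
    # Check 3: like rate far above the ideal while comments stay low (assumed cut-offs).
    if like_rate > high * ideal_like and comment_rate < low * ideal_comment:
        flags.append("likes_without_comments")
    # Check 4: sudden rise across the last posts (assumed definition of "sudden").
    if has_sudden_rise(per_post_rates, spike):
        flags.append("sudden_rise")
    return flags


# Ideal values for the 10k-100k group: like 2.47, comment 0.05, engagement 2.52.
print(check_account(0.5, 0.01, 0.51, 2.47, 0.05, 2.52, [10, 12, 11, 10]))
```

Returning the list of triggered checks rather than a single yes/no keeps the sketch neutral about how the checks are combined into a final verdict, which the text does not specify.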

Using these parameters, we are able to determine the likelihood that a particular user has bought followers. For a select few users, we corroborated this with an external source, Social Blade. One such example is the user account jajashop89, whose analysis we provide here.

The web portal provides results as shown -



More explanation on the results -
  • A basic overview of the user is provided, with their username, bio, follower and following counts



  • The computed metrics and the corresponding expected values according to the user’s number of followers are provided along with a verdict on whether the user has bought followers or not


  • Graphs of the like and comment rate for the user’s last 10 posts are provided



We also manually explored the follower lists of users we deemed to have bought followers in order to extract the features of a fake follower. We found that most fake followers were actually bots, characterized by the following features -
  • High ‘following’ number but no followers or very few followers
  • Extremely short or empty bios
  • Poor or no content
  • Private account
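These traits were identified by manual inspection, but they could be turned into a simple checklist along the following lines. This is a hypothetical sketch: the `Follower` class, the numeric cut-offs, and the use of post count as a proxy for ‘poor or no content’ are all assumptions and not part of the project.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    """Hypothetical record of one follower's public profile data."""
    followers: int
    following: int
    bio: str
    post_count: int
    is_private: bool


def bot_traits(f, min_following=500, max_followers=10,
               max_bio_len=10, max_posts=1):
    """Return which of the four listed traits a follower shows (cut-offs assumed)."""
    traits = []
    # High 'following' number but no or very few followers
    if f.following >= min_following and f.followers <= max_followers:
        traits.append("lopsided_follow_ratio")
    # Extremely short or empty bio
    if len(f.bio.strip()) <= max_bio_len:
        traits.append("short_or_empty_bio")
    # Poor or no content, approximated here by the number of posts
    if f.post_count <= max_posts:
        traits.append("little_or_no_content")
    # Private account
    if f.is_private:
        traits.append("private_account")
    return traits


suspect = Follower(followers=2, following=3000, bio="", post_count=0, is_private=True)
print(len(bot_traits(suspect)), "of 4 traits matched")
```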


Results


Here we present the verification of our heuristic data model. The users considered had follower counts in the range 10k - 100k.
We used these parameters for evaluation:
  • Engagement Rate
  • Like Rate
  • Comment Rate
  • Number of followers
  • Number of following
We trained our model with a linear SVM classifier, with PCA disabled.
These are the classifier settings:
  • Preset: Linear SVM
  • Kernel function: Linear
  • Box constraint level: 1
  • Multiclass method: One-vs-One
  • Standardize data: true     
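For reference, the settings above correspond to the standard soft-margin linear SVM. The formulation below assumes that ‘box constraint level’ is the usual penalty parameter C (here C = 1) and that standardization means centering and scaling each feature. With only two classes (bought followers or not), the one-vs-one scheme reduces to a single binary classifier.

```latex
% Standardization of feature j for sample i ("Standardize data: true")
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}

% Soft-margin linear SVM with box constraint C = 1, labels y_i \in \{-1, +1\}
\min_{w,\, b,\, \xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^\top z_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0
```

Here z_i is the standardized vector of the five parameters listed earlier (engagement rate, like rate, comment rate, number of followers, number of following), and μ_j and σ_j are the mean and standard deviation of feature j over the training data.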
And here are the results from the classifier.

Scatter Plot
The scatter plot here shows the predicted class for the validation dataset.



ROC Curve
Receiver Operating Characteristic Curve

TPR/FNR

Confusion Matrix


