Baitfree, Clickbait Detection Tool

Given a YouTube video, classify whether it is clickbait or not

Members: Anshul Anil Gaur(2014020), Nikhil M Prasanna(2014060), Ojasvi Singh Randhawa(2014070), Aditya Dwivedi(2014128), Avadh Yadav(2014026), Nishant Yadav(2014067)

Problem Statement
The main problem we are trying to solve in our project is determining whether a video on YouTube is clickbait or not.

A video is defined as clickbait if its title, thumbnail, tags, description etc. are irrelevant to the actual contents of the video. Uploaders often set a very flashy thumbnail image which does not even appear in the video. Another way uploaders on YouTube make clickbait videos is by writing a very misleading title, which tempts people to watch the video out of curiosity after reading it.
At times uploaders add tags such as “PewDiePie” to their videos even though they are entirely irrelevant. This allows uploaders to have their videos listed alongside authentic videos. Although clickbait increases turnover for YouTube, in the long term it can force people to leave the platform. If more and more videos on YouTube become clickbait, users who visit YouTube to watch good videos will not be satisfied with the quality of the videos: they will feel deceived and unhappy that they cannot watch what the video’s title or thumbnail promised.
This is not an easy problem. Some might consider the ratio of likes to dislikes to be a deciding factor, but it cannot solely determine whether a video is clickbait or not.
Example: the top two most disliked videos on YouTube, “Baby” by Justin Bieber and the “Call of Duty: Infinite Warfare” reveal trailer, are not clickbait. Hence this is not a simple problem.

Types of ClickBait

Sexy Clickbait: The thumbnail or title contains sexual content but the video does not

Self-admitted Clickbait: The title itself states that it is clickbait

Unanswered Clickbait: The title promises to answer a question, but the video never answers it

Weird Clickbait: The thumbnail contains some image completely irrelevant to the video

Features used to train our model:
  • Likes to Dislikes Ratio
  • View Count
  • Comment Count
  • Explicit Content in Thumbnail
  • Age Restriction
  • Comments Negativity Score

We extracted all of the above features and normalized them into labels to train a decision tree. The normalization is as follows:
  • Likes to Dislikes Ratio
  • View Count: Label = no. of digits in the view count.
  • Comment Count: Label = no. of digits in the comment count, -1 for no comments and -2 for comments disabled.
  • Explicit Content in Thumbnail: 0 for highly likely, 1 for likely, 2 for neutral, 3 for unlikely, 4 for highly unlikely
  • Age Restriction: 0 for off, 1 for on
  • Comments Negativity Score: 0 to 10. Higher means more negative
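
The labelling rules above can be sketched in a few lines of Python. This is only an illustration of the rules as listed, not our actual extraction code: the function names are made up for this example, `None` is assumed to stand for "comments disabled", and the likelihood strings simply reuse the wording of the list. The Likes to Dislikes Ratio is left out because no labelling rule is listed for it.

```python
from typing import Optional

# Likelihood wording taken from the list above, mapped to the labels 0..4.
EXPLICIT_LABELS = {
    "highly likely": 0,
    "likely": 1,
    "neutral": 2,
    "unlikely": 3,
    "highly unlikely": 4,
}


def digit_count_label(count: int) -> int:
    """Label = number of decimal digits in a non-negative count."""
    return len(str(count))


def comment_count_label(count: Optional[int]) -> int:
    """-2 if comments are disabled (None), -1 if there are no comments,
    otherwise the number of digits in the comment count."""
    if count is None:
        return -2
    if count == 0:
        return -1
    return digit_count_label(count)


def explicit_label(likelihood: str) -> int:
    """Map a likelihood description to its 0..4 label."""
    return EXPLICIT_LABELS[likelihood.lower()]


def age_restriction_label(restricted: bool) -> int:
    """0 for off, 1 for on."""
    return 1 if restricted else 0


# Example: a video with 1,234,567 views and comments disabled.
print(digit_count_label(1234567), comment_count_label(None))
```

Using the digit count instead of the raw count groups videos by order of magnitude, so a video with 1,200 views and one with 9,800 views get the same label.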

In our project, given the URL of any YouTube video, our software processes that URL and produces an output describing how likely (as a probability) the video is to be clickbait.
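
To show how a decision tree can turn feature labels into such a probability, here is a minimal sketch in plain Python. It is not our trained model: the Gini-based splitting, the function names, the feature order (view label, comment label, explicit label, age restriction, negativity score) and the six training rows are all assumptions made up for illustration. The probability returned is simply the share of clickbait examples in the leaf that a video falls into.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = clickbait)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Return the (feature, threshold) pair that lowers the weighted
    Gini impurity the most, or None if no split improves it."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build a tree; leaves store the clickbait fraction."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return {"prob": sum(labels) / len(labels)}
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict_proba(tree, row):
    """Walk down the tree and return the clickbait fraction of the leaf."""
    while "prob" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["prob"]


# Toy data, invented for this example only.
rows = [
    [6, 4, 4, 0, 2], [7, 5, 3, 0, 1], [5, 3, 4, 0, 3],    # not clickbait
    [6, 3, 0, 1, 8], [5, 2, 1, 0, 9], [4, -2, 0, 1, 7],   # clickbait
]
labels = [0, 0, 0, 1, 1, 1]
tree = build_tree(rows, labels)
print(predict_proba(tree, [6, 3, 0, 1, 8]))
```

In practice a library implementation would be used instead of hand-written code like this; the sketch only shows where the probability comes from.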
We also used the Google Cloud Platform Vision API for image content analysis.
We have created a YouTube extension tool that shows “Yes” or “No” to indicate whether the given video is clickbait (Yes) or not (No). As an extra feature, the user can click on that indicator to see the false positive and false negative values for that video. If the user is not satisfied with the result, they can also give feedback indicating whether they find the video to be clickbait or not. Our server takes this feedback into account, so the system keeps learning and gives more precise outputs over time.
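
One way such feedback could be recorded and turned into false positive and false negative figures is sketched below. This is an assumption about how the numbers might be computed, not a description of our server: the `FeedbackLog` class and its methods are hypothetical names, and each user answer is treated as the ground truth for that video.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FeedbackLog:
    """Stores (predicted, user_says) pairs; True means 'clickbait'."""
    entries: List[Tuple[bool, bool]] = field(default_factory=list)

    def add(self, predicted: bool, user_says: bool) -> None:
        """Record one piece of user feedback next to the tool's answer."""
        self.entries.append((predicted, user_says))

    def false_positive_rate(self) -> float:
        """Share of videos users call 'not clickbait' that the tool flagged."""
        predictions = [p for p, u in self.entries if not u]
        if not predictions:
            return 0.0
        return sum(predictions) / len(predictions)

    def false_negative_rate(self) -> float:
        """Share of videos users call 'clickbait' that the tool missed."""
        predictions = [p for p, u in self.entries if u]
        if not predictions:
            return 0.0
        return sum(not p for p in predictions) / len(predictions)


log = FeedbackLog()
log.add(predicted=True, user_says=True)    # tool and user agree: clickbait
log.add(predicted=True, user_says=False)   # false positive
log.add(predicted=False, user_says=True)   # false negative
print(log.false_positive_rate(), log.false_negative_rate())
```

The same stored pairs could later be added to the training data, which is the sense in which the system learns from feedback.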

Screenshot of our code:-


As a future extension, our clickbait detection tool could be extended to other platforms, not just YouTube. Other video platforms such as Vimeo and Twitch have similar features, such as comments, thumbnails and video titles, so our tool could work on them as well. We would only have to change the API calls specific to each video platform; the rest of the concept would remain the same.
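
A rough sketch of how the platform-specific part could be kept separate from the rest is given below. Everything here is hypothetical: `VideoPlatform`, `FakePlatform` and `is_clickbait` are names invented for this example, and the fake platform returns canned labels instead of calling any real API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict

Features = Dict[str, int]


class VideoPlatform(ABC):
    """Hypothetical interface: one subclass per video site."""

    @abstractmethod
    def fetch_features(self, url: str) -> Features:
        """Return the normalized feature labels for the video at `url`."""


class FakePlatform(VideoPlatform):
    """Stand-in that returns canned labels instead of calling a real API."""

    def __init__(self, canned: Dict[str, Features]) -> None:
        self.canned = canned

    def fetch_features(self, url: str) -> Features:
        return self.canned[url]


def is_clickbait(url: str, platform: VideoPlatform,
                 model: Callable[[Features], float],
                 threshold: float = 0.5) -> bool:
    """Platform-independent part: score the features and apply a threshold."""
    return model(platform.fetch_features(url)) >= threshold


# Toy model for the example: negativity score (0 to 10) scaled to 0..1.
platform = FakePlatform({"https://example.com/v/1": {"comments_negativity": 9}})
print(is_clickbait("https://example.com/v/1", platform,
                   lambda f: f["comments_negativity"] / 10))
```

Supporting a new site would then mean writing one new subclass, while the model and the thresholding stay unchanged.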

Some pics from poster presentation:-

Aditya Dwivedi, Anshul Anil Gaur, Nishant Yadav, Avadh Yadav, Nikhil M Prasanna, Ojasvi Singh Randhawa


