Baitfree, Clickbait Detection Tool

Given a YouTube video, classify whether it is clickbait or not


Members: Anshul Anil Gaur (2014020), Nikhil M Prasanna (2014060), Ojasvi Singh Randhawa (2014070), Aditya Dwivedi (2014128), Avadh Yadav (2014026), Nishant Yadav (2014067)


Problem Statement
The problem we address in this project is detecting whether a video on YouTube is clickbait or not.

Motivation
A video is defined as clickbait if its title, thumbnail, tags, description, etc. are irrelevant to the actual contents of the video. Uploaders often set a very flashy thumbnail that does not even appear in the video. Another common technique is a misleading title that tempts people to watch the video out of curiosity.
At times uploaders add tags such as “PewDiePie” to their videos even though the tags are entirely irrelevant. This gets their videos listed alongside authentic ones. Clickbait may increase turnover for YouTube in the short term, but in the long term it can drive people away from the platform. If more and more videos on YouTube become clickbait, users who come to watch good videos will feel deceived and unhappy that they cannot watch what the title or thumbnail promised.
This is a relatively hard problem. Some might consider the ratio of likes to dislikes a deciding factor, but it cannot by itself determine whether a video is clickbait.
Example: the two most disliked videos on YouTube at the time of writing, “Baby” by Justin Bieber and the “Call of Duty: Infinite Warfare” reveal trailer, are not clickbait. A high dislike count alone therefore does not identify clickbait.

Types of Clickbait

Sexy Clickbait: The thumbnail or title contains sexual content but the video does not



Self-admitted Clickbait: The title itself states that the video is clickbait



Unanswered Clickbait: The title promises to answer a question, but the video never answers it



Weird Clickbait: The thumbnail contains an image completely irrelevant to the video



Methodology
Features used to train our model:
  • Likes to Dislikes Ratio
  • View Count
  • Comment Count
  • Explicit Content in Thumbnail
  • Age Restriction
  • Comments Negativity Score

We extracted all of the above features and normalized them into labels to train a decision tree. The normalization is as follows:
  • Likes to Dislikes Ratio
  • View Count: Label = number of digits in the view count.
  • Comment Count: Label = number of digits in the comment count, -1 for no comments and -2 for comments disabled.
  • Explicit Content in Thumbnail: 0 for highly likely, 1 for likely, 2 for neutral, 3 for unlikely, 4 for highly unlikely.
  • Age Restriction: 0 for off, 1 for on.
  • Comments Negativity Score: 0 to 10. Higher means more negative.
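
The labelling rules above can be sketched in a few lines of Python. This is only an illustration, not our actual code: the function names are made up for this example, and the mapping from Cloud Vision likelihood names to labels (in particular treating `POSSIBLE` as "neutral") is an assumption. The likes to dislikes ratio and the comments negativity score are left out because their exact computation is not spelled out here.

```python
from typing import Optional

# Assumed mapping from Cloud Vision SafeSearch likelihood names to the
# labels listed above (0 = highly likely ... 4 = highly unlikely).
# Treating POSSIBLE as "neutral" is an assumption made for this sketch.
EXPLICIT_LABELS = {
    "VERY_LIKELY": 0,
    "LIKELY": 1,
    "POSSIBLE": 2,
    "UNLIKELY": 3,
    "VERY_UNLIKELY": 4,
}


def digit_count(n: int) -> int:
    """Number of decimal digits in a non-negative integer (0 has one digit)."""
    return len(str(n))


def view_count_label(views: int) -> int:
    """Label = number of digits in the view count."""
    return digit_count(views)


def comment_count_label(comments: Optional[int], disabled: bool = False) -> int:
    """-2 if comments are disabled, -1 if there are none, else the digit count."""
    if disabled:
        return -2
    if not comments:
        return -1
    return digit_count(comments)


def explicit_content_label(likelihood: str) -> int:
    """Translate a likelihood name into the 0..4 label."""
    return EXPLICIT_LABELS[likelihood]


def age_restriction_label(restricted: bool) -> int:
    """0 for off, 1 for on."""
    return 1 if restricted else 0
```

For instance, a video with 1,234,567 views gets the view count label 7, and a video with comments disabled gets the comment count label -2.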

Given the URL of any YouTube video, our software processes that URL and produces an output describing how likely (probabilistically) the video is to be clickbait.
We also used the Google Cloud Platform Vision API for image content analysis.
We have created a YouTube extension that shows “Yes” or “No” to indicate whether the given video is clickbait (Yes) or not (No). As an extra feature, the user can click on that indicator to see the false positive and false negative values for that video. A user who is not satisfied with the result can also give feedback on whether they find the video to be clickbait. Our server takes this feedback into account, and the system learns from it, so its outputs become more precise over time.
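
One plausible first step when processing a video URL is to pull the video ID out of it, so that metadata can be requested for that ID. The sketch below shows one way to do this with the Python standard library. The function name and the set of accepted hostnames are assumptions for illustration and do not describe our actual implementation.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hostnames accepted by this sketch (an assumption, not an exhaustive list).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links look like https://youtu.be/<id>
    if host == "youtu.be":
        first_segment = parsed.path.strip("/").split("/")[0]
        return first_segment or None

    # Regular links look like https://www.youtube.com/watch?v=<id>
    if host in WATCH_HOSTS and parsed.path == "/watch":
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None
```

URLs from other sites, or YouTube URLs without a `v` parameter, yield `None`, so the caller can reject them before doing any further work.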

Screenshot of our code:-



Future

In the future, we can extend our clickbait detection tool to platforms other than YouTube. Other video platforms such as Vimeo and Twitch have similar features, such as comments, thumbnails, and video titles, so our tool can work on them as well. We would only have to change the API calls specific to each video platform; the rest of the concept remains the same.
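
One way to keep the platform-specific API calls separate from the rest of the tool is a small registry of per-platform fetchers that all return the same feature record. The following is a hypothetical sketch of that idea: `VideoFeatures`, `register`, `fetch_features`, and the stub fetcher are names invented for this example, and the stub returns fixed numbers instead of calling any real API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class VideoFeatures:
    """Platform-independent raw features (hypothetical record for this sketch)."""
    likes: int
    dislikes: int
    views: int
    comments: Optional[int]  # None when comments are disabled
    age_restricted: bool


# A fetcher takes a platform-specific video ID and returns VideoFeatures.
Fetcher = Callable[[str], VideoFeatures]

FETCHERS: Dict[str, Fetcher] = {}


def register(platform: str, fetcher: Fetcher) -> None:
    """Associate a platform name with the function that fetches its data."""
    FETCHERS[platform] = fetcher


def fetch_features(platform: str, video_id: str) -> VideoFeatures:
    """Look up the fetcher for a platform and run it on the given video ID."""
    if platform not in FETCHERS:
        raise ValueError(f"no fetcher registered for platform {platform!r}")
    return FETCHERS[platform](video_id)


def stub_fetcher(video_id: str) -> VideoFeatures:
    """Stand-in for a real API call; returns fixed illustrative numbers."""
    return VideoFeatures(likes=10, dislikes=2, views=1500,
                         comments=None, age_restricted=False)


register("example-platform", stub_fetcher)
```

With this layout, supporting a new platform means writing one fetcher for it and registering it; the feature labelling and the classifier stay unchanged.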

Some pics from poster presentation:-





THE TEAM
Aditya Dwivedi, Anshul Anil Gaur, Nishant Yadav, Avadh Yadav, Nikhil M Prasanna, Ojasvi Singh Randhawa


References:-
https://cloud.google.com/vision/
https://www.youtube.com/watch?v=-13yIXiyFAs

Image Sources:-
https://www.youtube.com/results?search_query=pewdiepie+clickbait
https://www.youtube.com/results?search_query=pewdiepie+sexual
https://www.youtube.com/results?search_query=pewdiepie+how+much+money+does+he+make
https://www.youtube.com/results?search_query=pewdiepie+weird

