Fight against racism and hate speech on Reddit

Reddit is an American social news aggregation and discussion website. Registered members submit content to the site such as links, text posts, and images, which are then voted up or down by other members. Posts are organized by subject into boards called "subreddits", which cover a variety of topics including news, science, movies, video games, music, books, fitness, food, and image-sharing. Registering an account with Reddit is free and does not require an email address. Since Reddit is an open platform where anyone is free to post anything, there is little or no censorship. As a result, we have come across many offensive posts that are filled with negative comments.

How is Reddit structured?

Racism and hate speech can cause a lot of damage to both individuals and communities. A study of 800 Australian secondary school students found that racism had a huge impact on the mental health of the students who experienced it. We have built a plugin to battle this hate speech on Reddit.


  • Data was collected using the Reddit API and then cleaned. It came from subreddits such as r/ImGoingToHellForThis and r/Incels.
  • The text of the posts and their comments was extracted to form the training data for our machine learning model. Each comment was annotated by 3 members: it was marked 1 if it was considered offensive and 0 otherwise.
  • A machine learning model based on SVM was trained on the collected dataset (TfidfVectorizer was used instead of a feature set).
  • The model was tested on posts from a few subreddits: /r/gaming, /r/ImGoingToHellForThis, /r/aww and /r/MadeMeSmile.
  • The top 200 comments of a post are scanned, and if the number of offensive comments crosses a certain threshold, the post is labelled as offensive.
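
The last step in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual code: the names `label_post` and `toy_classifier` and the threshold value of 20 are assumptions, since the post does not say which threshold was used. Any comment-level classifier (the SVM or a sentiment-based scorer) could be passed in as `is_offensive`.

```python
from typing import Callable, Iterable

MAX_COMMENTS = 200  # the post says the top 200 comments are scanned


def label_post(
    comments: Iterable[str],
    is_offensive: Callable[[str], bool],
    threshold: int = 20,  # hypothetical value; the real threshold is not given
) -> bool:
    """Return True when more than `threshold` of the scanned comments are offensive."""
    scanned = list(comments)[:MAX_COMMENTS]
    offensive_count = sum(1 for text in scanned if is_offensive(text))
    return offensive_count > threshold


# Toy stand-in for a trained classifier, for demonstration only.
def toy_classifier(text: str) -> bool:
    return "badword" in text.lower()
```

Keeping the classifier as a parameter means the same post-level rule works whichever comment-level model is chosen.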

  • Snippet of the dataset collected using PRAW (the Python Reddit API Wrapper)

    Some points for the methodology:
    • In the training set, we took 50% offensive and 50% non-offensive cases in order to increase the accuracy of the model and keep it fair.
    • The size of the training set was 950 comments.
    • The model was improved using VADER, a Python library for sentiment analysis. Our final plugin used it to improve the accuracy, and each comment was labelled according to the score it received.
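
A 50/50 split like the one described above can be drawn as follows. This is only a sketch under assumptions: the function name `balanced_sample`, the `(text, label)` tuple layout and the fixed random seed are illustrative and not taken from the project. For a 950-comment training set, `per_class` would be 475.

```python
import random
from typing import List, Tuple

LabelledComment = Tuple[str, int]  # (comment text, 1 = offensive, 0 = not offensive)


def balanced_sample(
    data: List[LabelledComment], per_class: int, seed: int = 0
) -> List[LabelledComment]:
    """Draw `per_class` offensive and `per_class` non-offensive comments, then shuffle."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    offensive = [row for row in data if row[1] == 1]
    clean = [row for row in data if row[1] == 0]
    if min(len(offensive), len(clean)) < per_class:
        raise ValueError("not enough comments in one of the classes")
    sample = rng.sample(offensive, per_class) + rng.sample(clean, per_class)
    rng.shuffle(sample)
    return sample
```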

    The plugin was tested on 4 major subreddits. /r/ImGoingToHellForThis, a subreddit well known for offensive content, was the first one we analysed: 10 of its posts (all-time top) were tested. A similar analysis was done for the 3 other subreddits, and the findings matched our expectations. /r/ImGoingToHellForThis had the most offensive posts, with /r/gaming just behind it. /r/aww and /r/MadeMeSmile, which are popular family-friendly subreddits, had the fewest offensive posts.

    TOTAL POSTS = 10*4 = 40
    ACCURACY = 0.875
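
Assuming the accuracy is the share of the 40 tested posts whose predicted label matched the expected one, 0.875 corresponds to 35 matching posts out of 40. The sketch below shows that arithmetic; the label lists are placeholders made up for the example, not the project's real results.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], expected: Sequence[int]) -> float:
    """Fraction of posts whose predicted label equals the expected label."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


# Placeholder labels only: 40 posts, 5 of which are deliberately mismatched.
expected = [1] * 20 + [0] * 20
predicted = [0] * 5 + [1] * 15 + [0] * 20
print(accuracy(predicted, expected))  # 35 / 40 = 0.875
```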

    observations for the 4 subreddits analysed

    For the final plugin we used VADER sentiment analysis to classify each comment as offensive or not. It uses a lexicon-based, bag-of-words approach: whenever it comes across an offensive word, the sentence is given a high negative score. This helped us classify the comments. The accuracy of LinearSVC could probably be improved by substantially increasing the size of the dataset and tuning the parameters, but with the dataset we had, VADER gave us the best score.
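
To make the word-scoring idea concrete, here is a toy lexicon scorer. It is a stand-in written for illustration and is not VADER itself: the word list, the weights and the cutoff are all invented, and the real library has a much larger lexicon and additional rules.

```python
import re

# Invented mini-lexicon: negative weights mark offensive words.
TOY_LEXICON = {"hate": -3.0, "stupid": -2.5, "ugly": -2.0, "love": 3.0, "cute": 2.0}
OFFENSIVE_CUTOFF = -2.0  # hypothetical cutoff, not the plugin's value


def toy_score(text: str) -> float:
    """Sum the lexicon weights of the words in `text`; word order is ignored."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(word, 0.0) for word in words)


def is_offensive(text: str) -> bool:
    """Flag a comment when its total score is at or below the cutoff."""
    return toy_score(text) <= OFFENSIVE_CUTOFF
```

With the actual `vaderSentiment` package, the comparable number would be the `compound` value returned by `SentimentIntensityAnalyzer().polarity_scores(text)`; which cutoff the plugin applied to it is not stated in the post.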

    Linear SVC vs Vader accuracy

    Final Plugin and poster presentation:

    • We used a Django server to communicate between the plugin and the Python machine learning model.
    • Whenever a user opens a post, an HTTP POST request is sent to the server. The score is calculated and returned as a response (a nudge for the user).
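
The request/response exchange described above might look roughly like this on the server side. The sketch is framework-agnostic and rests on assumptions: the JSON field names (`comments`, `offensive_comments`, `offensive`), the function name `handle_score_request` and the default threshold of 20 are hypothetical, because the post does not document the plugin's actual payload.

```python
import json
from typing import Callable, Tuple

MAX_COMMENTS = 200  # only the top 200 comments are scanned


def handle_score_request(
    body: bytes,
    is_offensive: Callable[[str], bool],
    threshold: int = 20,  # hypothetical threshold
) -> Tuple[int, str]:
    """Turn a JSON request body into an (HTTP status, JSON response) pair."""
    try:
        comments = json.loads(body)["comments"]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected JSON with a 'comments' list"})
    if not isinstance(comments, list):
        return 400, json.dumps({"error": "'comments' must be a list"})

    # Count offensive comments among the scanned ones.
    offensive_count = sum(
        1 for text in comments[:MAX_COMMENTS] if is_offensive(str(text))
    )
    result = {
        "offensive_comments": offensive_count,
        "offensive": offensive_count > threshold,
    }
    return 200, json.dumps(result)
```

In a Django view, a function like this could be called with `request.body`, and the returned status and JSON string wrapped in an `HttpResponse`.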

    response to a non-offensive post

    response to an offensive post

    Some pics from the poster presentation

    Group Members:

    Ashutosh Batabyal (Group Leader)

    Shreya Sharma

    Abhishek Chauhan

    Shivani Raina

    Aarushi Arya

    Sarthak Jindal



