
iFROOSN: Incentivised Fake Reviews On OSNs with Yelp as the reference

Yelp is an online social network (OSN) primarily used to promote businesses and to review them. It can be an effective way for upcoming restaurants, spas, and salons to reach new customers.

Problem Statement

The main objective of this course project was to target fake or incentivised reviews on Yelp and produce a credibility score that gives a new Yelp user an overall estimate of a restaurant before visiting it. We developed an application that takes a Yelp business ID as input and outputs the credibility score, along with some inferred results in the form of graphs.


Before starting the project, we needed a dataset of Yelp businesses, their reviews, and details about the users who posted those reviews. The dataset was obtained through the Yelp Dataset Challenge, which is available for academic use. It has a predefined schema; data not covered by the schema was web-scraped or collected through the API.

Data Collection Details

The data available through the Yelp Dataset Challenge comprised over 15 million values. For fast retrieval and efficient processing, it was scaled down to 0.2 million values each for businesses, reviews, and user details.


Our method was a two-way strategy: checking user details and checking for text plagiarism. The application reports a score composed of the normalised values of various parameters and metrics.

Broadly, these four metrics were considered for fake review detection.

1. User Rating Deviation: The normalised score was reduced for users whose rating deviated hugely from the average rating they usually give.

2. Business Rating Deviation: If the ratings for the business under inspection deviated hugely from its overall business score, the overall normalised score of that business was reduced significantly.

3. User Review Plagiarism: For this metric, we checked whether a review the user had written, or one closely resembling it, already existed in our dataset.

Review plagiarism was checked using these parameters:

i) Levenshtein distance: The minimum number of single-character edits (insertions, deletions, or substitutions) required to convert one string into another.

ii) Jaro-Winkler distance: A string similarity measure based on the number of matching characters and transpositions between two strings, with extra weight given to a shared prefix.

iii) NER (named entity recognition): This parameter checked whether entities such as names, locations, and references recurred across other reviews.

4. User Location: This parameter compared the location of the user who wrote the review with the location of the business. If the distance was larger than a threshold, the review was flagged.
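
The rating deviation, Levenshtein, and location checks listed above can be sketched in Python roughly as follows. This is a minimal illustration and not the project's actual code: the function names, the 1 to 5 star range, the normalisation of the edit distance by the longer string, and the use of the haversine great-circle distance for the location check are all assumptions made for the example.

```python
import math


def rating_deviation(review_stars, average_stars, max_stars=5.0):
    """Gap between one rating and an average rating, scaled to [0, 1].

    Assumes ratings run from 1 to max_stars, so the largest gap is max_stars - 1.
    The same helper can serve the user-level and business-level checks.
    """
    return abs(review_stars - average_stars) / (max_stars - 1.0)


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def text_similarity(a, b):
    """Edit distance rescaled to [0, 1]; 1.0 means identical texts (assumed scaling)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))
```

A review could then be flagged when, for example, `text_similarity` against an existing review exceeds a chosen cut-off, or when `haversine_km` between the user and the business exceeds a distance threshold. The thresholds themselves are not given in this write-up and would need to be tuned on the data.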


Based on these parameters, we were able to find some profiles and reviews that can be classified as fake with a high degree of accuracy.


Presentation and Team

Akash Kumar Gautam (2015011)
Mayank Kumar (2015055)
Sahil Babbar (2013082)
Shyam Agrawal (2015099)




