
iFROOSN: Incentivised Fake Reviews On OSNs with Yelp as the reference


Yelp is an online social network (OSN) primarily used to promote businesses and to review them. It can be an effective way to grow a business for the many new restaurants, spas, and salons that are always looking for new customers.






Problem Statement


The main objective of this course project was to target fake/incentivised reviews on Yelp and to produce a credibility score that gives a new Yelp user an overall estimate of the restaurant he/she plans to visit. We developed an application that takes a Yelp business ID as input and returns the credibility score as output, along with some inferred results in the form of graphs.

Dataset


The primary requirement before starting the project was to collect a dataset of Yelp businesses, their corresponding reviews, and details about the users who posted those reviews. The dataset was obtained through the Yelp Dataset Challenge, which makes the data available for academic use. The dataset had a predefined schema; data that was not available through the schema was web scraped or collected through the API.





Data Collection Details


The data available through the Yelp Dataset Challenge comprised over 15 million values. For fast retrieval and efficient processing, the data was scaled down to 0.2 million values each for businesses, reviews, and user details.
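As a rough illustration of the scaling-down step, the sketch below reads only the first N records from a file. It assumes the data is stored as newline-delimited JSON and that taking the first records is an acceptable way to subsample; the function name `load_first_records` and the choice of "first N" are assumptions for illustration, not a description of how the project actually sampled its data.

```python
import itertools
import json


def load_first_records(path, limit=200_000):
    """Read at most `limit` JSON objects from a newline-delimited JSON file.

    Assumption: one JSON object per line; blank lines are skipped.
    """
    with open(path, encoding="utf-8") as handle:
        non_empty = (line for line in handle if line.strip())
        return [json.loads(line) for line in itertools.islice(non_empty, limit)]
```

Reading lazily with `itertools.islice` avoids loading the full file into memory when only a fraction of it is needed.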

Methodology


Our method was a two-pronged strategy: checking user details and checking review text for plagiarism. The application reported a score built from the normalized values of several parameters and metrics.
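One possible reading of a score built from normalized values is sketched below: each metric is rescaled to the range 0 to 1 and the results are combined with a weighted mean. The min-max rescaling, the equal default weights, and the names `min_max_normalize` and `credibility_score` are assumptions made for illustration; the post does not state the exact formula that was used.

```python
def min_max_normalize(value, low, high):
    """Rescale `value` from [low, high] to [0, 1], clipping out-of-range input."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clipped = max(low, min(high, value))
    return (clipped - low) / (high - low)


def credibility_score(metric_scores, weights=None):
    """Weighted mean of per-metric scores that are already in [0, 1].

    `metric_scores` maps a metric name to its normalized score.
    `weights` (hypothetical) maps the same names to non-negative weights;
    all metrics count equally when it is omitted.
    """
    if not metric_scores:
        raise ValueError("at least one metric score is required")
    if weights is None:
        weights = {name: 1.0 for name in metric_scores}
    total_weight = sum(weights[name] for name in metric_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(metric_scores[name] * weights[name] for name in metric_scores)
    return weighted / total_weight
```

For example, a rating of 3 on a 1 to 5 scale normalizes to 0.5, and two metrics scoring 1.0 and 0.0 with equal weights combine to 0.5.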




Broadly, these four metrics were considered for fake review detection.

1. User rating deviation: The normalised score was reduced for users who gave a rating that deviated greatly from the average rating they usually give.

2. Business rating deviation: If the business targeted for fake review detection showed a large deviation from its overall business score, the overall normalised score of that business was reduced significantly.

3. User review plagiarism: For this metric we checked whether the review the user had written was plagiarised, that is, whether a matching review already existed in our dataset.

Review plagiarism was checked based on these parameters:

i) Levenshtein distance: The minimum number of single-character edits (insertions, deletions, or substitutions) required to convert one string into another.

ii) Jaro-Winkler distance: A string similarity measure based on the number of matching characters and transpositions between two strings, with extra weight given to a shared prefix.

iii) NER (named entity recognition): This parameter checked whether entities such as names, locations, and references were found to be the same across other reviews as well.

4. User location: This parameter compared the location of the user who wrote the review with the location of the business. If the distance was larger than a threshold, the review was flagged.
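The metrics above can be sketched in a few small functions. This is a minimal illustration, not the project's code: the Levenshtein routine is the standard dynamic-programming version, while turning it into a similarity in the range 0 to 1, using the absolute difference as the rating deviation, and measuring user-to-business distance with the haversine formula are all assumptions, as are the function names.

```python
import math


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def text_similarity(a, b):
    """Assumed similarity: 1.0 for identical strings, 0.0 for fully different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def rating_deviation(rating, average_rating):
    """Assumed deviation: absolute gap between one rating and an average."""
    return abs(rating - average_rating)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))
```

A review could then be flagged when `text_similarity` against an existing review exceeds a chosen cutoff, or when `haversine_km` between user and business exceeds the distance threshold; both cutoffs would need to be tuned on the data.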

Results


Based on our parameters, we were able to find some profiles/reviews that can be classified as fake with a high degree of accuracy.

        
        
             

Presentation and Team


Akash Kumar Gautam (2015011)
Mayank Kumar (2015055)
Sahil Babbar (2013082)
Shyam Agrawal (2015099)

       
  






References


  1. Yelp.com
  2. https://link.springer.com/chapter/10.1007/978-3-319-11119-3_1

