PREDICTION AND ANALYSIS ON FOURSQUARE
What is the project about?
The project's aim was to analyse the Foursquare network. Foursquare is a mobile app that provides search results to users. It gives personalised recommendations of places to go near a user's location, based on the user's previous browsing history, purchases, or check-in history. Because the app is a location-based social network, the privacy of a user's location is at high risk. Our task was to predict users' home / office locations and analyse them, using the data provided by Foursquare.
What is the methodology used to predict users' home / office?
We chose Python as our language for extracting the Foursquare data. The libraries used are listed at the bottom of the page under References.
Below is the attached poster describing the process -
What do you mean by comparing empirically?
We plotted the locations of each user's tips on a world map (using Basemap with Matplotlib). There is no formula for predicting a user's home location; it has to be done empirically. That brings problems with the accuracy of the data, because there was no ground truth data. The only possible ground truth was the location a user gave in his/her bio. But obviously one can't rely on it: we found many cases where the bio was not properly written, was left empty, or held a person's name instead of a place name.
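As a rough illustration of the plotting step, here is a minimal Python sketch. It assumes each tip is a dict holding the venue coordinates under tip["venue"]["location"] as "lat" and "lng"; that shape, the function names and the output file name are assumptions made for this example, and this is not the project's original script.

```python
def extract_coords(tips):
    """Return (lats, lons) for the tips that carry venue coordinates.

    Assumed (hypothetical) tip shape:
    {"venue": {"location": {"lat": ..., "lng": ...}}}
    """
    lats, lons = [], []
    for tip in tips:
        loc = tip.get("venue", {}).get("location", {})
        if "lat" in loc and "lng" in loc:
            lats.append(loc["lat"])
            lons.append(loc["lng"])
    return lats, lons


def plot_tips(tips, out_path="tips_map.png"):
    """Scatter tip locations on a world map and save the figure."""
    # Imported lazily so extract_coords works without the plotting stack.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    from mpl_toolkits.basemap import Basemap

    lats, lons = extract_coords(tips)
    world = Basemap(projection="mill", llcrnrlat=-60, urcrnrlat=85,
                    llcrnrlon=-180, urcrnrlon=180, resolution="c")
    world.drawcoastlines()
    world.drawcountries()
    x, y = world(lons, lats)  # project lon/lat into map coordinates
    world.scatter(x, y, s=20, c="red", zorder=5)
    plt.savefig(out_path)
    plt.close()
```

Calling plot_tips(tips) on a user's tips gives a picture in which clusters of points hint at where the user spends most of his/her time. Tips without coordinates are simply skipped.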
Then how can you prove the data you collected was accurate?
OK. So here is a functional diagram of the home/office prediction for Kinjil Mathur (formerly VP of Marketing at Foursquare), which we showed in our poster presentation.
1. Collected the addresses of Kinjil's tips.
2. Plotted them on the map.
3. Collected the home cities of her friends on Foursquare.
The home cities of her friends and the tips' addresses were enough to tell that she is from New York, because people make friends predominantly where they work, and a person eats most often at restaurants near the office / home location. But the task wasn't over. We were determined to find the exact location of the person.
4. Collected the postal codes of the tips and sorted them by frequency. A postal code can give quite precise information about a person's location. The most frequent postal code was mapped, and a snapshot of the result is given below.
Source - Google Maps
The location traced was 1.4 miles away from the Foursquare office. We consider this prediction accurate because there was no other person named Kinjil, and she was VP of Marketing at Foursquare when the check-ins were made, so she worked at the Foursquare office and, we assume, lived nearby.
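Steps 3 and 4 above boil down to counting how often each value occurs. A minimal Python sketch of that counting follows. The field names "homeCity" and "postalCode" and the sample records are made up for illustration; they are not Kinjil's actual data.

```python
from collections import Counter


def rank_values(records, key):
    """Return (value, count) pairs for `key`, most frequent first.

    Records that lack the key, or hold an empty value, are ignored.
    """
    return Counter(r[key] for r in records if r.get(key)).most_common()


# Illustrative sample records only (hypothetical field names and values).
friends = [
    {"homeCity": "New York, NY"},
    {"homeCity": "New York, NY"},
    {"homeCity": "Chicago, IL"},
    {"homeCity": ""},
]
tips = [
    {"postalCode": "10012"},
    {"postalCode": "10003"},
    {"postalCode": "10012"},
    {},
]

likely_city = rank_values(friends, "homeCity")[0][0]
likely_postal_code = rank_values(tips, "postalCode")[0][0]
print(likely_city, likely_postal_code)
```

The top-ranked postal code is the one that would then be looked up on a map. With little data the top two or three codes can be close in count, so it is worth looking at the whole ranking and not just the first entry.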
We treated the bio as accurate only for some profiles, namely those that met the following criteria -
1. The person has a decent public profile, and his/her data can also be found on other sites on the internet.
2. The location data of the profile matches the data on other sites such as Facebook, Twitter etc., as people with a decent public profile have verified profiles on such sites.
3. The person has a large dataset, i.e. more than 100 friends, more than 20 tips etc.
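The three criteria above can be written as a simple filter. The Python sketch below is only an illustration: the profile fields ("bio_city", "other_site_cities", "friend_count", "tip_count") are hypothetical names, and matching locations by exact city name is a simplifying assumption.

```python
def is_trusted_profile(profile, min_friends=100, min_tips=20):
    """Return True if the profile's bio location can be treated as reliable.

    Hypothetical fields:
      bio_city          - city written in the Foursquare bio
      other_site_cities - cities listed for the same person on other sites
      friend_count      - number of Foursquare friends
      tip_count         - number of tips left on Foursquare
    """
    bio_city = (profile.get("bio_city") or "").strip().lower()
    other_cities = {
        city.strip().lower()
        for city in profile.get("other_site_cities", [])
        if city
    }
    return (
        bool(bio_city)                                     # bio is filled in
        and bio_city in other_cities                       # matches other sites
        and profile.get("friend_count", 0) > min_friends   # more than 100 friends
        and profile.get("tip_count", 0) > min_tips         # more than 20 tips
    )
```

A profile with an empty bio, a bio that disagrees with the other sites, or too few friends or tips is rejected, which matches how we narrowed down the profiles by hand.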
But is that enough to prove the data's accuracy, as there is still no ground truth and things are still in the assumption zone?
No, obviously one can't fully trust such assumptions, but one can treat them as accurate to a certain level. When we were doing the project, we too got stuck at this stage. We needed something to prove the accuracy of our data, so we started searching for the problem on Google. That is how we found the research paper "I Know Where You Live", written by a group that includes MIT researchers (the reference is at the bottom). Below are two snapshots from their work on the accuracy of empirical analysis of a location-based social network.
Source - http://people.csail.mit.edu/ilaria/papers/LiccardiCHI2016.pdf
Source - http://people.csail.mit.edu/ilaria/papers/LiccardiCHI2016.pdf
The first diagram shows the correct responses (empirical analysis of location using ground truth data of known people) that they got with datasets of different density. It shows that the percentage of correct responses is higher for office prediction, mainly in the low- and high-density datasets.
Also, according to the table above, the mean accuracy was 69% for the prediction of workplace.
A point to note here is that, as per the research, this is valid not just for one social network but for all LBSNs (Location Based Social Networks).
Any problems faced in the project?
The main problem with Foursquare is that the location data per person is not large enough for a smooth and accurate analysis. This is to be expected: people don't really search for restaurants in their own locality, and even when they do, many don't bother to post. Most people post only when they are on a trip, which gives data for just one or two locations.
OUR GROUP
From Right to Left -
1. Sumeet Bhardwaj (Group Leader)
2. Nickey Kumar
3. Aman Verma
4. Sanidhya Daeeyya
5. Azhar Tak
REFERENCES
1. I Know Where You Live: Inferring Details of People’s Lives by Visualizing Publicly Shared Location Data (http://people.csail.mit.edu/ilaria/papers/LiccardiCHI2016.pdf)
2. Libraries used:
- https://github.com/mLewisLogic/foursquare
- Basemap, Matplotlib (https://matplotlib.org/basemap/)