The main purpose was to extract locations (the place of each incident) from the tweets that we collected and flag those locations into 3 categories:
Most prone to molestation: more than 800 cases per year.
Relatively less prone: quite a few incidents have been reported.
Relatively very safe: very few or no incidents have been reported.
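The three-way flagging above can be sketched roughly as follows. Only the 800-cases-per-year cut-off comes from the text; the lower cut-off (`LOW_THRESHOLD`) and all names here are assumptions made for illustration.

```python
from enum import Enum


class RiskLevel(Enum):
    MOST_PRONE = "most prone"
    LESS_PRONE = "relatively less prone"
    VERY_SAFE = "relatively very safe"


HIGH_THRESHOLD = 800  # from the text: more than 800 cases per year
LOW_THRESHOLD = 10    # assumption: the text does not say what "very few" means


def flag_location(cases_per_year: int) -> RiskLevel:
    """Map a yearly incident count to one of the three categories."""
    if cases_per_year > HIGH_THRESHOLD:
        return RiskLevel.MOST_PRONE
    if cases_per_year > LOW_THRESHOLD:
        return RiskLevel.LESS_PRONE
    return RiskLevel.VERY_SAFE
```

For example, `flag_location(950)` falls in the first category, while a location with no reported incidents falls in the last one.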
1) Extracted more than 10K tweets using hashtags (#sexualabuse, etc.).
Challenge: most of the tweets are not geotagged.
We came up with a workaround: exploring the metadata of the tweets. When
people use these hashtags in a tweet, they sometimes mention the location as well.
We did some pre-processing on the tweet text and the metadata to extract locations
from them. The preprocessed strings that looked like locations were passed to a geocoder API, which returned the latitude and longitude. Still, our problem was not solved: this technique yielded locations for only about one fifth of the tweets. Hmm...
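A minimal sketch of this step, under stated assumptions: the write-up does not describe the actual pre-processing rules or name the geocoder, so the regular expressions below (a run of capitalised words after "in", "at" or "near") and the `geocode` callable are hypothetical stand-ins. Passing the geocoder in as a function keeps the sketch independent of any particular service.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Assumption: a place name is a run of capitalised words after in/at/near.
PLACE_RE = re.compile(r"\b(?:in|at|near)\s+((?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*)")


def clean_tweet(text: str) -> str:
    """Drop URLs and @mentions, keep hashtag words, collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return re.sub(r"\s+", " ", text).strip()


def candidate_locations(text: str) -> List[str]:
    """Return strings in the cleaned tweet that look like place names."""
    return PLACE_RE.findall(clean_tweet(text))


def geocode_tweet(
    text: str, geocode: Callable[[str], Optional[Coordinates]]
) -> Dict[str, Coordinates]:
    """Geocode every candidate; keep only those the geocoder resolves."""
    resolved = {}
    for place in candidate_locations(text):
        coords = geocode(place)
        if coords is not None:
            resolved[place] = coords
    return resolved
```

Candidates that the geocoder cannot resolve are silently dropped, which is one reason a heuristic like this recovers locations for only a fraction of the tweets.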
Since most of the tweets contain images, news cuttings, etc., we used OCR (Optical Character Recognition) to extract text from those images and then extracted locations from that text.
An example image is given below.
The red circles indicate the locations. We got news clippings from the timelines of police handles and news handles on Twitter.
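The OCR step could look roughly like the sketch below. The write-up does not say which OCR library was used or how place names were picked out of the recognised text, so both parts are assumptions: `ocr_image` uses pytesseract with Pillow (which also needs the Tesseract binary installed), and `find_known_places` matches the text against a hypothetical list of known place names.

```python
import re
from typing import Iterable, List


def ocr_image(path: str) -> str:
    """Run OCR on an image file (assumes pytesseract and Pillow are installed)."""
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path))


def find_known_places(text: str, gazetteer: Iterable[str]) -> List[str]:
    """Return gazetteer entries that occur in the text as whole words."""
    # OCR output often breaks a name across lines, so collapse whitespace first.
    normalised = re.sub(r"\s+", " ", text).lower()
    found = []
    for place in gazetteer:
        pattern = r"\b" + re.escape(place.lower()) + r"\b"
        if re.search(pattern, normalised):
            found.append(place)
    return found
```

A typical call would be `find_known_places(ocr_image("clipping.png"), known_places)`, where the file name and the `known_places` list are placeholders.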
Additionally, we used web scraping on news websites to extract the locations of certain incidents. We also used a Google add-on, Twitter Archiver, to extract tweets based on our hashtags and a location filter.
And here we are:
The map shows a 30 km radius around you; in this view, the place is quite unsafe. You can drag the purple marker to any place you want and see whether you are
safe within the 30 km radius.
The purple marker denotes your current location.
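The 30 km check around the marker can be sketched with a great-circle distance. This is an illustration only: the write-up does not say how the map computes the radius, and the haversine formula, the function names, and the sample coordinates in the usage note are assumptions.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0
RADIUS_KM = 30.0  # radius shown around the marker

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def incidents_within_radius(
    marker: Point, incidents: Sequence[Point], radius_km: float = RADIUS_KM
) -> List[Point]:
    """Return the incident locations that lie within radius_km of the marker."""
    lat, lon = marker
    return [p for p in incidents if haversine_km(lat, lon, p[0], p[1]) <= radius_km]
```

With the marker at roughly (28.6139, 77.2090), an incident at (28.5355, 77.3910) is about 20 km away and is counted, while one at (19.0760, 72.8777) is far outside the circle. Dragging the marker simply means calling `incidents_within_radius` again with the new position.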
Improvements: accuracy is low. For many of the locations we retrieved, it turned out that the incident did not actually occur at that place. Checking all the posts manually is impossible, so this verification needs to be automated.
Secondly, this can be extended: if we can get precise street-level data, we can suggest an alternative walking or driving route to the user based on the crime rate of a location.
Thirdly, if we can find the time of day when these incidents occur, the tool becomes more effective.