INTRODUCTION
TWITTER is a popular online social network and microblogging service for exchanging messages (also known as tweets) among people, supported by a huge ecosystem. Twitter announces that it has over 140 million active users creating more than 340 million messages every day [26] and over one million registered applications built by more than 750,000 developers [25]. The third-party applications include client applications for various platforms, such as Windows, Mac, iOS, and Android, and web-based applications such as URL shortening services, image-sharing services, and news feeds. Among these third-party services, URL shortening services, which provide a short alias for a long URL, are essential for Twitter users who want to share long URLs in length-restricted tweets. Twitter allows users to post tweets of up to 140 characters containing only text. Therefore, when users want to share complex information (e.g., news and multimedia), they must include in the tweet the URL of a web page containing that information. Since the URL and the accompanying text together may exceed 140 characters, Twitter users need URL shortening services to shorten the URL further. Some URL shortening services (e.g., bit.ly and goo.gl) also provide public click analytics for shortened URLs, consisting of the number of clicks and the countries, browsers, and referrers of visitors. Although anyone can access these data to analyze visitor statistics, in principle no one can extract specific information about individual visitors from them, because URL shortening services publish the data only in aggregated form to protect the privacy of visitors from attackers.
INFERENCE ATTACK
An inference attack is a data mining technique in which data are analyzed in order to illegitimately gain knowledge about a subject or database. We present a simple inference attack that can infer information about individual visitors from aggregated, public click analytics by using public metadata provided by Twitter. First, we examine the client-application and location metadata of tweets, because these can be correlated with the corresponding fields of public click analytics. For instance, if a user, Alice, posts her tweets using the official Twitter client application for iPhone, “Twitter for iPhone” will be included in the source field of the corresponding metadata. Moreover, Alice may disclose on her profile page that she lives in the USA or activate the location service of a Twitter client application to
OUR APPROACH
We periodically monitor the click analytics of shortened URLs to observe the immediate changes caused by a new visitor. Whenever we notice a new visitor, we match his or her information against each of our target users to determine whether the new visitor is one of them. We estimate information about the visitor from the differences between the new and the old click analytics.
However, periodic monitoring and matching have a limitation: Twitter does not officially provide personal information about users, such as their country, browsers, and platforms. We therefore need to infer this information about target users by investigating their timelines and profile pages.
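The monitoring step can be sketched as a polling loop that diffs consecutive snapshots. This is a minimal sketch, assuming a hypothetical `fetch` callable that returns the current click analytics as a dict mapping each dimension (for example `"countries"` or `"referrers"`) to cumulative per-value click counts. That snapshot layout, the `watch` name, and the polling parameters are illustrative assumptions, not any shortener's actual API.

```python
import time
from collections import Counter
from typing import Callable, Dict, Iterator

# Hypothetical snapshot layout: dimension -> {value: cumulative click count}
Snapshot = Dict[str, Dict[str, int]]


def watch(fetch: Callable[[], Snapshot], interval_s: float,
          polls: int) -> Iterator[Dict[str, Dict[str, int]]]:
    """Poll `fetch` and yield the per-dimension increase whenever clicks were added."""
    previous = fetch()
    for _ in range(polls):
        time.sleep(interval_s)
        current = fetch()
        delta: Dict[str, Dict[str, int]] = {}
        for dimension, counts in current.items():
            # Counter subtraction keeps only strictly positive differences.
            increased = Counter(counts) - Counter(previous.get(dimension, {}))
            if increased:
                delta[dimension] = dict(increased)
        if delta:
            yield delta  # one or more new visitors since the last poll
        previous = current
```

A short polling interval makes it more likely that each yielded delta reflects a single visitor; if several visitors arrive between two polls, their attributes are mixed together in one delta.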
1) REFERRERS
We determine whether a new visitor comes from Twitter by using the changed referrer information of the public click analytics. In most cases, “t.co” is recorded, because all links shared on Twitter are automatically shortened to t.co links. t.co handles redirections according to the context and the user agent, so the referrer information varies with the source of a click. In some cases, “twitter.com” is recorded, because some Twitter applications use the original links directly instead of t.co links. Consequently, if the referrer information of the visitor is “t.co” or “twitter.com”, we conclude that the visitor came from Twitter.
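The referrer rule reduces to a membership test. This is a minimal sketch, assuming the newly recorded referrers are available as a dict mapping referrer host to new click count (a hypothetical layout); the function name is illustrative.

```python
from typing import Dict

# Referrer hosts that, per the rule above, indicate a click originating from Twitter.
TWITTER_REFERRERS = frozenset({"t.co", "twitter.com"})


def came_from_twitter(referrer_delta: Dict[str, int]) -> bool:
    """True if any newly recorded referrer host is t.co or twitter.com."""
    return any(host.lower() in TWITTER_REFERRERS for host in referrer_delta)
```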
2) COUNTRY
We infer the country information of target users using the location field of their profile pages and compare it with the changed country information of public click analytics. In many cases, Twitter users fill in the location field with their city or place name. We can determine the user’s country by searching GeoNames with the information in the location field of the user’s Twitter profile. GeoNames returns the country code that corresponds to the search keywords. The country information provided by the click analytics is also a country code, so we have a successful country match if both country codes are the same.
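The country step can be sketched against the GeoNames full-text search web service (`searchJSON`, with the `q`, `maxRows`, and `username` parameters), whose JSON response, as we understand it, lists matches under `geonames`, each carrying a `countryCode`. This is a sketch under assumptions: taking only the top-ranked result is our simplification, the username must be a registered GeoNames account, and the function names are hypothetical. Only the parsing and matching helpers run offline; `lookup_country` performs a real network request.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def geonames_search_url(location: str, username: str) -> str:
    """Build a GeoNames search URL for the text of a profile's location field."""
    query = urllib.parse.urlencode({"q": location, "maxRows": 1, "username": username})
    return f"{GEONAMES_SEARCH}?{query}"


def country_code_from_response(response: dict) -> Optional[str]:
    """Return the country code of the top-ranked GeoNames result, if any."""
    results = response.get("geonames") or []
    return results[0].get("countryCode") if results else None


def lookup_country(location: str, username: str) -> Optional[str]:
    """Query GeoNames over the network (requires a registered username)."""
    with urllib.request.urlopen(geonames_search_url(location, username)) as resp:
        return country_code_from_response(json.load(resp))


def country_match(profile_country: Optional[str], new_countries: Dict[str, int]) -> bool:
    """A match when the profile's country code appears among the new clicks."""
    return profile_country is not None and profile_country in new_countries
```

Users whose location field is empty or unrecognized yield `None` and are simply never matched on country.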
3) BROWSERS AND PLATFORM
When our target users click on a shortened URL, we also use information about their browser and platform to increase the inference accuracy. Although Twitter does not provide this kind of information about its users, it does record the name of the application that was used to post a tweet. For example, when someone posts a tweet using the official Twitter client application for the iPhone, “Twitter for iPhone” is recorded in the source field of the tweet, which lets us infer the browser and platform that were used.
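The source-field inference can be sketched as a lookup table applied across a user's timeline. This is a minimal sketch under loud assumptions: only the "Twitter for iPhone" to ("Mobile", "iPhone") pair mirrors the example in this post; the other table entries, the function names, and the choice of the most frequent client are hypothetical. The regex handles a source field delivered as an HTML anchor (e.g. `<a href="..." rel="nofollow">Twitter for iPhone</a>`) as well as a plain label.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical mapping from tweet source labels to the (browser, platform)
# values a shortener's click analytics would be expected to record.
SOURCE_TO_CLIENT = {
    "Twitter for iPhone": ("Mobile", "iPhone"),   # matches the example in the text
    "Twitter for iPad": ("Mobile", "iPad"),       # illustrative assumption
    "Twitter for Android": ("Mobile", "Android"), # illustrative assumption
}

_ANCHOR_TEXT = re.compile(r">([^<]+)</a>")


def source_label(source_field: str) -> str:
    """Strip an HTML anchor around the source name, if present."""
    match = _ANCHOR_TEXT.search(source_field)
    return match.group(1).strip() if match else source_field.strip()


def infer_client(source_field: str) -> Optional[Tuple[str, str]]:
    """Expected (browser, platform) for one tweet, or None for unknown clients."""
    return SOURCE_TO_CLIENT.get(source_label(source_field))


def dominant_client(source_fields: List[str]) -> Optional[Tuple[str, str]]:
    """Most frequent known (browser, platform) across a user's timeline."""
    clients = [c for c in map(infer_client, source_fields) if c is not None]
    return Counter(clients).most_common(1)[0][0] if clients else None
```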
We perform the simple inference attack on behalf of Alice’s boyfriend, Bob, as follows. Bob first posts a tweet with a URL shortened by goo.gl. If Alice clicks on the shortened URL, goo.gl records {“country”: “US”, “platform”: “iPhone”, “referrer”: “twitter.com”, “browser”: “Mobile”} in the click analytics of the shortened URL (details are in Sections 2 and 3). Otherwise, goo.gl records no information about her. Later, Bob retrieves the click analytics of the shortened URL to learn whether Alice clicked on his URL. If the click analytics are unchanged, or if the changes do not include the USA, iPhone, and twitter.com, he infers that Alice did not click on his URL. Otherwise, he infers that she did.
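Bob’s decision rule can be sketched as a check over the change in click analytics. This is a minimal sketch, assuming the change is available as a dict mapping each dimension to the values that gained clicks (a hypothetical layout); the dimension keys, `ALICE`, and `target_clicked` are illustrative names, and the profile values are the three attributes used in the example.

```python
from typing import Dict

# Hypothetical delta layout: dimension -> {value: number of new clicks}.
Delta = Dict[str, Dict[str, int]]

# Alice's inferred attributes, taken from the example in the text.
ALICE = {"countries": "US", "platforms": "iPhone", "referrers": "twitter.com"}


def target_clicked(delta: Delta, profile: Dict[str, str]) -> bool:
    """Infer a click only if every profiled value gained at least one click.

    An empty delta (unchanged click analytics) yields False.
    """
    return bool(delta) and all(
        delta.get(dimension, {}).get(value, 0) > 0
        for dimension, value in profile.items()
    )
```

This is a heuristic: another visitor who happens to share Alice’s country, platform, and referrer would produce a false positive.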