Web Tracking Techniques
Posted by Savindra
Web tracking technologies are used to collect, store and connect records of users’ web browsing behavior. The information gained thereby is of interest to various parties. The major parties and motivations behind web tracking are:
- Advertisement companies
- Law enforcement and intelligence agencies
- Usability tests of web applications
- Web analytics
The web analytics field is concerned with the measurement and interpretation of web site usage data. A variety of information is potentially of interest to web site operators, such as:
- The number of visitors over time
- How visitors find out about the web site
- The effectiveness of marketing campaigns
- The geographical location of visitors
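- Identification of the companies or organizations visitors come from
- Technical details of visitors’ systems, such as browser and screen resolution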
Web analytics software can be self-hosted, but more commonly third-party services, such as Google Analytics, are used. The collected data can usually be presented visually.
Web Tracking technologies and concepts
A. Clickstream analysis
A clickstream is a recording of the actions performed by a user on a web site. Clickstream analysis deals with the collection and evaluation of this kind of data. A common application is to check where visitors enter the web application and then track their progress towards a certain predefined goal, such as subscribing to a newsletter or completing a purchase. This allows webmasters to see where in the process potential customers are lost. Clickstream data has also been used, for example, to predict whether a user is likely to submit an order on an e-commerce web site and to evaluate the effectiveness of banner advertising.
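The goal-tracking idea can be sketched in a few lines of Python. This is a minimal illustration, assuming clickstreams have already been grouped into sessions as ordered lists of page paths; the session data, page paths and the function name funnel_report are hypothetical. A session counts toward a step only if it has reached all earlier steps in order, though not necessarily on consecutive clicks.

```python
from typing import Dict, List, Sequence


def funnel_report(sessions: Dict[str, List[str]], steps: Sequence[str]) -> List[int]:
    """Count how many sessions reached each goal step, in order."""
    counts = [0] * len(steps)
    for clicks in sessions.values():
        position = 0  # index of the next step this session still has to reach
        for page in clicks:
            if position < len(steps) and page == steps[position]:
                counts[position] += 1
                position += 1
    return counts


# Hypothetical clickstream data: session ID -> ordered list of visited pages
sessions = {
    "s1": ["/home", "/product", "/cart", "/checkout", "/thank-you"],
    "s2": ["/home", "/product", "/cart"],
    "s3": ["/blog", "/product"],
    "s4": ["/home", "/product", "/cart", "/checkout"],
}
steps = ["/product", "/cart", "/checkout", "/thank-you"]

counts = funnel_report(sessions, steps)
for i, step in enumerate(steps):
    lost = counts[i - 1] - counts[i] if i > 0 else 0
    print(f"{step}: reached by {counts[i]}, lost since previous step: {lost}")
```

The drop between consecutive counts shows where in the process potential customers leave.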
B. Client identification
The Hypertext Transfer Protocol (HTTP) is stateless by design, which means that different page requests are independent of each other. However, today’s web applications often need to identify a user over many subsequent requests. For example, online shops have to store temporary session information while users add several products to their shopping carts before finally proceeding to the checkout. This is called session handling.
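Session handling on the server can be sketched as a store that maps a random session identifier to per-user state. This is a minimal sketch, assuming an in-memory dictionary; the class SessionStore and its methods are hypothetical, and how the identifier travels between browser and server is left open, since the techniques in this section are different ways of carrying it.

```python
import secrets
from typing import Dict, List, Optional


class SessionStore:
    """Minimal in-memory session store (illustrative; no expiry or persistence)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, List[str]]] = {}

    def create(self) -> str:
        # An unguessable random ID that the client must present on later requests
        session_id = secrets.token_urlsafe(16)
        self._sessions[session_id] = {"cart": []}
        return session_id

    def get(self, session_id: Optional[str]) -> Optional[Dict[str, List[str]]]:
        # Unknown or missing IDs yield None, i.e. the request is not part of a session
        if session_id is None:
            return None
        return self._sessions.get(session_id)


store = SessionStore()
sid = store.create()                   # first request: the server issues an ID
store.get(sid)["cart"].append("book")  # later requests present the same ID
store.get(sid)["cart"].append("lamp")
print(store.get(sid))  # {'cart': ['book', 'lamp']}
```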
Identification by IP Address: Every computer on the Internet has an Internet Protocol (IP) address, which is used to route data towards it. For various reasons, using this address alone to match a request to a user does not work in today’s Internet:
- A single public IP address may be shared by many users, for example behind a NAT router or a proxy server.
- IP addresses can be faked
- Dial-up connections change their IP addresses frequently.
- A single user may have multiple parallel sessions open
HTTP Cookies: HTTP cookies are arbitrary name-value pairs stored in the web browser. Cookies are commonly used for session handling, storage of site preferences, authentication and the identification of clients. Typically the short form “cookies” refers to HTTP cookies, even though similar mechanisms exist which are referred to by the same name.
HTTP cookies without an expiration date are automatically deleted when the browser is closed. However, expiration dates can be many years into the future.
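The difference between a session cookie and a long-lived one is visible in the Set-Cookie header the server sends. The sketch below uses Python’s standard http.cookies module; the cookie names and values are hypothetical, and the five-year lifetime is only an example.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "abc123"  # no Expires/Max-Age: deleted when the browser closes
cookie["visitor_id"] = "v-42"
cookie["visitor_id"]["max-age"] = 5 * 365 * 24 * 3600  # kept for about five years
cookie["visitor_id"]["path"] = "/"

# Headers the server would send in its HTTP response
for morsel in cookie.values():
    print(morsel.output())

# On later requests the browser sends the pairs back in a single Cookie header
incoming = SimpleCookie("session=abc123; visitor_id=v-42")
print(incoming["visitor_id"].value)  # v-42
```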
Adobe Flash Local Shared Objects: Adobe Flash is a popular browser plugin which is mainly used for animated and interactive web content. By default, the Flash browser plugin allows servers to store Adobe Flash Local Shared Objects, also called Flash Cookies, which are similar to HTTP cookies and can be used for the same purposes. However, they are managed by the Flash plugin itself and not by the web browser.
Web storage: Web storage is a specification by the World Wide Web Consortium (W3C) that addresses storing and accessing large amounts of data, as key-value pairs, in a web browser via client-side scripting.
Silverlight Isolated Storage: Silverlight Isolated Storage is similar to web storage and can be used to store data locally on the user’s computer, such as key/value pairs and arbitrary files. However, it requires the user to have the Microsoft Silverlight plugin installed.
Google Gears: Gears is an architecture by Google that allows web sites to save data locally such that basic functionality can also be accessed without being connected to the Internet. The user has to give explicit permission for every site that wants to access the store. In Gears, data is not shared between different browsers on the same computer.
* Google is now shifting its focus from Gears to HTML5.
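Hidden Form Fields: The Hypertext Markup Language includes form elements, which are intended to allow users to input data for subsequent transmission to the server via the POST or GET methods. Hidden fields are not displayed to the user, but their values are submitted along with the rest of the form.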
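For client identification, a server can embed a session identifier in every form it generates. The sketch below renders such a form; the field name sid, the function name and the form contents are hypothetical. Values are HTML-escaped so that an identifier cannot break out of the attribute.

```python
from html import escape


def form_with_session(action: str, session_id: str) -> str:
    # The hidden input is invisible in the rendered page but submitted with the form,
    # so the server receives the session ID again when the user clicks "Search"
    return (
        f'<form action="{escape(action)}" method="post">\n'
        f'  <input type="hidden" name="sid" value="{escape(session_id)}">\n'
        '  <input type="text" name="query">\n'
        '  <button type="submit">Search</button>\n'
        '</form>'
    )


print(form_with_session("/search", "abc123"))
```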
URL Query Strings: Query strings are pieces of information appended to the end of URLs, which are sent to the server when the corresponding link is accessed.
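To use query strings for client identification, a server can append a session identifier to every link it emits and read it back from incoming requests. A minimal sketch with Python’s urllib.parse; the parameter name sid and the shop URL are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def add_tracking_param(url: str, session_id: str, param: str = "sid") -> str:
    """Return the URL with the session ID added to (or replaced in) its query string."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query[param] = [session_id]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


link = add_tracking_param("https://shop.example/products?page=2", "abc123")
print(link)  # https://shop.example/products?page=2&sid=abc123

# When the user follows the link, the server recovers the ID from the request URL
print(parse_qs(urlparse(link).query)["sid"][0])  # abc123
```

A drawback of this approach is that the identifier is visible in the address bar and travels with the URL if it is copied or shared.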
HTTP authentication: HTTP natively supports authentication mechanisms, such as Basic access authentication and Digest access authentication. When a user accesses a web page that requires authentication, the browser prompts for credentials and stores them temporarily. With every subsequent request, the browser submits these credentials to the server in the HTTP Authorization header, which can be used to identify the session and thereby the user.
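With Basic access authentication, the Authorization header is simply "Basic " followed by the Base64 encoding of "user:password", so the server can read the user name from every request. The sketch below uses the example credentials from the Basic authentication specification (RFC 7617); the helper names are hypothetical. Note that Base64 is an encoding, not encryption.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    # The browser sends this value in the Authorization header of each request
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def user_from_header(header: str) -> str:
    # Server side: decode the credentials and keep the user name as identifier
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not Basic authentication")
    user, _, _password = base64.b64decode(token).decode("utf-8").partition(":")
    return user


header = basic_auth_header("Aladdin", "open sesame")
print(header)                    # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(user_from_header(header))  # Aladdin
```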
C. User tracking technologies
User tracking has similarities to session handling. Both require the clear identification of a client machine. Hence, techniques from session handling, such as cookies, can also be used for tracking.
Deep packet inspection: Internet service providers (ISPs) have full access to the traffic data of their customers. Deep packet inspection refers to the practice of not only looking at IP packet headers to determine source and destination of a message, but also analyzing the actual payload.
If no encryption is used, this allows ISPs to see exactly what their customers are doing on the Internet, not limited to web browsing activities.
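The difference between header inspection and payload inspection can be illustrated on an unencrypted HTTP request. Looking only at IP headers reveals addresses; parsing the payload reveals the visited host, the full path including search terms, and the browser. This is a minimal sketch, assuming the TCP payload has already been captured and reassembled; the function name and the request are hypothetical. With HTTPS the payload is encrypted and this parsing is not possible.

```python
def inspect_payload(payload: bytes) -> dict:
    """Extract method, host, path and user agent from an unencrypted HTTP/1.x request."""
    head = payload.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return {
        "method": method,
        "host": headers.get("host"),
        "path": path,
        "user_agent": headers.get("user-agent"),
    }


packet_payload = (
    b"GET /search?q=flu+symptoms HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"User-Agent: ExampleBrowser/1.0\r\n"
    b"\r\n"
)
print(inspect_payload(packet_payload))
```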
HTTP referrer: The HTTP referrer is a field in the HTTP header that contains the URL of the page the user came from. HTTP referrers alone cannot be used to build extended user tracks across arbitrary domains, but revealing even one step back in the browsing history is often more than the user is comfortable with.
For example, if a user arrives at a web site from a URL identifying a public user page in a social network, the tracking server can link that social network profile to existing user records, thereby creating a rich profile. It may not be immediately clear that the referring profile page really belongs to the visitor. However, in some cases the referrer URL is associated with profile editing or similar actions that only the owner of a profile can perform.
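On the server, the referrer arrives in the Referer request header (the header name is spelled that way in the HTTP specification). The sketch below shows how a tracker might classify it. The social network social.example and its URL layout (/user/NAME for public profiles, /user/NAME/edit for the owner-only edit page) are entirely hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical URL layout of a fictional social network
SOCIAL_HOST = "social.example"
PUBLIC_PROFILE_PREFIX = "/user/"
OWNER_ONLY_SUFFIX = "/edit"


def analyze_referrer(referer: Optional[str]) -> dict:
    if not referer:
        return {"source": None}  # direct visit, or the browser sent no referrer
    url = urlparse(referer)
    result: dict = {"source": url.netloc}
    if url.netloc == SOCIAL_HOST and url.path.startswith(PUBLIC_PROFILE_PREFIX):
        username = url.path[len(PUBLIC_PROFILE_PREFIX):].split("/")[0]
        result["profile"] = username
        # Only the owner can open the edit page, so the visitor is likely that user
        result["likely_owner"] = url.path.endswith(OWNER_ONLY_SUFFIX)
    return result


print(analyze_referrer("https://social.example/user/jdoe"))
print(analyze_referrer("https://social.example/user/jdoe/edit"))
```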
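Web bugs and tracking cookies: Cookies are not only useful for session handling, but can also be used for questionable user tracking practices. A web bug is a small, usually invisible image (often a single transparent pixel) that is embedded in a web page and loaded from a third-party server. Each request for the image carries that server’s cookie and, typically, the referring page, so the third party can recognize the same browser on every site that embeds the bug.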
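The server side of a web bug can be sketched as a handler that returns a 1x1 transparent GIF, labels new browsers with a long-lived cookie, and logs the referring page for known ones. This is an illustration only: request headers are modeled as a plain dictionary instead of a real web server, and the cookie name tid, the log structure and the function name are hypothetical.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, List, Optional, Tuple

# A 43-byte, 1x1 transparent GIF: the classic "tracking pixel" image
PIXEL_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
)

tracking_log: List[Tuple[str, Optional[str]]] = []


def handle_pixel(headers: Dict[str, str]) -> Tuple[bytes, Dict[str, str]]:
    """Serve the pixel, recognizing the browser by cookie or assigning a new ID."""
    jar = SimpleCookie(headers.get("Cookie", ""))
    response_headers = {"Content-Type": "image/gif"}
    if "tid" in jar:
        tracking_id = jar["tid"].value
    else:
        tracking_id = secrets.token_hex(8)
        response_headers["Set-Cookie"] = f"tid={tracking_id}; Max-Age=31536000; Path=/"
    # The Referer header reveals which page embedded the pixel
    tracking_log.append((tracking_id, headers.get("Referer")))
    return PIXEL_GIF, response_headers


_, first_headers = handle_pixel({"Referer": "https://news.example/article"})
# The browser stores the cookie and sends it back when another site embeds the pixel
tid = first_headers["Set-Cookie"].split(";")[0].split("=", 1)[1]
_, second_headers = handle_pixel({"Cookie": f"tid={tid}", "Referer": "https://shop.example/cart"})
print(tracking_log)  # the same ID appears with both referring pages
```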
Zombie cookies: Although cookies are in principle an effective mechanism for tracking users, more and more people are becoming aware of the privacy implications and clear their cookies regularly. While the removal of HTTP and Flash cookies can be achieved without much effort, zombie cookies, also called super-cookies, are designed to resist deletion efforts.
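The usual resistance strategy is redundancy: the same identifier is written to several client-side stores, such as HTTP cookies, Flash Local Shared Objects and web storage, and on each visit any deleted copy is restored from a surviving one. The sketch below models the stores as plain dictionaries; the store variables, the key uid and the function respawn are hypothetical.

```python
from typing import Dict, List, Optional


def respawn(stores: List[Dict[str, str]], key: str = "uid") -> Optional[str]:
    """If any store still holds the ID, write it back into every store."""
    surviving = next((store[key] for store in stores if key in store), None)
    if surviving is not None:
        for store in stores:
            store[key] = surviving  # re-create copies the user deleted
    return surviving


# Hypothetical storage locations, modeled as dictionaries
http_cookies: Dict[str, str] = {"uid": "u-1234"}
flash_lso: Dict[str, str] = {"uid": "u-1234"}
web_storage: Dict[str, str] = {"uid": "u-1234"}

http_cookies.clear()  # the user deletes HTTP cookies
web_storage.clear()   # and web storage, but forgets the Flash LSOs
respawn([http_cookies, flash_lso, web_storage])
print(http_cookies, web_storage)  # {'uid': 'u-1234'} {'uid': 'u-1234'}
```

The identifier only disappears if every copy is removed before the next visit.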
Browser Fingerprinting: The Electronic Frontier Foundation, a US-based civil liberties group, has recently demonstrated the feasibility of a novel approach to browser identification called browser fingerprinting.
During browser fingerprinting, seemingly insignificant and non-critical configuration and version data is collected from the web browser, for example:
- The browser’s user agent information,
- The client’s screen resolution,
- The local timezone,
- The list of installed browser plug-ins,
- The list of installed system fonts,
- The operating system,
- The browser’s language,
- The list of accepted MIME types.
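Once collected, attributes like those listed above can be combined into a single identifier, for example by hashing a canonical serialization. Unlike cookies, nothing is stored on the client, so clearing browser data does not change the result. This is a minimal sketch; the attribute values are hypothetical, and because it uses an exact hash, any change such as a browser update would produce a different fingerprint.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal attributes hash identically
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values, as a script might collect them from one browser
browser = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "screen": "1920x1080x24",
    "timezone": "Europe/Vienna",
    "plugins": ["PDF Viewer"],
    "fonts": ["DejaVu Sans", "Liberation Serif"],
    "os": "Linux",
    "language": "de-AT",
    "mime_types": ["application/pdf"],
}
print(fingerprint(browser)[:16])  # short prefix of the 64-character hex digest
```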
Source - web-tracking_schmuecker