Web Tracking Techniques

Web tracking technologies are used to collect, store and connect records of users' web browsing behavior. The information gained is of interest to various parties. The major interested parties and uses are:

    • Advertisement companies
    • Law enforcement and intelligence agencies
    • Usability tests of web applications
    • Web analytics

Web analytics

The web analytics field is concerned with the measurement and interpretation of web site usage data. A variety of information is potentially of interest to web site operators, such as:

    • The number of visitors over time
    • How visitors find out about the web site
    • The effectiveness of marketing campaigns
    • The geographical location of visitors
    • Company identification
    • Technical details

Web analytics software can be self-hosted, but more commonly third-party services, such as Google Analytics, are used. The collected data can usually be presented visually.

Usability Tests

Tracking technology can also be handy for usability evaluations of web applications, to help guide decisions in their design and development process. With JavaScript, it is possible to capture detailed records of user mouse and keyboard input. Thus, the interaction with web sites can be analyzed in detail.

Web Tracking technologies and concepts

A. Clickstream analysis

A clickstream is a recording of the actions performed by a user on a web site. Clickstream analysis deals with the collection and evaluation of this kind of data. Clickstream data can be used to check where visitors enter the web application and then to track their progress towards a certain predefined goal, such as the subscription to a newsletter or the completion of a purchase. This allows webmasters to see where in the process potential customers are lost. Clickstream data has also been used, for example, to predict whether a user is likely to submit an order on an e-commerce web site and to evaluate the effectiveness of banner advertising.
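To make the idea of tracking progress towards a goal concrete, here is a minimal Python sketch. The funnel pages, the session ids and the function name funnel_counts are all made up for illustration. It assumes the clickstream is already available as time-ordered (session id, path) pairs.

```python
from collections import defaultdict

# Hypothetical funnel: the ordered pages a visitor passes on the way to a purchase.
FUNNEL = ["/product", "/cart", "/checkout", "/order-complete"]

def funnel_counts(clickstream, funnel=FUNNEL):
    """Count how many sessions reached each funnel step, in order.

    clickstream is an iterable of (session_id, path) pairs in time order.
    """
    progress = defaultdict(int)  # session_id -> number of funnel steps completed
    for session_id, path in clickstream:
        step = progress[session_id]
        # Only advance when the visitor hits the next expected page.
        if step < len(funnel) and path == funnel[step]:
            progress[session_id] = step + 1
    return [sum(1 for done in progress.values() if done > i)
            for i in range(len(funnel))]

# Illustrative data: three sessions, only one of which completes the order.
clicks = [
    ("s1", "/product"), ("s2", "/product"), ("s3", "/product"),
    ("s1", "/cart"), ("s2", "/cart"),
    ("s1", "/checkout"),
    ("s1", "/order-complete"),
]

for page, sessions in zip(FUNNEL, funnel_counts(clicks)):
    print(f"{page}: {sessions} sessions")
```

The drop from one step to the next shows where visitors leave the process. In this made-up data, one of the three sessions never gets past the product page.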

B. Client identification

The Hypertext Transfer Protocol (HTTP) is stateless by design, which means that different page requests are independent of each other. However, in today’s web applications, it is often required to identify a user over many subsequent requests. For example, temporary session information has to be stored for online shops, where users add several products to their shopping carts before finally proceeding to the checkout. This is called session handling.

Identification by IP Address: Every computer on the Internet has an Internet Protocol address, which is used to route data towards it. For various reasons, using this address alone to match a request to a user does not work in today's Internet:

        • A single public IP address may be shared by many users, for example behind a NAT router or a proxy
        • IP addresses can be faked
        • Dial-up connections change their IP addresses frequently
        • A single user may have multiple parallel sessions open

HTTP Cookies: HTTP cookies are arbitrary name-value pairs stored in the web browser. Cookies are commonly used for session handling, storage of site preferences, authentication and the identification of clients. Typically the short form “cookies” refers to HTTP cookies, even though similar mechanisms exist which are referred to by the same name.

HTTP cookies without an expiration date are automatically deleted when the browser is closed. However, expiration dates can be many years into the future.
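The difference between the two kinds of cookie can be sketched with Python's standard http.cookies module. The cookie names and values (session_id, visitor_id) are assumptions chosen for illustration, not something a particular site uses.

```python
from http.cookies import SimpleCookie

cookies = SimpleCookie()

# Session cookie: no expiry attribute, so the browser drops it when it closes.
cookies["session_id"] = "abc123"
cookies["session_id"]["path"] = "/"
cookies["session_id"]["httponly"] = True

# Persistent cookie: Max-Age (in seconds) keeps it for roughly ten years.
cookies["visitor_id"] = "v-42"
cookies["visitor_id"]["path"] = "/"
cookies["visitor_id"]["max-age"] = 10 * 365 * 24 * 60 * 60

# Each line is the value of one Set-Cookie response header.
for morsel in cookies.values():
    print(morsel.OutputString())

# On later requests the browser sends the pairs back in a Cookie header,
# which the server can parse to recognise the client.
incoming = SimpleCookie("session_id=abc123; visitor_id=v-42")
print(incoming["visitor_id"].value)
```

A long-lived identifier like the second cookie is what makes a returning browser recognisable across visits.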

Adobe Flash Local Shared Objects: Adobe Flash is a popular browser plugin which is mainly used for animated and interactive web content. By default, the Flash browser plugin allows servers to store Adobe Flash Local Shared Objects, also called Flash Cookies, which are similar to HTTP cookies and can be used for the same purposes. However, they are managed by the Flash plugin itself and not by the web browser.

Web storage: Web storage is a specification by the World Wide Web Consortium, which addresses storing and accessing big chunks of data and key-value pairs in a web browser via client-side scripting.

Web storage is supported by all major browser vendors. The specification includes two JavaScript objects, localStorage and sessionStorage. The former is used for persistent storage of data, while the latter is cleared on browser termination.

Silverlight Isolated Storage: Silverlight Isolated Storage is similar to web storage and can be used to store data locally on the user’s computer, such as key/value pairs and arbitrary files. However, it requires the user to have the Microsoft Silverlight plugin installed.

Google Gears: Gears is an architecture by Google that allows web sites to save data locally such that basic functionality can also be accessed without being connected to the Internet. The user has to give explicit permission for every site that wants to access the store. In Gears, data is not shared between different browsers on the same computer.

*Google is now shifting its focus from Gears to HTML5.

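Hidden Form Fields: The Hypertext Markup Language includes form elements, which are intended to allow users to input data for subsequent transmission to the server via the POST or GET methods. Hidden fields are not displayed to the user but are submitted along with the rest of the form, so they can carry a session identifier from one page to the next.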

URL Query Strings: Query strings are pieces of information appended to the end of URLs, which are sent to the server when the corresponding link is accessed.
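As a rough illustration of how a session identifier could travel in a query string, the following Python sketch uses only the standard urllib.parse module. The parameter name sid, the example URL and both helper functions are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

def add_session_id(url, session_id, param="sid"):
    """Return url with the session id appended as a query parameter."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query[param] = [session_id]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))

def read_session_id(url, param="sid"):
    """Return the session id carried in url, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None

# The server rewrites every link on a page before sending it out ...
link = add_session_id("https://shop.example/cart?item=42", "abc123")
print(link)

# ... and reads the identifier back when the link is followed.
print(read_session_id(link))
```

Because the identifier is part of the URL, it also ends up in bookmarks, server logs and shared links, which is one reason cookies are usually preferred for session handling.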

HTTP authentication: HTTP natively supports authentication mechanisms, such as Basic access authentication and Digest access authentication. When accessing a web page with authentication turned on, the browser prompts the user for credentials and stores them temporarily. For every subsequent request, these credentials are submitted to the server within the HTTP authorization header, which can be used to identify the session and with it the user.
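For Basic access authentication, the credentials sent with each request are simply the user name and password joined by a colon and Base64-encoded. The Python sketch below shows both directions. The function names are made up, and the user name and password are just sample values.

```python
import base64

def basic_auth_header(username, password):
    """Build the value of the Authorization header for Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def parse_basic_auth(header_value):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password

header = basic_auth_header("Aladdin", "open sesame")
print("Authorization:", header)
print(parse_basic_auth(header))
```

Base64 is an encoding, not encryption, so the same user name is readable on every request. That is what makes the header usable as an identifier.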

window.name DOM Property: The Document Object Model of common web browsers includes the property window.name. It is accessible via client-side JavaScript code and can typically store several megabytes of data. Each browser tab has its own window.name property, which is empty just after creation. However, all pages accessed via links in a tab share the same window.name field, meaning that it can be used to exchange information between domains, which poses security and privacy threats. For example, the sequence of visited pages could be stored.

C. User tracking technologies

User tracking has similarities to session handling. Both require the clear identification of a client machine. Hence, techniques from session handling, such as cookies, can also be used for tracking.

Deep packet inspection: Internet service providers have full access to the traffic data of their customers. Deep packet inspection refers to the practice of not only looking at IP packet headers to determine source and destination of a message, but also analyzing the actual payload.

If no encryption is used, this allows ISPs to see exactly what their customers are doing on the Internet, not limited to web browsing activities.

HTTP referrer: The HTTP referrer is a field in the HTTP header which contains the address of the page that the user came from. Using HTTP referrers alone, it is not possible to build extended user tracks across arbitrary domains, but revealing even one step back in the browsing history is often more than the user is comfortable with.

For example, if a user enters a web site from a URL identifying a public user page in a social network, the latter can be linked on tracking servers to existing user records, thereby creating a rich profile. It may not be immediately clear that the referring account page really belongs to the visitor himself. However, in some cases the referrer URL is associated with profile editing or similar actions that can only be done by the owner of a profile.
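A small Python sketch of what a receiving server can read from this field follows. The host names and the profile URL are invented, and the request headers are modelled as a plain dictionary rather than any particular web framework's request object.

```python
from urllib.parse import urlsplit

def referrer_info(headers):
    """Extract the referring host and path from request headers, if present."""
    # The header name is spelled "Referer" in HTTP itself.
    referer = headers.get("Referer")
    if not referer:
        return None
    parts = urlsplit(referer)
    return {"host": parts.hostname, "path": parts.path}

# Hypothetical request arriving at a tracking server.
request_headers = {
    "Host": "tracker.example",
    "Referer": "https://social.example/profile/alice/edit",
}

info = referrer_info(request_headers)
print(info)
```

A path that ends in an edit page, as in this made-up request, is the kind of hint that the visitor owns the referring profile.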

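Web bugs and tracking cookies: Cookies are not only useful for session handling, but can also be used for questionable user tracking practices. A web bug is a tiny, usually invisible image embedded in a page and loaded from a third-party server. The request for that image lets the third party set or read its own cookie on every site that embeds the bug.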

Zombie cookies: While cookies are in principle an effective mechanism to track users, more people are becoming aware of the privacy implications and clear their cookies regularly. While HTTP and Flash cookies can be removed without much effort, Zombie cookies, also called Super-cookies, are designed to resist deletion efforts.

Browser Fingerprinting: The Electronic Frontier Foundation, a US-based civil liberties group, has recently demonstrated the feasibility of a novel approach to browser identification, called browser fingerprinting.

During browser fingerprinting, seemingly insignificant and non-critical configuration and version data is collected from the web browser, for example:

      • The browser’s user agent information,
      • The client’s screen resolution,
      • The local timezone,
      • The list of installed browser plug-ins,
      • The list of installed system fonts,
      • The operating system,
      • The browser’s language,
      • The list of accepted MIME types.
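One simple way to turn such a set of attributes into a single identifier is to serialise them in a fixed order and hash the result. The Python sketch below does that. It is an illustration under that assumption, not the method used in the demonstration mentioned above, and all attribute names and values are invented.

```python
import hashlib
import json

def fingerprint(attributes):
    """Hash a dict of browser attributes into a stable hexadecimal identifier."""
    # Sorting the keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical values as a tracking script might report them.
browser = {
    "user_agent": "ExampleBrowser/1.0 (X11; Linux x86_64)",
    "screen": "1920x1080x24",
    "timezone": "UTC+1",
    "plugins": ["Flash", "Silverlight"],
    "fonts": ["Arial", "DejaVu Sans", "Times New Roman"],
    "language": "en-US",
}

print(fingerprint(browser))

# Changing a single attribute yields a different identifier.
print(fingerprint({**browser, "timezone": "UTC-5"}))
```

No single value here is identifying on its own, but the combination can be rare enough to single out one browser, and nothing needs to be stored on the client.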

Source – web-tracking_schmuecker

Posted on March 19, 2013, in SEO, Web.