Be “Do Not Track” compliant in 30 microseconds or less.


Last week I blogged about the state of Do Not Track on the top internet properties, advertisers and widget providers. If you haven’t read it yet, **spoiler alert**: the results aren’t encouraging.

In my experience, many of the top web operators do care about your privacy, so it may be hard to understand why even they aren’t honoring your “Do Not Track” settings. I’d venture that part of the reason is a lack of awareness, and another part is that, at scale, implementing a “Do Not Track” compliant solution isn’t a trivial matter. The latter becomes obvious when you see how many people in the web and ad industry talk about the importance of DNT, while only a handful can actually provide a working implementation.

Do Not Track – the basics

Let me illustrate the difference between DNT and non-DNT requests by using Krux (my employer) services as an example.

Under normal circumstances, a basic request/response looks something like this: a user generates some sort of analytics data and is given a cookie on their first visit. The data is then logged somewhere and processed:

< HTTP/1.1 204 No Content
< Set-Cookie: uid=YYYYYY; expires=zz.zz.zz

For a DNT-enabled request, the exchange looks a bit different. The user still generates some sort of analytics data, but the cookie value in the response is now set to ‘DNT’ (we use a cookie value because you can’t yet read the DNT header from JavaScript in all browsers), and the expiry is set to a fixed date in the future. That makes it impossible to distinguish one user from another based on any of the cookie’s properties:

> DNT: 1
< HTTP/1.1 204 No Content
< Set-Cookie: uid=DNT; expires=Fri, 01-Jan-38 00:00:00 GMT
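
The two exchanges above boil down to a single decision on the DNT request header. Here is a minimal Python sketch of that decision; it is an illustration, not the code we run. The function name `set_cookie_value`, the one-year lifetime for regular cookies and the `now` parameter are assumptions. Only the cookie name `uid`, the ‘DNT’ value and the fixed expiry date come from the examples above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

DNT_COOKIE_VALUE = "DNT"
# Fixed expiry taken from the example response above: every DNT user
# receives exactly the same cookie, so nothing distinguishes them.
DNT_FIXED_EXPIRES = "Fri, 01-Jan-38 00:00:00 GMT"
# Assumption for illustration: regular cookies live for one year.
COOKIE_LIFETIME = timedelta(days=365)


def set_cookie_value(request_headers, uid, now=None):
    """Return the Set-Cookie value for a request.

    request_headers: dict of header name -> value (names are matched
    case-insensitively). uid: the identifier to hand to a trackable user.
    """
    headers = {k.lower(): v.strip() for k, v in request_headers.items()}

    if headers.get("dnt") == "1":
        # Do Not Track: constant value, constant expiry.
        return f"uid={DNT_COOKIE_VALUE}; expires={DNT_FIXED_EXPIRES}"

    now = now or datetime.now(timezone.utc)
    expires = format_datetime(now + COOKIE_LIFETIME, usegmt=True)
    return f"uid={uid}; expires={expires}"


if __name__ == "__main__":
    print(set_cookie_value({"DNT": "1"}, "YYYYYY"))
    print(set_cookie_value({}, "YYYYYY"))
```

Two different users who both send `DNT: 1` get byte-for-byte identical cookies, which is the property the exchange above relies on.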

Implementing Do Not Track compliance

At Krux, we provide audience analytics for top web properties, with a very strong commitment to user privacy. As part of that, we take honoring “Do Not Track” for our properties, as well as our publishers’ properties, very seriously.

Our analytics servers process billions of data points per day (or many thousands per second), and each of these requests should be handled quickly; any meaningful slowdown would mean a deteriorated user experience and provisioning many more machines to handle the traffic.

The industry standard for setting user cookies is basically Apache + mod_usertrack, which in our production environment gets response times in the 300-500 microsecond range. This gives us a good performance baseline to work from. Unfortunately, mod_usertrack isn’t DNT compliant (it will set a cookie regardless) and can’t be configured to behave differently, so I had to look for a different solution.

Writing the beacon as a service is a simple programming task, and the obvious first choice is to try a high-throughput, event-driven system like Tornado or Node (both are technologies that are already in our stack). However, I encountered 3 issues with this approach that made this type of solution not viable:

  • Tornado and Node both respond in a 3-5 millisecond window, and although that’s quite fast individually, it’s an order of magnitude slower than doing it inside Apache
  • Response times degrade rapidly at higher concurrency, which is a very common pattern in our setup
  • These processes are single-threaded, meaning they need to sit behind a load balancer or proxy to take advantage of multiple cores, further increasing response times

Next, I tried using Varnish 2.1. It handled the concurrency fine, and was responding in the 2 millisecond range. It also has the benefit that it can be exposed directly to the world on port 80, rather than being load balanced. The problem I ran into is that Varnish 2.1 does not allow access to all the HTTP request headers for logging purposes. Varnish 3.0 does have support for all headers, but can’t read cookie values directly, and we’ve experienced some stability problems in other tests.

With none of these solutions being satisfactory, or even coming close to the desired response times, the only option left was to write a custom Apache module to handle DNT compliance myself. Not being much of a C programmer (my first language is Perl), I found this a fun challenge. It also gave me a chance to write mod_usertrack the way it should have behaved all along.

Introducing mod_cookietrack

So here it is: mod_cookietrack, a drop-in replacement for mod_usertrack that addresses many outstanding issues with mod_usertrack (details below), including Do Not Track compliance.

And most importantly, it performs quite well. Below is a graph that shows performance of an Apache Benchmark (ab) test using 100,000 requests, 50 concurrent to a standard Apache server. The blue line shows mod_usertrack in action, while the red line shows mod_cookietrack:

The graph shows that for all the extra features below, including DNT compliance, it only takes an additional 5-10 microseconds per request.
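
If you want to run a comparable test yourself, the benchmark parameters above map directly onto ApacheBench’s `-n` (total requests) and `-c` (concurrency) flags. The small Python helper below only builds the command line; the helper name `ab_command` and the target URL are assumptions for illustration, not the exact invocation used for the graph.

```python
def ab_command(url, requests=100_000, concurrency=50):
    """Build the argv for an ApacheBench run.

    -n is the total number of requests, -c the number of concurrent
    requests. The defaults mirror the test described above.
    """
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


if __name__ == "__main__":
    # Hypothetical target; point this at your own test server.
    cmd = ab_command("http://localhost/")
    # Pass `cmd` to subprocess.run(cmd, check=True) to actually execute it.
    print(" ".join(cmd))
```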

We have been using mod_cookietrack in production for quite some time now, and it is serving billions of requests per day. To give you some numbers from a real world traffic pattern, we find that in our production environment, with more extensive logging, much higher concurrency and all other constraints that come with it, our mean response time has only gone up from 306 microseconds to 335 microseconds.
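
To put those production numbers in perspective, here is the arithmetic as a short Python snippet. The two response times are the ones quoted above; the figure of one billion requests per day is an assumption chosen for illustration, since our real volume is only described as “billions”.

```python
# Mean response times from production, in microseconds.
baseline_us = 306          # before mod_cookietrack
with_cookietrack_us = 335  # with mod_cookietrack

overhead_us = with_cookietrack_us - baseline_us
overhead_pct = 100 * overhead_us / baseline_us

# Illustrative volume only (assumption): one billion requests per day.
assumed_requests_per_day = 1_000_000_000
added_seconds_per_day = assumed_requests_per_day * overhead_us / 1_000_000

print(f"overhead per request: {overhead_us} microseconds")
print(f"relative increase: {overhead_pct:.1f}%")
print(f"added response time across a day: {added_seconds_per_day / 3600:.1f} hours")
```

The per-request difference is where the “30 microseconds or less” in the title comes from.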

So what else can mod_cookietrack do? Here’s a list of the improvements you get over mod_usertrack for free with your DNT compliance:

  • Rolling Expires: Rather than losing the user’s token after the cookie expires, mod_cookietrack updates the Expires value on every visit
  • Set cookie on incoming request: mod_cookietrack also sets the cookie on the incoming request, so you can correlate the user to their first visit, completely transparent to your application.
  • Support for X-Forwarded-For: mod_cookietrack understands your services may be behind a load balancer, and will honor XFF or any other header you tell it to.
  • Support for non-2XX response codes: mod_cookietrack will also set cookies for users when you redirect them, like you might with a URL shortener.
  • Support for setting an incoming & outgoing header: mod_cookietrack understands you might want to log or hash on the user id, and can set a header to be used by your application.
  • External UID generation library: mod_cookietrack lets you specify your own library for generating the UID, in case you want something fancier than ‘$ip.$timestamp’.
  • Completely configurable DNT support: Do Not Track compliant out of the box, mod_cookietrack lets you configure every aspect of DNT.
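
To make a few of the items above concrete (rolling expiry, X-Forwarded-For support and a pluggable UID generator), here is a minimal Python sketch of the behaviour. It is an illustration only, not mod_cookietrack’s code or configuration. All function names, the one-year lifetime and the choice to use the left-most X-Forwarded-For entry are assumptions, and DNT handling is left out to keep it short.

```python
import time

# Assumption for illustration: cookies are extended to one year per visit.
COOKIE_LIFETIME_SECONDS = 365 * 24 * 3600


def client_ip(headers, remote_addr, header_name="X-Forwarded-For"):
    """Prefer the address in a forwarding header over the socket address.

    Assumption: the left-most entry in the header is the original client.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get(header_name.lower())
    if value:
        return value.split(",")[0].strip()
    return remote_addr


def ip_timestamp_uid(ip, now):
    """The simple '$ip.$timestamp' style of UID."""
    return f"{ip}.{int(now)}"


def track(headers, cookies, remote_addr, now=None, make_uid=ip_timestamp_uid):
    """Return (uid, expires_epoch) for this visit.

    A returning user keeps their uid; the expiry rolls forward on every
    visit. make_uid can be swapped for a custom generator.
    """
    now = time.time() if now is None else now
    uid = cookies.get("uid")
    if uid is None:
        uid = make_uid(client_ip(headers, remote_addr), now)
    return uid, int(now) + COOKIE_LIFETIME_SECONDS


if __name__ == "__main__":
    headers = {"X-Forwarded-For": "203.0.113.7, 10.0.0.1"}
    uid, expires = track(headers, {}, "10.0.0.1", now=1000)
    print(uid, expires)
    # Same user returns later: uid is unchanged, expiry has moved forward.
    print(track(headers, {"uid": uid}, "10.0.0.1", now=2000))
```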

The code is freely available, Open Source, well documented and comes with extensive tests, so I encourage you to try it out, contribute features and report issues.

For now, mod_cookietrack only supports Apache, but with that we’re covering two thirds of the server market share as measured by Netcraft. Of course, if you’d like to contribute versions for, say, Nginx or Varnish, I’d welcome your work!

Now that being DNT compliant will cost you no more than 30 microseconds per request, all you good eggs have the tools you need to be good internet citizens and respect the privacy of your users; the next steps are up to you!

Note: these tests were done on a c1.medium in AWS - your mileage may vary.



