Wednesday, January 9, 2013

URL Watermarking



There are times when it's useful to change a URL without changing where it goes.  I just had to help somebody do this, so I'm capturing some of the nuance here.

The web has been developed iteratively over time, evolving new capabilities while maintaining backwards compatibility with older ones.  HTTP, the fundamental way that we push data around the web, is inherently a stateless protocol.  This is great because it allows for easy fault tolerance and scalability, but it means that sometimes we get into funny situations with dynamically generated content.  Things get especially interesting once you factor in browser caches, proxies, and access control systems.

If you’ve ever been designing a system and had to ask a user to refresh their browser to see that they’ve logged out or to get a dynamic update onto their screen, you know what I mean.  Redirecting to a modified URL gets around this: the browser caches pages by their full URL, so it treats the modified URL as something it hasn't seen before and fetches a fresh copy instead of showing the stale one.  This is typically easier to implement than messing with cache headers, which is the more traditional way of doing this.
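As a rough sketch of the redirect idea, here is one way it might look in Python.  The function name cache_busting_redirect and the choice of a nanosecond timestamp as the number are my own assumptions for illustration, and it assumes the server ignores the extra number tacked onto the URL.

```python
import time
from urllib.parse import urlsplit, urlunsplit


def cache_busting_redirect(url, token=None):
    """Return (status, headers) for a redirect to `url` plus a fresh number.

    Hypothetical helper: the token defaults to the current time in
    nanoseconds, so each redirect points at a URL the browser is unlikely
    to have cached.
    """
    if token is None:
        token = str(time.time_ns())
    parts = urlsplit(url)
    # Append after an existing query with "&", otherwise start a new query.
    query = f"{parts.query}&{token}" if parts.query else token
    location = urlunsplit(parts._replace(query=query))
    return 302, [("Location", location)]
```

Whatever web framework you use would then send that status and Location header back to the browser, for example after handling a logout.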

Another time when this comes in handy is when “digitally watermarking” URLs that you distribute.  By changing the URL without functionally changing where it goes, you can gather more accurate metrics about which distribution methods were most effective.  For example, you might post the same link on your blog and on Twitter, using a service like Bitly, and want to see which one got the most hits.

How to do this depends a lot on how the web server is set up and what kind of things it will or will not tolerate.  Here are some easy approaches/rules for changing URLs.  I’ll spare you the technical details of why this works and just present how to do it.

If there’s no question mark in the URL, then add a question mark to the URL and then a number (any number, this is your watermark).

www.abc.com/something.html?12345


If there is already a question mark in the link, then add an ampersand to the end of the URL and then your number.

www.abc.com/something.html?alreadyhere=something&12345


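If both of these methods result in an error, try using a pound sign instead.  Keep in mind that browsers don't send the part after the pound sign to the server, so this kind of watermark is only visible to tracking that runs in the page itself (JavaScript-based analytics, for example), not in your web server's logs.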

www.abc.com/something.html#12345
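
The three rules above can be sketched in a few lines of Python.  This is only an illustration: the function name watermark_url and its use_fragment switch are made up for this post, and it assumes you decide for yourself when to fall back to the pound sign.

```python
from urllib.parse import urlsplit, urlunsplit


def watermark_url(url, mark, use_fragment=False):
    """Append a watermark number to a URL without changing where it goes.

    Hypothetical helper following the rules above:
      - no "?" yet       -> add "?<mark>"
      - already has "?"  -> add "&<mark>"
      - use_fragment     -> add "#<mark>" instead (the fallback)
    """
    parts = urlsplit(url)
    if use_fragment:
        return urlunsplit(parts._replace(fragment=mark))
    query = f"{parts.query}&{mark}" if parts.query else mark
    return urlunsplit(parts._replace(query=query))


print(watermark_url("www.abc.com/something.html", "12345"))
print(watermark_url("www.abc.com/something.html?alreadyhere=something", "12345"))
print(watermark_url("www.abc.com/something.html", "12345", use_fragment=True))
```

Going through urlsplit rather than gluing strings together means the number still lands in the right place if the link already has a pound sign section at the end.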

All of these should go to the same page as the original URL, and the question mark and ampersand versions leave a unique watermark in your logs which you can look at to see how the person found your link.  Did they get it from that email you sent out, or did they click on the link from your blog?  Seeding these innocuous numbers will allow you to figure it out.
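
To give a feel for the log side, here is a small sketch that counts watermarks in a list of requested URLs.  It assumes you have already pulled the requested paths out of your server logs (how you do that depends on your server), and the names extract_mark and tally_marks are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def extract_mark(request_target):
    """Return the trailing watermark from a requested URL, or None.

    Assumes the watermark is the last "&"-separated piece of the query
    string and has no "=" in it, as in the examples above.
    """
    query = urlsplit(request_target).query
    if not query:
        return None
    last = query.split("&")[-1]
    return last if last and "=" not in last else None


def tally_marks(request_targets):
    """Count how many requests carried each watermark."""
    marks = (extract_mark(target) for target in request_targets)
    return Counter(mark for mark in marks if mark is not None)


# Made-up sample requests, just to show the shape of the input.
sample = [
    "/something.html?12345",
    "/something.html?alreadyhere=something&12345",
    "/something.html?67890",
    "/something.html",
]
print(tally_marks(sample))
```

If 12345 was the number you used on your blog and 67890 the one in your email, the counts tell you which one brought in more visits.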