Monday, December 22, 2008

Web: Robust Click-through Tracking

I have a web service that provides recommendations. I want to know when people click on the links. The site showing the links (imagine a book store) is separate from my web service.

Imagine this scenario: my server generates some recommendations, and the site shows them. Ten minutes later, my server goes down because both of my datacenters go down. I still want to know if the user clicks on a link, but if my server is down, that must not block the user from surfing to that link.

I see how Google does click-through tracking. It's simple, non-obtrusive, and effective. However, as far as I can tell, it requires the server to be up. Well, they're Google ;) It's different when you're a simple web service that must never ever cause the customer's site to stop working.

I came up with the following:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <title>Click-through Tracking</title>
    <script type="text/javascript">
      function click(elem) {
        // Fire the beacon; the browser fetches the image asynchronously.
        (new Image()).src = 'http://localhost:5000/api/beacon';
        return true; // Let the browser follow the link's href as usual.
      }
    </script>
  </head>
  <body>
    <p>
      <a href="http://www.google.com"
         onclick="return click(this);">click me!</a>
    </p>
  </body>
</html>
Note a few things. It doesn't mess with the href. It works whether or not the third-party server (localhost in this example) is up. It does talk to a third-party server, but it does so with an image request, so the usual same-origin restrictions on cross-site JavaScript don't apply. It has all the qualities I want, and I actually think it's a pretty clever trick. However, I'm worried.

I like the fact that loading an image is asynchronous; I'm depending on that. However, what if it takes the browser 1 second to connect to my server, and only 0.1 seconds to move on to Google (because that's what the link links to)? It's a race condition. As long as the browser makes the request at all, I'm fine. However, if it gives up on the request because, say, DNS resolution takes too long, I'm hosed.

Does anyone have any idea how the browsers will behave? Do my requirements make sense? Is there an easier way?

15 comments:

Jeff said...

You can use the image's onload handler, but that means setting the browser's location yourself.

Shannon -jj Behrens said...

What if the image never loads because the server is down?

bretthoerner said...

Is it OK to lose a few clicks here and there?

Basically, the way Reddit does it is very neat and built to be super-fast for the client (no change in user experience). But you will "lose" a click if a user clicks through to another site and never comes back to yours.

http://www.reddit.com/r/programming/comments/66jmp/dear_reddit_devs_you_guys_are_brilliant_thanks/c02zsjw

bretthoerner said...

Hmm, no autolink on Blogger, lame.

How reddit tracks clicks.

Shannon -jj Behrens said...

That's a great tip, thanks.

Unfortunately, it won't work in my particular case. Remember, we're a third-party recommendation server, so we don't see the user's cookies. We'd have to instrument code on all of our customers' servers to get access to those cookies, which would be a nightmare.

I definitely think this trick is worth keeping in mind, though.

max said...

Here on developers.org.ua we're using Google Analytics to track the clicks for us.

See http://www.developers.org.ua/static/js/ga-ext.js

Shannon -jj Behrens said...

Thanks for the tip.

Shannon -jj Behrens said...

Hmm, I think this problem suffers from the Heisenberg uncertainty principle ;)

I can have really reliable tracking or the ability for my tracking server to go down, but it's really hard to have both. I wish I could tell the Web browser, "Hey, I just set img.src. I know you're trying to go to another page. That's cool and all, but can you finish loading img.src too?"

Shannon -jj Behrens said...

(Which is to say, I think my solution actually does suffer from the race condition that I hypothesized. It works sometimes, but sometimes the browser doesn't actually bother downloading the image.)

Shannon -jj Behrens said...

Heh, prior art: http://www.webmasterworld.com/forum91/2420.htm

Shannon -jj Behrens said...

Heh, I came up with a solution :-D

Here's a version of the click function that will ping the server if the server is up, but functions correctly if the server is not up:

function click(elem) {
    // Capture just the URL, not the element, to avoid leaking the DOM node.
    var href = elem.href;
    function go(event) {
        location.href = href;
    }
    var img = new Image();
    img.addEventListener('load', go, true);
    img.addEventListener('error', go, true);
    img.src = 'http://localhost:5000/api/beacon';
    return false;
}

Shannon -jj Behrens said...

This uses syntax that works on IE and hopefully avoids memory leaks. I can't use a JavaScript framework because I'm a third-party service that must keep a really small footprint.

function click(elem) {
    var href = elem.href;
    function go(event) {
        location.href = href;
    }
    var img = new Image();
    img.onload = go;
    img.onerror = go;
    img.src = 'http://localhost:5000/api/beacon';
    // Null out the references so IE's garbage collector can reclaim them.
    img = null;
    elem = null;
    return false;
}
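One caveat with waiting for load/error: if the server accepts the connection but never responds, navigation stalls until the browser's own timeout kicks in. A common refinement is a short timer as a third way to trigger the navigation. A sketch (the function name and the 300 ms budget are my own assumptions, not part of the original):

```javascript
// Navigate once the beacon finishes, errors out, or the timer fires,
// whichever happens first -- so a hung server can't stall the user.
function trackedClick(elem) {
    var href = elem.href;
    var done = false;
    function go() {
        if (done) return; // Only navigate once.
        done = true;
        location.href = href;
    }
    var img = new Image();
    img.onload = go;
    img.onerror = go;
    img.src = 'http://localhost:5000/api/beacon';
    setTimeout(go, 300); // Give up waiting after 300 ms.
    return false;
}
```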

Shannon -jj Behrens said...

It works :-D

Jon Scott Stevens said...

I know you aren't doing it here, but be careful with new Image() on IE and appending that image into the DOM.

http://support.microsoft.com/default.aspx/kb/927917

http://clientside.cnet.com/code-snippets/manipulating-the-dom/ie-and-operation-aborted/

Shannon -jj Behrens said...

Jon, good tip. Thanks!