To Track, Or Not To Track, That Is The Question

If you read any of my other blogs, then you may well be aware that I've recently opened a shopfront on Shapeways to make a few 3D models available for sale. While I really like this feature of Shapeways, I wanted to customize the experience a little and so built a separate, more customizable, shopfront. One of the options when setting up the Shapeways store was to provide a Google Analytics tracking code. I've never used tracking code on any of the websites I've built, as I've never really seen a strong reason to. Given that I was being prompted to, though, I thought I'd investigate, especially as it would allow me to see how often people look at my models, separately from how often people buy them.

Registering for a Google Analytics account is free and easy (as with most Google services) and within a few minutes I had the tracking information to register with Shapeways. Having got this far I decided to also add the tracking information to the separate shop front I'd built, and of course this is where life got interesting.

The first problem I faced was that I couldn't get Google to recognise that I'd added the tracking code to my website. No matter what I tried, it simply kept telling me that I hadn't yet added the relevant JavaScript to my pages, yet I knew it was there. After a lot of head scratching and web searching I eventually found the problem and an easy solution. The domain I purchased doesn't actually include the initial www of most web addresses, but with the way I have the DNS records and forwarding configured, if you type the bare domain name you will be forwarded to the www version automatically. So when completing the Google Analytics form I included the www, which it turns out was my mistake. I'm still not sure why this should make a difference (I'm guessing that Google aren't following the forwarding information when checking the page), but anyway, removing the www from the default URL field fixed the problem.

The second problem was that I then wanted to turn the tracking off. Not so that I could comply with the EU directive on cookies or anything (I'm taking the implied consent route, on the basis that if you visit an online shop you have to assume, at the bare minimum, that your page views are being tracked), but so that I wouldn't end up tracking my own page views.

I tend to have two copies of every website I develop running at the same time; one is the publicly visible (hopefully stable) version, while the second is my internal development version. Once I'm happy with updates to the development version I then push them to the live version. So when it comes to tracking page views not only do I not want to track my page views on the live site, but I definitely don't want to track page views on the development version.

My solution to this is to use server side code to embed the tracking-related JavaScript into the page only if it should be used. For this I wrote a utility function to determine whether or not I should be tracking the page view:
public static boolean trackPageViews(HttpServletRequest request)
      throws Exception {

   //get the address the request is coming from
   InetAddress client = InetAddress.getByName(request.getRemoteHost());

   //the client is the same machine (an address such as 127.0.0.1)
   if (client.isLoopbackAddress()) return false;

   //the client is within the same site (i.e. not using a world visible IP)
   if (client.isSiteLocalAddress()) return false;

   //the client is at the same world visible IP address as the server
   if (client.getHostAddress().equals(getServerIP(false))) return false;

   //track all other page views
   return true;
}
Essentially this code checks the request and says not to track page views if it is coming from a loopback address (i.e. 127.0.0.1, which always refers to the local machine, no matter what the machine is), a site-local address (i.e. within the same local network and hence using a non-public IP address, often something in the 192.168.*.* range), or from the same public IP address as the server is running on. The first two checks (the isLoopbackAddress and isSiteLocalAddress calls) use standard Java library methods to do the checking, and catch all page views of the development versions of my websites (which are only ever accessible from within my local network). The third check is, however, slightly more complex.
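If you want to see which addresses those first two checks actually catch, it's easy to try them in isolation; passing literal IP addresses to InetAddress.getByName doesn't trigger a DNS lookup, so a quick throwaway class like this (names are my own invention, not part of the real application) runs fine offline:

```java
import java.net.InetAddress;

public class AddressCheckDemo {
   public static void main(String[] args) throws Exception {
      //127.0.0.1 is the loopback address on any machine
      System.out.println(InetAddress.getByName("127.0.0.1").isLoopbackAddress());     //true
      //192.168.*.* addresses are site local, i.e. not world visible
      System.out.println(InetAddress.getByName("192.168.1.10").isSiteLocalAddress()); //true
      //a public address, such as one of Google's DNS servers, is neither
      System.out.println(InetAddress.getByName("8.8.8.8").isSiteLocalAddress());      //false
   }
}
```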

The problem is that your machine itself probably doesn't have a world visible IP address; generally a home network will be connected to a broadband router, and it is the router that has the world visible address. So what we need to do is find the external IP address of the router. There are a number of websites that will show you the world visible IP from which you are connecting, and so we could use one of those to figure out the external address of the server, or we could write our own.
package utils;

import java.io.IOException;
import java.net.InetAddress;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class IPAddressServlet extends HttpServlet
{
   @Override
   public void doGet(HttpServletRequest request,
      HttpServletResponse response) throws IOException {

      InetAddress client = InetAddress.getByName(request.getRemoteHost());
      response.getWriter().println(client.getHostAddress());
   }
}
Assuming you access this servlet via a public IP address, it will simply echo that address back to you. We can now fill in the missing getServerIP method from our trackPageViews method.
private static String myip = null;

private static String getServerIP(boolean refresh)
      throws IOException {

   //only ask the servlet once, unless a refresh is forced
   if (myip == null || refresh) {
      //HOST is the public hostname of the server running the servlet
      URL url = new URL("http", HOST, "/myip");
      BufferedReader in = new BufferedReader(
         new InputStreamReader(url.openStream()));
      myip = in.readLine();
      in.close();
   }

   return myip;
}
I'm assuming this will work in all cases where you connect to the server using its standard web address, as that will always be converted via a DNS server to a world visible IP address. If you have a strange network setup where you might have multiple world visible IPs through which you could connect, then it might not work correctly. The important point is that it works for me, so I can now happily browse the public version and development version of the site from anywhere within my home network, safe in the knowledge that my own page views aren't being tracked, and so any analytics Google collects for me are from real visitors; albeit only those with JavaScript enabled, but that's an issue for another day.
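For completeness, this is roughly the shape the embedding takes when a page is rendered; the class, method, and snippet below are simplified stand-ins rather than my actual templating code, with a boolean in place of the real trackPageViews(request) call:

```java
public class TrackingDemo {

   //stand-in for the real decision; in the web application this boolean
   //would be the result of trackPageViews(request)
   static String pageFooter(boolean trackPageView) {
      if (!trackPageView) return "</body></html>";
      //placeholder for the Google Analytics JavaScript snippet
      return "<script src=\"https://www.google-analytics.com/analytics.js\">"
            + "</script>\n</body></html>";
   }

   public static void main(String[] args) {
      //a request from inside my own network gets a plain footer...
      System.out.println(pageFooter(false).contains("analytics"));
      //...while an external visitor gets the tracking script embedded
      System.out.println(pageFooter(true).contains("analytics"));
   }
}
```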

Before I forget, thanks to GB for the photo I used to brighten up this post!

Generating Sitemaps

A long time ago (okay, 2009) on an entirely different blog, I introduced v0.1 of the SitemapGenerator. This was a command line tool that I ran once an hour to monitor my blog; if anything had changed it would re-generate a sitemap and notify Google of the changes. I wanted this predominantly so that a Google custom search engine would stay up-to-date, something I no longer need to worry about given that I use Postvorta to index all my blogs. I haven't used the code in a number of years, but recently I again needed to generate a sitemap. This time, though, I wanted to generate the file from within the web application I was building rather than via a separate recurring script.

I did a quick search for a better version than my original, and while I found a few Java libraries, none really met my requirements; I wanted something lightweight that would allow me to take advantage of the extensions to the protocol Google has introduced for documenting images, videos and news articles. So I went back and re-wrote the guts of my original application. The new and improved version now supports a simple API that I can call from within my web application (although I've maintained the command line tool in case anybody was still using it).

If you are interested then you can grab the new version direct from SVN.