Tuesday, September 29, 2009

Using Alexa to Rough Check Google Trends Graphs

Comparing Google Trends to Alexa is not apples-to-oranges, but it's not my idea of good science, either. Google Trends reports daily unique visits, while Alexa reports the percentage of the available internet audience that a site reaches. The two should tend to follow each other pretty closely, however, so let's see how some graph comparisons work out.

Recall this study, which compared Google Trends with Google Analytics for significant anonymous sites. It showed a very close match -- almost startling -- between the subjects' analytics and Trends reports.

This was unexpected, because graphs from both sources for major webcomics are showing big disparities. Google Trends shows sharp declines in readership for many comics. Only a few people who create the comics -- and see the private analytics -- have said anything, with one writing to say that Trends disagreed with his analytics and another writing an indignant blog post. I realize many cartoonists truck in public esteem and income, as opposed to data, but I know from an upcoming publication that data has a place.

Google gives you five more months of history than Alexa, so you have to lop off the left side of the Google graph to match them up, right where it says "October." Remember, we're looking to see if the graphs are similar, not so much, or quite different. I'll explain the reason after we view the graphs.
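If you'd rather do the lopping in code than by eye, here is a minimal sketch. It assumes you have somehow turned each graph into a list of monthly values, oldest first, with both lists ending on the same date; neither service hands out the numbers in this form as far as I know. The function name and the sample values are made up for illustration.

```python
def trim_to_match(longer, shorter):
    """Drop the oldest points from `longer` so both series cover the same span.

    Assumes both lists are ordered oldest-first and end on the same date.
    """
    extra = len(longer) - len(shorter)
    if extra < 0:
        raise ValueError("first series must be at least as long as the second")
    return longer[extra:]


# Made-up monthly values: 17 months for Google Trends, 12 for Alexa.
google_trends = [5, 6, 6, 7, 7, 8, 9, 9, 10, 11, 11, 12, 12, 13, 13, 14, 15]
alexa = [0.10, 0.11, 0.11, 0.12, 0.13, 0.13, 0.14, 0.14, 0.15, 0.15, 0.16, 0.17]

trimmed = trim_to_match(google_trends, alexa)
print(len(trimmed), len(alexa))
```

The two series are still on different scales (visits versus percent of audience), so this only lines up the time axis. The shapes are what get compared.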

I searched for the Alexa top 500, then took the sites ranked 100, 200, 300, 400 and 500. It seemed like a suitably arbitrary way to pick some big sites, but number 500 was mature content, so I picked a nearby one to replace it.

Subject 1: Mozilla.com, Alexa Rank 100


Google Trends:

Subject 2: Torrentz.com, Alexa Rank 200


Google Trends:

Subject 3: In.com, Alexa Rank 300


Google Trends:

Subject 4: so-net.ne.jp, Alexa Rank 400


Google Trends:

Subject 5: Joomla.org, Alexa Rank 496 (used instead of 500, which had adult images)

Google Trends:

Subjective Impressions

(Remember to delete left end of GT graphs, as advised above):

Pair #1: A stretch to call them similar, if at all

Pair #2: Somewhat similar but no distinguishing features

Pair #3: Quite similar

Pair #4: Verging on only general similarity

Pair #5: Quite similar

From this small experiment, we see a tendency for Google Trends graphs of assorted large sites to resemble Alexa's, with the resemblance ranging from marginal to fairly good.

I said earlier we'd be looking to see if the graphs turned out similar, not so much, or quite different. Given that a line graph can follow any track, two graphs that are fairly similar have more in common than chance suggests. The fact that we continue to see a majority of samples where similarity is visible to the naked eye, despite the different measures that Google Trends and Alexa report, is a further suggestion that Google Trends information is often valid. The ramifications of this preliminary observation are unclear.
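The comparisons in this post were done by eyeball. For anyone who wants a number to go with the impression, one common option is the Pearson correlation between the two trimmed series. To be clear, this is a swapped-in technique, not what I did above, and it assumes you have both graphs as equal-length lists of values. The function name and sample data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (range -1 to 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a perfectly flat series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Made-up values on different scales, as Trends and Alexa would be.
trends_like = [8, 9, 9, 10, 11, 11]
alexa_like = [0.10, 0.11, 0.11, 0.12, 0.13, 0.13]
r = pearson(trends_like, alexa_like)
print(round(r, 3))
```

Because correlation ignores scale, the mismatch between visits and percent of audience doesn't matter here. One caution: two series that merely both drift upward will also correlate strongly, so a high value is a hint, not proof, that the graphs share features.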

Large sites are one thing; what about webcomics?

I inserted Alexa data and subjective resemblances into the data chart sent earlier by AMRothery. View it: Augmented Webcomics Chart

You'll see that results tend to overlap about half the time, fall in adjacent categories most of the rest of the time, and rarely contradict each other. The ratings are somewhat subjective.
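The tally of overlapping, adjacent and contradictory results can be sketched as below. The category labels are hypothetical stand-ins, since the chart's actual labels aren't reproduced here, and treating "more than one category apart" as contradictory is my assumption for the sketch. The sample pairs are invented, not taken from the chart.

```python
from collections import Counter

# Hypothetical ordered categories; the real chart's labels may differ.
CATEGORIES = ["declining", "flat", "growing"]


def compare(rating_a, rating_b):
    """Classify a pair of ratings by how far apart they sit in CATEGORIES."""
    gap = abs(CATEGORIES.index(rating_a) - CATEGORIES.index(rating_b))
    if gap == 0:
        return "overlap"
    if gap == 1:
        return "adjacent"
    return "contradictory"  # assumption: two or more steps apart


def tally(pairs):
    """Count how many (Trends, Alexa) rating pairs fall in each class."""
    return Counter(compare(a, b) for a, b in pairs)


# Invented (Google Trends, Alexa) ratings for four imaginary comics.
sample = [
    ("declining", "declining"),
    ("flat", "growing"),
    ("growing", "declining"),
    ("flat", "flat"),
]
counts = tally(sample)
print(counts)
```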

What can we learn? A rough comparison gets us rough results, but if there were no validity we should see almost no agreement. Alexa and Google Trends have some overlap, but not as much as Google Trends and Google Analytics do, especially for sites outside of webcomics.

Guigar Response

Yes, I did read Brad Guigar's response to this series, and you can too: Guigar, September 25 blog entry. (Scroll down to it.) You'll see that quoting and discussing data from Google is sufficient to nudge the gentleman into accusations of libel, though he'd rather not risk that you read it. Why not just say your data disagrees, and help solve the puzzle?

A clarification about click fraud

Someone suggested I point out the different types of click fraud, since I brought it up for discussion. One type is clicking on ads hosted on your site, so you get more ad sales (and money). That type is particularly likely to rile an ad broker like ADSDAQ.

There is also traffic fraud, which is arranged with inexpensive cheating devices. The more elaborate ones allow one computer to send hit after hit to a wide array of participating websites, dramatically inflating their traffic while obscuring its own activity. The skeptical researcher considers this possibility when a comic that has been growing sluggishly at best suddenly has a huge spike in traffic that doesn't recede, especially if there is no major change to explain it.

There are a number of ways that people get caught doing this, and I won't be publicizing them. Not all of them require action by Google: some can be done by anyone with internet access. Without illegally entering accounts, though, the evidence would be circumstantial -- suitable for ratting them out perhaps, but not for publicly accusing them.

I'll state again, for those with frazzled nerves, that I am not accusing anybody of click fraud.

Next: Google's Response