Monday, September 28, 2009

Google Trends Weirdness: A Click Fraud Hypothesis?

I'm still mulling the weird webcomics data from Google Trends. Here's another hypothesis to consider.

You'll recall my latest rundown of comic audience size based on data from Google Trends.

I am puzzled that many major titles show big audience declines over the past two years, and I wonder what the cause is. Compounding the mystery, I found a small study that looked at sites outside of webcomics. It found that data from Google Trends closely matched data from the same sites' Google Analytics.

Meanwhile, Google Trends reports famous webcomic sites losing 50% of their readers or more.

I asked, is Google Trends wrong? Or is the data untrustworthy? What can be learned from the phenomenon, or perhaps guessed?

People wrote in with some thoughts, and there was a record amount of kook mail, mostly from a single Google account using different names, genders and nationalities. (That problem was resolved by eliminating anonymous comments.) And, of course, some people complained about my spotlighting the data, even though it is public.

I like the notion that the most obvious answer is often the most likely. But having waved this post in the air for several days, and not having received answers that lay the question to rest to my satisfaction, I wonder if we might finally be seeing something that I have been anticipating for some time.

If you follow the news, you know that click fraud has risen dramatically every year and passed 25% a while back.

Click fraud is Google's Achilles' heel, because Google is a pay-per-click advertising company, first and foremost. The problem is so rampant that only a major solution will vanquish it.

Enter Google Trends, in which Google releases public audience data for sites above a certain audience threshold. The graphs are released quietly as a project of Google Labs, but what we aren't told is that this data is Google's click-fraud H-bomb. Google moves in steady, deliberate steps, giving users time to adjust to change. One day perhaps Trends will emerge from Labs as a featured Google offering.

Then, it's only a matter of time before a modest adjustment of the accompanying FAQ adds, "These graphs have been screened for click fraud, with 99% of false results removed." Or something similar.

At that point things might become awkward, because of my tendency to store vast amounts of data for possible future use. That includes redundant storage of graphs going back as long as four years for most major pure-play webcomics. If anyone's site has been on steroids, it will be Google's word against theirs.

Consider: people are either reliably honest or not. Many of the creators who have the worst record for reliability fall into one or more of these types:

  • They're "squishy," meaning they'll contradict themselves to suit their audience, or lie by omission, or refuse to own up to errors
  • They're indifferent: while they may play the role of the charming cartoonist at conventions, many people know them as selfish thugs
  • They're egotists, who commence axe-grinding at the first sign of dissent, and are not gracious to critics
  • They're lazy and therefore sloppy, meaning they are more transparent than they realize
  • They are defensive by nature
  • They often rely on enablers

I notice that when you see one form of deceptive conduct, you tend to see other forms.

Let's be clear. I am not accusing *anyone* of click fraud. It's Google's job to sort that out.

Corruption is bad for commerce. If there is cheating in webcomics, anyone who is a stakeholder in a webcomic business has a right to raise questions. If there is no cheating, there is no need for concern, and no need for people from the core of webcomics to react with circle-the-wagons hostility.

Yes, I get the same letters over and over: that I am a bitter failure, that I tried the HalfPixel plan and it "didn't work," that I am a conspiracy theorist, etc. If you fall for that, you deserve what you get.

Until the riddle of contradictory data for webcomics audiences is resolved, I think we owe it to ourselves to find out what we can, and discuss each hypothesis with an open mind until it is fully evaluated.

We've covered, to some degree, RSS feed traffic, JavaScript, cookies, algorithms, Google as a vindictive competitor, changes in user habits, and various other ideas. They're not off the table and may indeed be part of the answer, but I wouldn't say anyone has resolved the question.

A final note: a year ago, when I began to notice oddities in analytics from a bunch of closely affiliated webcomics sites, another odd thing happened. ADSDAQ, a major supplier of ads to host sites, abruptly and without comment dropped many webcomics from its portfolio. Perhaps this can be attributed to other causes, but it was notable to see most of a category of customers purged without notice.

It would be gratifying to resolve it quickly with an explanation that does not rely on a culture of click fraud as a major component.

Update: Reader AMRothery has been developing a useful spreadsheet of pertinent data, which you can view now.