Thursday, February 5, 2009

Fake Data and Remarkable Patterns in Selected Webcomics


People who cheat when the stakes are low often cheat when the stakes are high. People who cheat in an organized fashion with friends show a higher likelihood of cheating with friends in other settings. These observations have been a staple of criminology for decades.

Here, we examine the relationship between cheating on Twitter and suggestive evidence in other locations.

Twitter borders on being frivolous, yet is a sensation, growing rapidly. People trade short bursts of communication with friends, make new friends, and keep up with doings of people they care about with minimal time investment.

We can't say that cheating on Twitter is of no concern to other people, despite its lightweight aspects. This may cause skimming readers to leap from chairs and accuse us of making something out of nothing. They may leap, but they will be undone if they do not read carefully.

Among the earliest users of Twitter have been web cartoonists. Why is difficult to say, but it is a fresh way to interact compared to the long-in-the-tooth sites in Webcomicland. I was an early user myself.

There are things we can observe from people who cheat on Twitter (that cheating will be explained in a moment).

"Followers" on Twitter is a status symbol to many. It is a very public, but unquestioned, way to exhibit your popularity, especially among people who follow those they perceive as popular. For people who feel unrecognized, a portfolio of followers is an ego balm. The prime motive of Twitter cheating is to artificially drive up follower count and gain status.

Cheaters on Twitter appear likely to cheat elsewhere. Like an unwanted criminal record, it can confirm or deny impressions about others. A fuller understanding of a person seen behaving unscrupulously can be gained by examining their behavior on Twitter.

From Twitter, the obvious place to search is web site analytics of the person's comic. This is the data that measures traffic to a comic. Some creators post their data or provide access to it directly, including T Campbell and Jennie Breeden, but most consider it private business data and are content to let estimates leak out. A few let out a cock-a-doodle-doo on Twitter, announcing data without offering proof.

Establishing an ironclad connection between Twitter cheating and web site analytics cheating requires more analytics monitoring devices than we currently have. With some assigned to other projects, we decided to use Compete, a publicly available source for web site analytics. This allows you to participate.

Some of the data produced is remarkable. We are not, however, interested in sensationalism, but in the behavior of notable figures in the webcomic online scene. Data will be presented pertaining to specific people. We will offer some assistance in interpretation, and leave the reader to draw independent conclusions. Fresh perspectives on analysis are welcome.

If Twitter is the hyping of the creator, false analytics are the hyping of their work. They make trend analyses difficult, mislead the media, deceive fans and make it maddening to develop sound webcomic business models. (It's interesting to note that pre-Twitter, an unscholarly article listed creators of webcomics who earn a living from their work. This is a good example of falsified data inhibiting the creation of data of interest to all comics creators, as it is little more than a mix of claimants and frauds with citations that lead nowhere of value.)

I. Definitions

Fake Followers on Twitter

There are two types:

A fake profile is created, and the creator's real profile becomes the recipient of a "follow" from the fake, adding to their apparent popularity. A tally of followers appears on all user profiles, and is considered a status symbol by some. It is a way to boost social standing based on the perceived approval of others.

A second type of fake is when someone creates a fake follower not just for themselves, but includes friends, having the fake follow them as well. This has evolved to where "faking gangs" work together to add fake followers to one another's lists each day. Activity of this sort is one metric we considered when focusing on prolific cheaters and choosing which to examine first.

"Spam profiles," of the "Lose Weight Now" variety, are not important to this discussion.

Fake Traffic

Numerous "cheatware" programs, scripts and devices exist to send false traffic to a website, thus inflating its ranking, the money it can command from ads, and its news value. It also increases the perceived prominence of the site owner, who may attempt to parlay their status socially or promote their merchandise. While Google successfully catches some people and deletes them from its records, it will probably require completion of the next wave of detection programs before cheaters are detected rapidly and consistently.

II. Methodology

For this analysis, we located one Twitter "faking gang" after a core member drew attention by making astonishing public claims.

The members of the group:

Scott Kurtz, creator of PvP
Meredith Gran, creator of Octopus Pie
Kris Straub, creator of several comics, and a member of HalfPixel Collective with Kurtz
Jeph Jacques, creator of Questionable Content
David Malki, creator of Wondermark

It receives support from additional participants, but this list is sufficient for understanding.

III. Investigation

We decided to see if prolific creators of fake followers on Twitter show evidence of unusual traffic activity on their web sites.

We had to determine the number of fake Twitter profiles being manufactured. Observance of public communication places Gran at the hub of interactions, so her own followers were audited in total.

A word on recognizing fake followers. An elaborate fake character is harder to recognize, but also time-consuming to create. The process of spotting fakes begins with a complete review of a person's followers, and includes various techniques acquired from experience. If someone is bold enough to argue that the fake phenomenon is less than we report (especially since it is greater than we report) our suggestion is that you master the technique before placing charges.

One technique, used heavily by Gran, exploits the fact that friend Jeph Jacques maintains profiles for some of his comic characters. This is not uncommon, nor frowned upon. Because of the number of cartooning-related characters at Jacques' profile, it is tempting to use his characters repeatedly. An obvious marker of a possible Gran or Jacques fake is reliance on these character profiles to flesh out the fake profile.

A stop at Jacques' profile yields about ten followers for a fake in just one stop. Therefore, one marker of a Gran fake, and also Jacques fakes, is the presence of numerous Jacques characters in the fake's profile.

Followers installed in a fake profile by Jeph Jacques. Top row: his own avatar is fifth from left. No. 6 in the second row and 1, 2, 5 and 6 in the third are characters from his "Questionable Content." Not a problem, but they mark his fakes frequently.
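
The character-profile marker described above can be expressed as a simple overlap measure. The following is only a rough sketch, not the procedure actually used for this audit: the account names, the `marker_share` and `flag_for_review` helpers, and the 0.3 threshold are all hypothetical, and a high score only suggests a profile deserves a closer manual look.

```python
def marker_share(connections, marker_accounts):
    """Fraction of a profile's connections that are known marker accounts."""
    connections = set(connections)
    if not connections:
        return 0.0
    return len(connections & set(marker_accounts)) / len(connections)


def flag_for_review(connections, marker_accounts, threshold=0.3):
    """True when the marker share meets a (hypothetical) review threshold.

    This flags a profile for manual inspection only; it proves nothing alone.
    """
    return marker_share(connections, marker_accounts) >= threshold


# Hypothetical account names, invented purely for illustration.
MARKERS = {"character_a", "character_b", "character_c"}
suspect = ["character_a", "character_b", "character_c", "artist_x", "celebrity_y"]
ordinary = ["character_a", "friend_1", "friend_2", "news_site", "band_z", "artist_x"]

for name, profile in [("suspect", suspect), ("ordinary", ordinary)]:
    share = marker_share(profile, MARKERS)
    print(f"{name}: share={share:.2f} review={flag_for_review(profile, MARKERS)}")
```

A single marker like this is weak by itself, since real fans may also follow several character profiles.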

Various techniques allow us to identify fakes; too many to list today.

There is always the possibility of errors. Among our list of "definite" fakes there may lie at least one mistake. Conclusively identifying fakes is best done with intersecting evidence.

Once we established a core group of Twitter fakers (leaving others for another day), we reviewed their website traffic reports.

Be cautioned: Compete only works well for sites starting in the range of 40,000-60,000 visitors per month. The site does signal whether data is sparse, leading to approximations, or robust, leading to confident declarations. Nonetheless, there is evidence that its totals for webcomics can run low, but that its trends over time are reliable. Don't be startled if you enter a site known to have 18,000 visitors per month and it offers you results of 5,200, flagged as a guess. Smaller comics do not yet rank accurately.

Don't be surprised if fake followers, including those we've linked today, start vanishing. You can't easily get rid of them, but you can block them. This exchanges the fakes with even more obvious "blocked" reports, so there appears to be no easy way out other than starting a brand new profile. It does move the follower count closer to reality, however.

IV. Findings

The graphs below use whatever increments on the vertical axis best fit the curve. If they all used the same increments, a mild spike like the one below could appear dramatic. Investigators must use a trained eye to spot peculiarities. Data accompanying the Compete graphs is useful.

PvP's November '08 jump in unique visitors doesn't look like much. Here, measured as unique visitors, it's a 52% jump, representing about 26,000 new visitors for the month. Note that increases in the eleven months preceding were typically about 4,000 per month. In the next graph of PvP's total monthly visitors for the same period, there is increased volatility of the "hump-spike" type. We theorize that such patterns may follow a person's first experimentation with fiddling their data, followed by fear and drawing back, then greed.
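
For readers who want to check the arithmetic behind figures like these, here is a minimal sketch. It uses only the numbers quoted above (a 52% jump, about 26,000 new visitors, typical earlier increases of about 4,000); the function names are my own, and the prior-month baseline is derived from those figures, not read from Compete.

```python
def percent_change(before, after):
    """Change from `before` to `after`, as a percentage of `before`."""
    if before == 0:
        raise ValueError("baseline must be nonzero")
    return (after - before) / before * 100.0


def implied_baseline(increase, percent):
    """Starting value implied by an absolute increase and its percentage."""
    if percent == 0:
        raise ValueError("percent must be nonzero")
    return increase * 100.0 / percent


# Figures quoted in the text; the baseline is derived, not measured.
jump = 26_000
baseline = implied_baseline(jump, 52)
print(f"implied prior month: about {baseline:,.0f} unique visitors")
print(f"check: {percent_change(baseline, baseline + jump):.0f}% jump")
print(f"size relative to a typical 4,000 increase: {jump / 4_000:.1f}x")
```

The same `percent_change` helper can be applied to any pair of monthly totals read off a Compete graph.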

Below, PvP again, this time counting visits instead of visitors.

Before we examine graphs for the other titles, let's review where Compete gets its data. Many people are under the impression that all their data comes from browsers equipped with the Compete Toolbar. If so, this data would be invalid, as fake visitors don't bear toolbars. In fact, Compete uses a range of data sources which are normalized and integrated. Still not perfect, but it has the advantage that readers can check their own results and compare various Twitter fiddlers with their site data.

The next graph is for Octopus Pie's monthly visits. During July '08, it rose from 10,000 to 40,000, then leveled off at that level. The year ends with the comic up 558%. I'm interested in reports of anyone showing similar numbers.

Compete is still listing Octopus Pie as a "sparse data" title, which explains the volatile numbers during the first half of last year. Whether it were a hedge fund, a stock market average or a ten year change in precipitation, this graph would be considered remarkable.

The variety of traffic manipulation scripts and similar black hat gadgets makes it difficult to trace patterns to any one technique, especially if there is an alternate explanation at work. Obviously, suspicion will be quelled upon a convincing alternate explanation being offered. General trust will be harder to restore.

The author of Octopus Pie frequently reports traffic data on Twitter. The data offered is page views, which is among a site's highest numbers and sounds more exciting. It also makes it difficult to reconcile claims for the site with actual data. When challenged, a typical response has been, "The server says..." I have found that errors are common among people using server-end analytics, as opposed to Google Analytics. Data interpretation errors are very common, and incorrect installation of the analytics is also common. No matter how experienced one is, it's probably a good idea to use a second set of analytics to verify server reports. Octopus Pie does, in fact, have Google Analytics installed. Data from that program has not been publicly offered to confirm the server-end data. If it disagrees, public claims are meaningless. If it agrees, then it seems reasonable to ask: what is the cause of these astonishing traffic patterns?
The latest month was just added, so I am adding one graph, and I am making it smaller so it doesn't bleed off the page. I'll have to see about fixing that, since it removes the essential data from view for some viewers.

Up another 40,000 in one month. Wow.
After looking at this one, I got a new theory, partly from fluffy, who is always pushing me to be smarter than I am. Instead of the black hat traffic device, which still remains a compelling theory, what if simply horsing your way to the top of the Dumbrella/HalfPixel popularity contest is sufficient to drive sustained spikes of public interest? That doesn't speak too well of average comic reader IQ, but it's an idea I'll be thinking about.
(By the way, readers, please use care not to put words in my mouth. It's a long post, I know, and I am interested in every comment, but several people thought I was accusing people of cheating the numbers, when what I have accused them of is rampant cheating on Twitter. I have also suggested that explanations for odd traffic events shared by the Twitter cheaters might include gaming numbers. And they might not. I hope to find out. Remember: if we don't strive for accuracy, we're no better than them.)
I don't know which is more likely, but from an ease of flagging poisonous cliques perspective, I'd rather it be linked to shallow social climbing than high tech fraud.

Below are monthly visits for David Malki's Wondermark. This jump to 235,000 visits is notable. It's the first time I've observed a comic going from 50,000 to 235,000 in less than a year, especially without major prizes, fanfare or buzz. This is a 142% jump for the month, and a 509% annual rise. At this point, you might be wondering what normal graphs look like. Since this is going to be a big post already, I refer you to Compete, or just look down. Everything before November is pretty normal.

Kris Straub's Chainsaw Suit is too small and new to provide meaningful data. There's no sense including the graph yet.

Jeph Jacques' Questionable Content is below. At first glance, this line is reassuring, but recall what I mentioned earlier about the y-axis increments altering the subjective picture. Something is happening here.

These are big y-axis increments: 200,000 each. The year starts out fine, with impressive, but not suspicious, growth. July stands out for the addition of 150,000 visitors. In October and November, about 350,000 visitors are added. This is dazzling growth for a comic that started the year at 420,000.

The comic has added 718,000 visits for the year, a growth rate of 152%. It's now over 1.1 million, if the data is true. Study the graph carefully; this is one of those examples where the design of the graph makes the data seem tame.
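
To see why large y-axis increments can make a change look tame, it may help to express the same change two ways: as a number of gridlines on the graph, and as a share of the year's starting value. This is a small sketch using only figures quoted above (200,000 increments, a 420,000 start, additions of 150,000 and about 350,000); the function names are mine.

```python
def gridlines_spanned(change, increment):
    """How many y-axis increments a change covers on the graph."""
    return change / increment


def share_of_start(change, start):
    """The same change as a percentage of the starting value."""
    return change / start * 100.0


START = 420_000      # visitors at the start of the year, per the text
INCREMENT = 200_000  # y-axis increment, per the text

for label, change in [("July", 150_000), ("October-November", 350_000)]:
    lines = gridlines_spanned(change, INCREMENT)
    share = share_of_start(change, START)
    print(f"{label}: {lines:.2f} gridlines, {share:.0f}% of the starting value")
```

A change covering less than one gridline can still be a large fraction of where the comic began the year, which is why the accompanying data matters more than the shape of the line.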

V. Selected Links

We've included a few specimen links to fakes. The actual link archive is obviously larger. Creating fakes is often a daily activity. Some may be altered by the time you reach them.

- Gran (currently the face on the coral pink background), and includes Jacques. Would be inconclusive about who made it if not for various Gran patterns.
- Gran. Includes Straub, Kurtz.
- Gran, obviously.
- Gran, plus HalfPixel's Kurtz, Straub and Guigar. The other two are filler.
- Gran, with regulars including Renee Engstrom of Anders Loves Maria, Straub and the less common Chris Crosby of Keenspot. He probably has no idea he is included.
- Likely David Malki creation. That's him on the right.
- Fakes rarely feature the author first, so this is probably Malki, not Jacques. Note the reliance on Jacques characters.
- Obvious fake. Best guess is Straub. The close relationship among the HalfPixel guys suggests Kurtz's insertion is approved.
- Jeph Jacques Jacques. This one only follows others of his own creation.
- All Jacques creations, and dependable old Wil Wheaton. You could probably get rich selling Wheaton t-shirts - many webcomic artists would buy five.
- Another.
- New ones are appearing on Jacques' profile as we write. This one nabs Gran and Malki and also drags in Ryan North and R Stevens.

A few Jacques links suddenly go to "pages that do not exist." Since he appears to be adding at this moment, I can only speculate that Twitter is culling. Ignored in the past, fakes have become a priority area for Twitter.

- Kurtz is either more cautious or less considerate. His fakes rarely include his friends.
- Kurtz. The use of bland, unlikely names is a Kurtz trademark.
- Kurtz. He shares his friends' fascination with minor celebrity Wil Wheaton.
- Kurtz. His avatar is on the right.
- Kurtz.

These are not the funniest, the most unsettling, or even guaranteed, as an error could slip in despite precautions. Use them to get a feel for what to look for. To view anyone's followers, click on any profile and substitute the new name (e.g., pvponline, punkybird (Gran), Kris Straub, etc.) in the URL. Then click followers, under the bio, and work your way through.

VI. Conclusions

1. Enhancing personal profiles with fake followers is rampant among webcomics creators on Twitter. It's also risky: Twitter has told us it is a "priority."

2. The "status symbol" of large followings, combined with fakery, is creating engineered celebrities.

3. A network of webcomickers exists to create and enlarge fake follower counts for one another.

4. An examination of prominent Twitter ringleaders shows unlikely increases in site traffic. We speculate that it may be generated with "black hat" devices currently on the market, and that it may be done from a single source, boosting traffic for many collaborators at once. There may be other possible explanations.

5. The pairing of suspicious traffic patterns and suspicious public relations (Twitter) patterns is frequently compelling, suggesting a systematic approach toward artificially attracting readers.

6. Widespread corruption of webcomic metrics may be a contributor to the recent withdrawal of many online advertising agencies from servicing webcomics, but if not, it promises the potential of webcomics being blacklisted from future opportunities. Webcomics are already specifically rejected by name at some firms.

7. Falsified analytics are destructive to the uninformed, causing them to invest time and money in the public success model, while others are functioning under a secret, and dishonest, model.

8. Many people considered unlikely to engage in cheating have fake followers on their profiles, inserted without their full attention by friends. While some must know what is happening, there are probably people who have no idea of their involvement, and would be distraught to find out.

9. The trend toward cheating raises questions about the potential of comics to support many of their creators. While successes are well known and documented, it seems that many people prefer a model that doesn't require much work.