Data is only useful when it’s accurate. With all of the spam showing up in Google Analytics (GA), website administrators and marketers looking to keep their data trimmed and useful are fighting an uphill battle. While you could just create an army of filters to fight back, this adds yet another layer of complexity to an already complex tool.
So what can you do? The answer is simpler than you might expect. We’ll walk through a single, effective filter that eliminates one of the most common forms of GA spam: ghost spam.
Before we get started, it’s important to understand how spam finds its way into your data in the first place. We’ll briefly cover the two most common types of referral spam. For an in-depth look at referral spam, be sure to check out this recent guide from Jared Gardner.
Ghosts? Crawlers? What’s the Difference?
While all spam is bad, this article targets ghost spam, which makes up the bulk of GA spam. It earns its name because its hits never actually reach your site. We’ll use this crucial distinction to our advantage in a minute.
You might wonder how ghost spam gets into your analytics at all if it never visits your site. The answer is Google’s Measurement Protocol. It was designed to let developers send tracking data straight to GA’s servers, but spammers have turned it into the auto-dialer of the GA spam world: they generate a list of random tracking IDs and start sending hits to them.
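To see why a ghost hit’s hostname can’t be trusted, here is a minimal Python sketch that builds, but does not send, a Measurement Protocol pageview. It assumes the v1 (Universal Analytics) protocol, where hits go to the `/collect` endpoint; the tracking ID, referrer, and hostname values are placeholders. The point to notice is that the document hostname (`dh`) is just an optional field the sender fills in or leaves out.

```python
import uuid
from urllib.parse import parse_qs, urlencode

# Endpoint for Measurement Protocol v1 (Universal Analytics) hits.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview(tracking_id, referrer, hostname=None):
    """Return a URL-encoded pageview payload; nothing is sent over the network."""
    params = {
        "v": "1",                  # protocol version
        "tid": tracking_id,        # property tracking ID (spammers guess these)
        "cid": str(uuid.uuid4()),  # anonymous client ID, a random UUID
        "t": "pageview",           # hit type
        "dr": referrer,            # document referrer shown in your reports
    }
    if hostname is not None:
        params["dh"] = hostname    # document hostname: optional and unverified
    return urlencode(params)


# A ghost hit: placeholder tracking ID, fake referrer, no hostname at all.
payload = build_pageview("UA-00000000-1", "http://spam-referrer.example/")
print(f"{COLLECT_URL}?{payload}")  # the URL such a hit would request
print(parse_qs(payload))           # inspect which fields the hit carries
```

Because nothing in the payload proves the hit came from a real page view on your site, the hostname is either missing or whatever the sender chose to type.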
Crawler spam is more straightforward. Spam bot crawlers ignore the rules in your robots.txt file and crawl whatever pages they wish. Because they really do load your pages, they leave a record in your GA reports that looks like a website visit.
Because crawlers visit specific sites, instead of firing hits at random tracking IDs like ghost spam, their hits carry your real hostname, which can make them harder to identify. Fortunately, major crawler spam bots appear relatively rarely. If you notice something odd in your analytics, search for the referrer on Google or cross-check it against a reputable referrer spam list.
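Cross-checking a suspicious referrer against a spam list can be sketched in a few lines of Python. The blocklist entries below are hypothetical placeholders; in practice you would load a maintained referrer spam list. The check also matches parent domains, so `www.` or other subdomains of a listed spammer are caught.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; replace with entries from a maintained spam list.
SPAM_REFERRERS = {"spam-referrer.example", "free-seo-traffic.example"}


def referrer_domain(referrer):
    """Extract the lowercase hostname from a referrer URL or bare domain."""
    if "://" not in referrer:
        referrer = "http://" + referrer
    return (urlparse(referrer).hostname or "").lower()


def is_listed(referrer, blocklist=SPAM_REFERRERS):
    """True if the referrer's domain, or any parent domain, is on the blocklist."""
    labels = referrer_domain(referrer).split(".")
    # Check "a.b.example", then "b.example", then "example".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


print(is_listed("http://www.spam-referrer.example/landing"))  # True
print(is_listed("news.ycombinator.com"))                      # False
```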
Exorcising Your GA Spam Problem
Typically, the first impulse when dealing with GA spam is to add the referral to one of your exclusion filters. While it’s a quick, simple step, it’s not the best way to deal with ghost spam, for several reasons:
- Updating your filters each time a new spam bot pops up takes time, and the effort multiplies if you maintain multiple sites.
- It does nothing to keep new spam data, such as traffic spikes, out of your analytics until you’ve identified the referral yourself.
- If the spammer sends hits that show up as direct visits (with no referrer), a referral filter won’t catch them.
The best way to create an effective filter? Use hostnames.
Because most ghost spam operations fire hits at random tracking IDs, the spammers don’t know which site they’re hitting, so they often don’t bother setting a hostname at all (it shows up as "(not set)"). When they do set one, it’s usually fake.
By checking each hit’s hostname against the sites where you actually installed your tracking code, you can weed out the fakes easily.
Valid traffic is easy to spot: its hostname is your own domain or, if not, one of the other places you’ve used your GA tracking code, such as sponsored posts, services, and translated sites.
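The check described above boils down to an allow-list. Here is a minimal Python sketch, assuming hits are simple dictionaries and that `VALID_DOMAINS` holds the (hypothetical) domains where your tracking code lives. A hostname is accepted if it is one of those domains or a subdomain of one; the referrer is never consulted.

```python
# Hypothetical domains where your GA tracking code is installed.
VALID_DOMAINS = {"mysite.com", "legitothersite.com", "sponsoredpostsite.com"}


def is_valid_hostname(hostname, valid_domains=VALID_DOMAINS):
    """Accept a hostname that is one of your domains or a subdomain of one."""
    host = hostname.strip().lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in valid_domains)


# Sample hits standing in for rows of a GA report.
hits = [
    {"hostname": "mysite.com", "referrer": "google.com"},
    {"hostname": "blog.mysite.com", "referrer": "twitter.com"},
    {"hostname": "(not set)", "referrer": "spam-referrer.example"},
    {"hostname": "mysite.com.fake.example", "referrer": "spam-referrer.example"},
]

for hit in hits:
    verdict = "valid" if is_valid_hostname(hit["hostname"]) else "likely ghost"
    print(hit["hostname"], "->", verdict)
```

Requiring a leading dot for subdomains keeps look-alikes such as `notmysite.com` or `mysite.com.fake.example` from slipping through.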
Now that you know what to look for, create a filter that includes only hits with valid hostnames. This blocks ghost spam regardless of how it reaches your data, as long as the spammer hasn’t set one of your real hostnames.
The first thing you’ll need is a report of the hostnames in your analytics. You can make one right inside the GA dashboard.
- Click the Reporting tab
- Click Audience on the left side of the screen
- Click Technology to expand the drop-down and choose Network
- Click Hostname at the top of the report
You should now see a list of every hostname in your records. Go through the list and note all of the valid hostnames, including subdomains (e.g. mysite.com, blog.mysite.com, jp.mysite.com, legitothersite.com, sponsoredpostsite.com).
For smaller sites, this will usually be a main domain and possibly a handful of subdomains. Once you’re sure you’ve noted them all, create a regular expression like this:
mysite\.com|legitothersite\.com|sponsoredpostsite\.com
Don’t worry about listing the subdomains separately. The regular expression matches any hostname that contains one of your domains, so it will match them as well.
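Building the expression by hand is easy to get wrong (an unescaped dot matches any character), so here is a small Python sketch that joins root domains into one escaped alternation. It assumes GA’s filter pattern behaves like an unanchored search, which `re.search` imitates here; the domains are the example ones from this article.

```python
import re


def build_hostname_pattern(domains):
    """Join root domains into one alternation, escaping the dots."""
    return "|".join(re.escape(d) for d in domains)


pattern = build_hostname_pattern(
    ["mysite.com", "legitothersite.com", "sponsoredpostsite.com"]
)
print(pattern)  # mysite\.com|legitothersite\.com|sponsoredpostsite\.com

# Unanchored search: subdomains match without being listed.
for host in ["blog.mysite.com", "jp.mysite.com", "(not set)"]:
    print(host, bool(re.search(pattern, host)))

# A stricter variant that only accepts a domain or its subdomains at the end.
strict = r"(^|\.)(" + pattern + r")$"
print(bool(re.search(strict, "mysite.com.fake.example")))  # False
```

The unanchored form is shorter, but it also accepts a fake hostname like `mysite.com.fake.example`; whether the anchored form suits your setup is worth confirming with the filter’s Verify step.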
Now you’ll want to head to a view with no filters active. Once there:
1. Create a Custom Filter.
2. Choose the radio button for Include.
3. Select Hostname from the Filter Field.
4. Paste your regular expression into the Filter Pattern field.
5. Click Verify to ensure that your filter is set up properly.
6. Save your filter.
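To see why a single Include filter outperforms a growing exclusion list, here is a Python simulation over a few sample hits. The pattern and hits are hypothetical, and the functions only mimic the logic of GA’s Include and Exclude filters; they are not GA code.

```python
import re

# Hypothetical hostname pattern and sample hits standing in for raw GA data.
HOSTNAME_PATTERN = r"mysite\.com|legitothersite\.com"
hits = [
    {"hostname": "mysite.com", "referrer": "google.com"},
    {"hostname": "blog.mysite.com", "referrer": "(direct)"},
    {"hostname": "(not set)", "referrer": "known-spam.example"},
    {"hostname": "(not set)", "referrer": "brand-new-spam.example"},
]


def include_filter(hits, pattern):
    """Keep only hits whose hostname matches the pattern (an Include filter)."""
    rx = re.compile(pattern)
    return [h for h in hits if rx.search(h["hostname"])]


def exclude_filter(hits, blocked_referrers):
    """Drop hits from referrers you've already identified (an Exclude filter)."""
    return [h for h in hits if h["referrer"] not in blocked_referrers]


kept = include_filter(hits, HOSTNAME_PATTERN)
print(len(kept), "of", len(hits), "hits kept by the hostname filter")

# The exclusion list only knows about spam you've already seen.
leftover = exclude_filter(hits, {"known-spam.example"})
print([h["referrer"] for h in leftover])
```

The hostname filter drops both ghost hits, while the exclusion list lets the new spammer through until someone adds it by hand.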
Now, to remove ghost spam from any view in your GA dashboard, all you need to do is apply the filter. With one filter, you’ve automatically eliminated ghost spam from invalid hostnames. You won’t need to update multiple filters or add new exclusions every week.
Just be sure that whenever you add your tracking code to a new site, you also add that domain to the end of your filter’s regular expression.
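Appending a domain is a one-line change, but a helper can escape it, skip duplicates, and check the length. This is a Python sketch; the 255-character cap is an assumption about GA’s Filter Pattern field, so check the limit in your own account before relying on it.

```python
import re

# Assumption: GA's Filter Pattern field caps patterns at 255 characters.
MAX_PATTERN_LENGTH = 255


def add_domain(pattern, domain):
    """Append an escaped domain to an existing alternation pattern."""
    escaped = re.escape(domain)
    if escaped in pattern.split("|"):
        return pattern  # domain already listed, nothing to do
    new_pattern = f"{pattern}|{escaped}" if pattern else escaped
    if len(new_pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("pattern exceeds the assumed Filter Pattern limit")
    return new_pattern


pattern = r"mysite\.com|legitothersite\.com"
pattern = add_domain(pattern, "newpartner.com")
print(pattern)  # mysite\.com|legitothersite\.com|newpartner\.com
```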
Looking for clean views of your past data as well? Filters only affect data collected after you apply them, so use the same regular expression in an Advanced Segment.
Want More Information on Managing Spam?
With all the headaches that spam causes, you’ll find endless resources out there on the web. Three articles we recommend are:
- Keep Calm and Stop Google Analytics Spam by oHow
- Blocking Ghost Referrals from Google Analytics by Cucumber
- Removing Referral Spam from Google Analytics by Viget
Have any questions or thoughts on this common issue? Leave them in the comments below.