How Much Data do You Need to Reach Statistical Significance?
One of the hardest things to do when running split tests is to understand when you have results that are significant enough to signal it’s time to stop testing and move on to the next step.
I was recently running a very thorough split test that examined search vs content data. I took some of the ads that were running throughout that time and put them into several split tester calculators to see if I agreed with any of them.
You will quickly notice a downfall in the tools, but don’t despair. It comes down to the data: do you really have enough of it to compute statistical significance?
My suggestions are at the bottom of this article.
SplitTester.com
Splittester.com uses a combination of clicks and click-through rate (CTR) to determine significance. While it can be useful to see which ad will get the higher CTR, most of us are much more interested in profit. However, if you are looking for traffic, CTR is a useful metric.
Just a couple hours into the campaign, AdWords refreshed the data. I took the initial data and put it into splittester.com. Below are the results:
Only 13 clicks, and it seems I have statistical significance. I hope every one of you just sat up and yelled that 13 clicks wasn’t enough. You’re right.
By day 2, the results were exactly the opposite. Ad copy 2 was now the statistically significant winner.
I was actually testing more than 2 ads, but you can’t compare more than two ad copies at this site. In addition, this first test site makes no mention of conversions, just clicks. If traffic is your goal, and you have the minimum amount of data necessary (see below), this could be useful.
SEO Book’s Calculator
Next, I went over to Aaron Wall’s calculator. SEObook has a very nice suite of tools which are free to use. One of them is a PPC G-test calculator. The very first thing you’ll notice is that SEObook’s calculator allows you to test several variations.
The next thing I noticed about Aaron’s tool is that it includes a warning that my data points are too low. Thank you, Aaron. However, the ad that the tool says should be dropped, with 97.64% confidence, is the eventual winning ad.
I do like that Aaron’s calculator is based upon successes. You can use this tool to calculate ad tests, landing page tests, or even combinations of the two. It’s quite useful when you have enough data.
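To make the idea of a G-test on successes concrete, here is a minimal sketch. It is not necessarily the exact calculation SEObook’s tool performs; the function name `g_test`, the input layout (one `(clicks, conversions)` pair per ad), and the sample numbers are all assumptions for illustration. It computes the G statistic for any number of ads and compares it against standard 95% chi-square critical values.

```python
import math

# Standard 95% critical values of the chi-square distribution,
# keyed by degrees of freedom (number of ads minus one).
CHI2_CRITICAL_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def g_test(ads):
    """Return (G, degrees_of_freedom) for a list of (clicks, conversions) pairs.

    Each ad contributes two cells: clicks that converted and clicks that did not.
    Expected counts assume every ad shares the same overall conversion rate.
    """
    total_clicks = sum(clicks for clicks, _ in ads)
    total_conversions = sum(conv for _, conv in ads)
    g = 0.0
    for clicks, conv in ads:
        expected_conv = clicks * total_conversions / total_clicks
        expected_miss = clicks - expected_conv
        cells = ((conv, expected_conv), (clicks - conv, expected_miss))
        for observed, expected in cells:
            if observed > 0:  # a cell with zero observations contributes nothing
                g += observed * math.log(observed / expected)
    return 2.0 * g, len(ads) - 1


# Hypothetical numbers, not data from the test described in this article.
ads = [(1000, 30), (1000, 15), (1000, 22)]
g, df = g_test(ads)
print(f"G = {g:.2f}, df = {df}, significant at 95%: {g > CHI2_CRITICAL_95[df]}")
```

The math will happily return a "significant" answer on a handful of clicks, which is why the data minimums matter more than the calculator.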
Not going to bore you
I tested out many more tools, and almost all of them gave me similar results.
When do you have enough data to utilize the calculators?
It’s not that these are bad tools. When used correctly, both are useful.
The issue is that you need to understand when you have enough data to actually believe your results.
Segment search vs content data. Do not combine these two mediums into one set of test results.
Below are my ‘rules of thumb’ to determine if I have enough data to even move onto calculators:
Time:
- Minimum: 1 week. Each day has different characteristics. Allow those variances to run over a week’s time. (One week case study)
- Better: 1 month. Each week has variances (especially payday weeks). Allow that to play over a month’s time. (Common to see variances in luxury goods)
- Ideal: 3 buying cycles (with a minimum of one month)
Traffic:
- Minimum: 300 clicks per ad (and I still think this is too low)
- Better: 500 clicks per ad
- Ideally: 1000 clicks per ad
Conversions:
- Minimum: 7 conversions per ad
- Ideally: 15+ conversions per ad
Temperance:
If you are running 10,000 clicks a day, you might want more data as results can change over the buying cycle.
If you are receiving 1000 clicks a month, you might need to weigh when you can make a decision vs the data you have.
You will have to weigh how much data you receive against the need to make decisions.
Ideally, you’d want to reach every milestone (1000 clicks, 15 conversions, and 3 buying cycles with a minimum of one month) before making decisions.
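The rules of thumb above can be sketched as a simple checklist. This is only an illustration: the names `AdStats` and `data_level`, the 30-day reading of "one month", and the `buying_cycle_days` parameter are assumptions, and the weakest ad in the test decides the result.

```python
from dataclasses import dataclass


@dataclass
class AdStats:
    clicks: int
    conversions: int


def data_level(days_running, ads, buying_cycle_days=10):
    """Classify a test as 'insufficient', 'minimum', or 'ideal'.

    buying_cycle_days is a hypothetical input; set it to your own
    buying cycle. One month is treated as 30 days.
    """
    # Ideal time: 3 buying cycles, with a minimum of one month.
    ideal_days = max(30, 3 * buying_cycle_days)
    # Every ad has to clear the bar, so look at the weakest one.
    fewest_clicks = min(ad.clicks for ad in ads)
    fewest_conversions = min(ad.conversions for ad in ads)

    if (days_running >= ideal_days
            and fewest_clicks >= 1000
            and fewest_conversions >= 15):
        return "ideal"
    if days_running >= 7 and fewest_clicks >= 300 and fewest_conversions >= 7:
        return "minimum"
    return "insufficient"


# Hypothetical example: two days in, a handful of clicks per ad.
print(data_level(2, [AdStats(13, 1), AdStats(9, 0)]))  # insufficient
```

Only once a test reaches at least "minimum" would I bother putting the numbers into a calculator.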
Of course, use common sense. If you’ve been testing for a month and one ad has 30 conversions and the other has 4 (assuming they have a similar amount of clicks), you can make assumptions.
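As a rough check on that common-sense case, here is a sketch of a two-proportion z-test on conversion rates. The click counts are not stated above, so the 1,000 clicks per ad used here is an assumption, as is the function name `two_proportion_z`.

```python
import math


def two_proportion_z(conv_a, clicks_a, conv_b, clicks_b):
    """Return (z, two_sided_p_value) comparing two conversion rates."""
    rate_a = conv_a / clicks_a
    rate_b = conv_b / clicks_b
    # Pooled rate under the assumption that both ads convert equally well.
    pooled = (conv_a + conv_b) / (clicks_a + clicks_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / clicks_a + 1 / clicks_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# 30 vs 4 conversions, assuming 1,000 clicks for each ad.
z, p = two_proportion_z(30, 1000, 4, 1000)
print(f"z = {z:.2f}, p = {p:.6f}")
```

With a gap that wide, on a month of data and a similar number of clicks, the test only confirms what you can already see.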
The golden rule of optimization?
It is more important to believe the data than to complete a test.
Do not make decisions based upon insufficient data – all you will do is hurt your business.
Conclusion
Online tools and Excel calculators are not inherently bad – they’re just doing some math.
As a marketer, your job is not only to run tests but also to ensure that you have the proper amount of data before using such tools to complete a test. As always, you want to get to the actionable analysis stage. That is the goal: to run a test that determines which marketing message promotes your business’s goals so everyone succeeds.
You can’t do that with insufficient data.
Sometimes you have to be patient.
Really Great Post! I use split-tester and always get frustrated when I want to compare more than one ad. I do use it for conversions as well. I just switch number of clicks = number of conversions and then CTR – to conv. rate. Yet, I always thought something was off when it says 3 conversions vs. 0 conversions is stat. sign. This post has some really great guidelines I plan on following immediately. Also the PPC G-Test does look to be much better overall than split tester. Thanks for sharing!
Bonnie Schwartz
One issue about which I can find almost no discussion is how to assess significance when split testing on profit, as opposed to CTR or conversion rate. I have found that in some instances, CTR and conversion rates are poor predictors of profit, and so it’s absolutely essential that I split test on profit in such cases. If you or anyone reading this knows of resources addressing this issue, please post a link!
When split testing for profit, I use Profit per Impression (search) or Profit per Click (content) as my metrics.
Information on Profit by Impression/Click:
https://bgtheory.com/blog/profit-by-impression-the-real-metric-in-ppc-testing/
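The two metrics mentioned above can be sketched as simple ratios. This assumes profit means revenue minus ad cost, which may differ from how you account for it; the function names and sample numbers are hypothetical.

```python
def profit_per_impression(revenue, cost, impressions):
    """Profit (assumed here to be revenue minus ad cost) per impression."""
    return (revenue - cost) / impressions


def profit_per_click(revenue, cost, clicks):
    """Profit (assumed here to be revenue minus ad cost) per click."""
    return (revenue - cost) / clicks


# Hypothetical ad: $150 revenue, $50 spend, 10,000 impressions, 200 clicks.
print(profit_per_impression(150.0, 50.0, 10_000))
print(profit_per_click(150.0, 50.0, 200))
```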