If you’re A/B testing, you might have wondered whether you (or your company) should be testing during high-traffic/transaction periods (aka peak seasons like Christmas).
Let’s see if I can give you some arguments to convince your HiPPO otherwise :)
Short background on my testing situation/experience: I lead the optimization team for an international gifting company (euroflorist.com; we mainly sell flowers). Our business is very much holiday-driven, with some major peaks (Christmas, Valentine’s Day, Mother’s Day etc.) and many smaller peaks in one country or another during almost every week of the year.
“Changing things is not good when you have a lot of people exposed to those changes. More people = more risk!”
Before starting any test, you should have calculated the required sample size. Account for the extra peak traffic (look at last year’s numbers), scale down the percentage of visitors you expose to the test, and the risk you take will be basically the same as with any other test.
Example: if you have a test capacity of 1,000,000 users in a month but only need 500,000 for your A/B test, you can run the test on only 50% of users during the test period (we usually run for 4 weeks). Do you have twice the traffic during peak season? Then you only need 25% of your traffic, and you can run 3 extra similar tests.
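To make the arithmetic concrete, here is a minimal sketch in Python of how you could derive that traffic percentage from a standard two-proportion sample-size formula. The numbers (2% baseline conversion, a 10% relative lift you want to detect, 2,000,000 peak visitors in 4 weeks) are made up for illustration, not our actual figures:

```python
import math

def required_sample_size(p_baseline, mde_relative, alpha=0.05, power=0.8):
    """Approximate users needed PER VARIANT for a two-proportion z-test.
    p_baseline: current conversion rate.
    mde_relative: minimum relative lift you want to detect (0.10 = +10%)."""
    z_alpha = 1.96   # two-sided, alpha = 0.05 (hard-coded to avoid scipy)
    z_power = 0.84   # power = 0.8
    p2 = p_baseline * (1 + mde_relative)
    p_avg = (p_baseline + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_avg * (1 - p_avg))
                 + z_power * math.sqrt(p_baseline * (1 - p_baseline)
                                       + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p_baseline - p2) ** 2)

# Illustrative inputs: 2% conversion, detect a 10% relative lift,
# 2,000,000 visitors expected during the 4-week peak test period.
per_variant = required_sample_size(0.02, 0.10)
total_needed = 2 * per_variant               # control + variant
traffic_share = total_needed / 2_000_000     # fraction of visitors to expose
print(per_variant, total_needed, f"{traffic_share:.0%}")
```

Whatever tool you use for the calculation, the last step is the point: required sample divided by expected traffic gives the share of visitors you need to expose, and during a peak that share shrinks.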
In this case, if the test is negative, you’ve shown the losing variant only to the number of people needed to test the hypothesis, and no more. And if the effect turns out to be more negative than expected, the damage is spread over a longer period, giving you time to spot it and stop the test if needed.
It’s 2016. Websites (especially webshops) change continuously: products go out of stock, promotions expire or new ones start, newsletter campaigns are sent, AdWords campaigns are disabled or boosted, texts are changed… right up to and even during the peak.
If change is bad, all these changes should be stopped, which is probably not going to happen. Just because you don’t measure the impact of all changes outside of A/B testing doesn’t mean they cannot have a negative effect.
This means that pausing A/B tests won’t stop any (risky) changes from being made.
Stopping A/B tests won’t stop any changes from being made. It WILL stop you from LEARNING from the changes you make.
A/B testing is not riskier than just changing things based on gut feeling or whatever worked previously. But it has the added benefit that you actually (try to) measure the change and learn something from it.
“Customer buying behaviour during peaks is very different compared to off-peak behaviour. So why bother?”
Example: when you sell gifts, it’s important that customers trust that you’ll deliver on time. Especially for holidays (when there is a clear deadline), trust in a timely delivery might be more important than price.
Maybe this applies to you, maybe it doesn’t. But shouldn’t you do your utmost to figure out how people buy from you, ESPECIALLY during peak season? You might have multiple peaks a year (we do), and maybe over 50% of your yearly revenue comes from them. It seems to me it’s even more important and profitable to know exactly what makes your users click during that period.
It could make sense to test different hypotheses or run different tests during peaks because of the different situation. You have more traffic, higher conversion rates, and your customers might be driven by different needs. So it makes sense to adjust your tactics to that situation.
If I had to choose, I’d rather test and optimize the website for peaks than for non-peaks. It’s way more effective.
“In the long run, we might improve. But on average the A/B tests themselves have a bad effect so we risk losing money when testing during peaks.”
Your mileage may vary here, but if you have a decent process around your A/B testing and use proper user insights as input, I can’t believe that is the case.
With our current setup, we end up with roughly one positive test for every negative test, plus a bunch of tests that show no measurable effect. Looking at revenue/profit it’s even better: the positive tests gain more than we lose on the negative ones.
(This month I calculated the gain of the testing process itself in terms of overall profit during the testing period: what if we just ran A/B tests without sharing learnings, without implementing any of the winning variants and without counting any potential future gains? Conclusion: even by just running A/B tests, my team is paying for itself.)
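As a back-of-the-envelope sketch of that break-even logic (again with made-up numbers, not our actual figures):

```python
# Hypothetical inputs; swap in your own traffic, conversion and order values.
users_per_test = 500_000            # split 50/50 between control and variant
conversion_rate = 0.02
avg_order_value = 40.0              # EUR
winners, losers = 1, 1              # per period; the rest show no effect
avg_win_lift, avg_loss_lift = +0.05, -0.03  # relative effect on the variant

def revenue_delta(relative_lift):
    # Only the variant half of the test traffic sees the change
    exposed = users_per_test / 2
    baseline = exposed * conversion_rate * avg_order_value
    return baseline * relative_lift

net = winners * revenue_delta(avg_win_lift) + losers * revenue_delta(avg_loss_lift)
print(f"Net revenue impact of the tests themselves: {net:+,.1f} EUR")
```

With a one-to-one win/loss ratio, the testing process pays for itself during the test period as long as the winners lift more than the losers hurt.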
So in terms of direct profit/loss, this shouldn’t be a big risk. If you test a proper hypothesis, you will still learn something for next time either way.
By the way: assuming most tests have a negative effect suggests you believe you have a (near) perfect website with hardly anything that can be improved. Trust me, you’re wrong on that one. (:
Get your testing into a higher gear and learn even more from your customers’ behaviour during these short bursts of transactions. Like I said above: calculate your required sample size. The extra traffic and conversion during peak seasons will give you extra capacity for testing! If you’re out of ideas, this might be a good moment to re-test some of your previous A/B tests to see if they still hold up.
You probably think you need to improve your measurements, increase your testing confidence or improve your successful test ratio.
I know I do. Every day.
And like me, you’re probably already working on many improvements. But it’s by doing A/B testing that you improve the process, raise your success rate and learn from the changes you make. This is how you become the customer knowledge centre of your company.
Thx Arnout Hellemans & Rudger de Groot for proofreading this article (:
Most of my content is published on LinkedIn, so make sure to follow me there!