Seven Pitfalls to Avoid when Running Controlled Experiments on the Web

The paper appeared in KDD 2009.  PDF

Controlled experiments, also called randomized experiments and A/B tests, have had a profound influence on multiple fields, including medicine, agriculture, manufacturing, and advertising. While the theoretical aspects of offline controlled experiments have been well studied and documented, the practical aspects of running them in online settings, such as web sites and services, are still being developed. As the use of controlled experiments grows in these online settings, it is becoming more important to understand the opportunities and pitfalls one might face when using them in practice. A survey of online controlled experiments and lessons learned were previously documented in Controlled Experiments on the Web: Survey and Practical Guide (Kohavi et al., 2009). In this follow-on paper, we focus on pitfalls we have seen after running numerous experiments at Microsoft. The pitfalls include a wide range of topics, such as assuming that common statistical formulas used to calculate standard deviation and statistical power can be applied, and ignoring robots in analysis (a problem unique to online settings). Online experiments allow for techniques like gradual ramp-up of treatments to avoid the possibility of exposing many customers to a bad (e.g., buggy) Treatment. With that ability, we discovered that it is easy to incorrectly identify the winning Treatment because of Simpson's paradox.
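
To make the Simpson's paradox pitfall concrete, here is a minimal sketch in Python using hypothetical numbers (not taken from the paper): a Treatment is ramped up from a 1% exposure on a high-converting day to a 50/50 split on a low-converting day. The Treatment wins on each day separately, yet appears to lose when the two days are naively pooled.

    # Hypothetical ramp-up data: (users, conversions) per group per day.
    days = {
        "day 1 (99/1 split)":  {"control": (990_000, 29_700),   # 3.0%
                                "treatment": (10_000, 320)},    # 3.2%
        "day 2 (50/50 split)": {"control": (500_000, 5_000),    # 1.0%
                                "treatment": (500_000, 6_000)}, # 1.2%
    }

    def rate(users, conversions):
        return conversions / users

    # Per-day comparison: the Treatment wins on both days.
    for day, groups in days.items():
        print(day,
              f"control {rate(*groups['control']):.2%},",
              f"treatment {rate(*groups['treatment']):.2%}")

    # Naive pooled comparison: summing across days reverses the conclusion,
    # because Treatment traffic is concentrated on the low-converting day.
    for group in ("control", "treatment"):
        users = sum(d[group][0] for d in days.values())
        conversions = sum(d[group][1] for d in days.values())
        print(f"pooled {group}: {rate(users, conversions):.2%}")

Running this prints a Treatment lift on both days (3.2% vs. 3.0%, then 1.2% vs. 1.0%) but a pooled result of 1.24% vs. 2.33% in Control's favor. The reversal comes from the unequal splits, so one safeguard is to compare groups only over periods where the assignment percentages were stable.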

What others are saying (from an older version)
  • MS Experimentation team makes another winner

The MS team creating their own online experimentation platform continues to impress me with their willingness to share both their learning and their expertise.

It’s one thing to pump out “best practices” that are watered-down samples of simple A/B findings. These guys not only talk about the complexities of multivariate testing, but give concrete examples and the statistics behind them. Offermatica, Optimost, even Memetrics had good stats behind their systems, but they rarely talked about the meaty stuff. The MS team has really been a great resource to help those move past the simple A/B.

You don’t need to be a super stats-head to get the basic issues here, but if you don’t even take the time to understand the basics, you will waste lots of time and money. Well worth the read.

  • Getting Serious About Testing: Learn from the Pros

The Exp Platform at MSFT, led by Ronny Kohavi, publishes from this position of strength. Their latest, Seven Pitfalls to Avoid when Running Controlled Experiments on the Web, is a solid read for those aspiring to live in this space.
I've blogged the guide to practical web experiments, and it's also highly recommended. It provides an overview of the key issues to deal with in setting things up, including sampling, failure-versus-success evaluation, and common pitfalls like day-of-the-week effects.

  • Required Reading on Conversion: Seven Testing Pitfalls (seofaststart.com)
Seven Pitfalls to Avoid when Running Controlled Experiments on the Web is a great white paper by Thomas Crook, Brian Frasca, Ronny Kohavi, and Roger Longbotham from Microsoft. Check out the site for the MSFT Experimentation Platform while you're at it. Cool stuff.
Quick link to this page: http://bit.ly/expPitfalls