The paper "Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO" appeared in KDD 2007 (August 2007).
The paper is copyrighted by ACM.
© ACM, 2007. This is the author's version of the work.
It is posted here by permission of ACM for your personal use. Not for redistribution.
The definitive version is published in KDD 2007.
A longer version of this paper, which appears in the journal Data Mining and Knowledge Discovery, is available here.
Abstract
The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments (single-factor or factorial designs), A/B tests (and their generalizations), split tests, Control/Treatment tests, and parallel flights. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. We provide a practical guide to conducting online experiments, where end-users can help guide the development of features. Our experience indicates that significant learning and return-on-investment (ROI) are seen when development teams listen to their customers, not to the Highest Paid Person’s Opinion (HiPPO). We provide several examples of controlled experiments with surprising results. We review the important ingredients of running controlled experiments, and discuss their limitations (both technical and organizational). We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses. Based on our extensive practical experience with multiple systems and organizations, we share key lessons that will help practitioners in running trustworthy controlled experiments.
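As a rough illustration of the hashing-based randomization the abstract mentions, the following minimal sketch assigns each user to Control or Treatment by hashing the user ID together with an experiment ID. It is not the paper's implementation: the function name `assign_variant`, the `experiment_id:user_id` key format, the 100-bucket scheme, and the choice of MD5 are all assumptions made for this example.

```python
import hashlib


def assign_variant(user_id: str, experiment_id: str, treatment_percent: int = 50) -> str:
    """Deterministically map a user to 'control' or 'treatment'.

    Hypothetical helper for illustration only; the key format and the
    bucket count are assumptions, not taken from the paper.
    """
    # Including the experiment ID in the key means the same user can land
    # in different variants across different experiments.
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()
    # Reduce the 128-bit hash to one of 100 buckets (0..99).
    bucket = int(digest, 16) % 100
    return "treatment" if bucket < treatment_percent else "control"


if __name__ == "__main__":
    # The same user always receives the same variant for a given experiment.
    print(assign_variant("user-42", "new-checkout"))
    print(assign_variant("user-42", "new-checkout"))
```

Because the assignment depends only on the two IDs, a returning user sees a consistent experience without any stored state. Whether a given hash function distributes users evenly and independently across concurrent experiments is exactly the kind of question the paper says needs empirical checking.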
What others are saying

As far as I know, it’s the first “academic” paper on what’s often called A/B testing. I say “academic” in quotes because the paper is relatively lightweight and is geared towards an audience of industry practitioners...
Given the lack of good technical publications on doing controlled experiments on the Web, this paper is certainly a welcome start.

Most of the major web companies use controlled experiments to test new features and products on their website. First, if you want to learn how these work you should read a recent paper...

The good folks on the Microsoft Experimentation Platform team have published a paper which gives a great introduction to how and why one can go about using controlled experiments (i.e. A/B testing) to improve the usability of a website.
In addition to one of the more concise primers on key statistical concepts for testing, the paper offers a series of lists of key considerations across the testing process...
So, go read it -- it's also handy to have around to share for folks wanting a quick primer on the stats involved in split testing.
...this was a good read. Section 5, Lessons Learned, is alone very worthwhile.
Recommended reading even if you aren't running a web application since the concepts could be applied to a variety of systems (but the inherent flexibility of websites makes them primary candidates for such testing.) Using A/B testing, you may be surprised to see how seemingly minor changes to a site can lead to major improvements in rates of use, clickthroughs, purchases, etc.
(In the responses): Here is another paper by Dr. Ronny Kohavi (former Amazon head of Data-mining who now works for Microsoft) that would be useful for both developers & designers...

I recently read a white paper ...I was fascinated about how controlled experimentation is being applied -- and the value that Tealeaf brings when running these tests

Quick link to this page: http://bit.ly/ExPHiPPO