The paper is in DRAFT form
Abstract
Controlled experiments, also called randomized experiments and A/B tests, have had a profound influence on multiple fields, including medicine, agriculture, manufacturing, and advertising. While the theoretical aspects of offline controlled experiments have been well studied and documented, the practical aspects of running them in online settings, such as web sites and services, are still being developed. Online experiments are most useful in conjunction with agile development methodologies, where the necessary ingredients exist for rapid feedback and improvement cycles. As the usage of controlled experiments grows in these online settings, it is becoming more important to understand the opportunities and pitfalls one might face when using them in practice. Multiple lessons learned from running controlled experiments online were documented in the Practical Guide to Controlled Experiments on the Web (Kohavi, et al., 2007). In this follow-on paper, we focus on “advanced” pitfalls we have seen, which span a wide range of topics: incorrectly computing confidence intervals when reporting percent effects (as opposed to absolute effects), surprising observations about the impact of the choice of metrics on statistical power, and the influence of robots and ways to remove them (a problem unique to online settings). Online experiments allow for techniques like gradual ramp-up of treatments, which avoid exposing many customers to a bad (e.g., buggy) Treatment. When ramp-up is used, however, we discovered that it is easy to incorrectly identify the winning Treatment because of Simpson’s paradox. We also share some results from an actual experiment on the MSN portal, in which the value of additional ads was evaluated using controlled experiments, and describe how a monetary Overall Evaluation Criterion (OEC) was developed.
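To make the ramp-up pitfall concrete, the following minimal Python sketch shows how Simpson's paradox can arise when the Treatment percentage changes between days. The counts, the day labels, and the helper names (`rate`, `per_day_winner`, `pooled_rates`) are hypothetical and chosen only for illustration; they are not results reported in this paper.

```python
from fractions import Fraction

# Hypothetical (conversions, users) per day and variant.
# Day 1 exposes about 1% of users to the Treatment; day 2 exposes 50%.
days = {
    "day1 (1% ramp)": {"control": (20_000, 990_000), "treatment": (230, 10_000)},
    "day2 (50% ramp)": {"control": (5_000, 500_000), "treatment": (6_000, 500_000)},
}

def rate(conversions, users):
    """Exact conversion rate, avoiding floating-point rounding."""
    return Fraction(conversions, users)

def per_day_winner(days):
    """Variant with the higher conversion rate within each day."""
    return {
        day: max(variants, key=lambda name: rate(*variants[name]))
        for day, variants in days.items()
    }

def pooled_rates(days):
    """Conversion rate per variant after naively summing counts across days."""
    totals = {}
    for variants in days.values():
        for name, (conv, users) in variants.items():
            c, u = totals.get(name, (0, 0))
            totals[name] = (c + conv, u + users)
    return {name: rate(c, u) for name, (c, u) in totals.items()}

daily = per_day_winner(days)
pooled = pooled_rates(days)
pooled_winner = max(pooled, key=pooled.get)

for day, winner in daily.items():
    print(f"{day}: higher rate = {winner}")
print(f"pooled: higher rate = {pooled_winner}")
```

In this sketch the Treatment has the higher rate on each day, yet the pooled comparison favors the Control, because most Treatment users come from the day with the lower overall conversion rate. Comparing variants within periods of constant Treatment percentage is one way to avoid the reversal.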