Microsoft's Experimentation Platform

Accelerating software innovation through trustworthy experimentation

Online Experiments: Practical Lessons
By: Ron Kohavi, Roger Longbotham, and Toby Walker
 
Appears in IEEE Computer 2010
 
When running online experiments, getting numbers is easy; getting numbers you can trust is hard.

From ancient times through the 19th century, physicians used bloodletting to treat acne, cancer, diabetes, jaundice, plague, and hundreds of other diseases and ailments (D. Wootton, Bad Medicine: Doctors Doing Harm since Hippocrates, Oxford Univ. Press, 2006). It was judged most effective to bleed patients while they were sitting upright or standing erect, and blood was often removed until the patient fainted. On 12 December 1799, 67-year-old former President George Washington rode his horse in heavy snowfall to inspect his plantation at Mount Vernon. A day later, he was in respiratory distress and his doctors extracted nearly half of his blood over 10 hours, causing anemia and hypotension; he died that night.

Today, we know that bloodletting is unhelpful because in 1828 a Parisian doctor named Pierre Louis did a controlled experiment. He treated 78 people suffering from pneumonia with either early and frequent bloodletting or less aggressive measures, and found that bloodletting did not help survival rates or recovery times.

Having roots in agriculture and medicine, controlled experiments have spread into the online world of websites and services. In an earlier Web Technologies article (R. Kohavi and R. Longbotham, “Online Experiments: Lessons Learned,” Computer, Sept. 2007, pp. 85-87) and a related survey (R. Kohavi et al., “Controlled Experiments on the Web: Survey and Practical Guide,” Data Mining and Knowledge Discovery, Feb. 2009, pp. 140-181), Microsoft’s Experimentation Platform team introduced basic practices of good online experimentation.

Three years later, having run hundreds of experiments on more than 20 websites, including some of the world’s largest, such as msn.com and bing.com, we have learned some important practical lessons about the limitations of standard statistical formulas and about data traps. These lessons, even for seemingly simple univariate experiments, aren’t taught in Statistics 101. After reading this article, we hope you will have better negative introspection: knowing what you don’t know.