Pitfalls of Long-Term Online Controlled Experiments

Appears in the 2016 IEEE International Conference on Big Data.


Online controlled experiments (e.g., A/B tests) are now regularly used to guide product development and accelerate innovation in software. Product ideas are evaluated as scientific hypotheses and tested on websites, mobile applications, desktop applications, services, and operating system features.

One of the key challenges for organizations that run controlled experiments is selecting an Overall Evaluation Criterion (OEC), i.e., the criterion by which the different variants are evaluated. The difficulty is that short-term changes to metrics may not predict the long-term impact of a change. For example, raising prices will likely increase short-term revenue but also reduce long-term revenue (customer lifetime value) as users abandon the product. Similarly, degrading the results of a search engine causes users to search more, increasing query share in the short term, but it also increases abandonment and thus reduces long-term customer lifetime value. Ideally, an OEC is based on metrics that can be measured in a short-term experiment and are good predictors of long-term value.
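The price example can be made concrete with a toy model. The sketch below is not from the paper: the prices, churn rates, starting user count, and horizon are made-up assumptions, and `revenue_path` is a hypothetical helper. It shows how a Treatment can win on a first-period revenue metric while losing on cumulative revenue once its higher churn compounds.

```python
from typing import List

# Toy model with hypothetical parameters (not from the paper): a price increase
# raises revenue per active user but also raises the per-period churn rate.


def revenue_path(price: float, churn: float, users: float, periods: int) -> List[float]:
    """Revenue in each period when a fixed fraction `churn` of active users leaves after every period."""
    revenues = []
    for _ in range(periods):
        revenues.append(price * users)
        users *= 1.0 - churn  # compounding loss of active users
    return revenues


control = revenue_path(price=10.0, churn=0.02, users=1000.0, periods=24)
treatment = revenue_path(price=12.0, churn=0.06, users=1000.0, periods=24)

short_term_delta = treatment[0] - control[0]         # what a short experiment would see
long_term_delta = sum(treatment) - sum(control)      # cumulative revenue over the horizon
print(f"first-period delta: {short_term_delta:+.0f}")
print(f"24-period delta:    {long_term_delta:+.0f}")
```

With these assumed parameters the first-period delta is positive and the 24-period delta is negative. Different churn rates can remove or reverse the gap, which is why the short-term metric alone cannot settle the question.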

To assess long-term impact, one approach is to run long-term controlled experiments and assume that the metrics observed at the end reflect the long-term effects. In this paper we share several examples of long-term experiments and the pitfalls associated with running them. We discuss cookie stability, survivorship bias, selection bias, and perceived trends, and share methodologies that can partially address some of these issues.
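To see how survivorship bias can arise, consider a deliberately simplified, hypothetical setup (not one of the paper's examples): the Treatment does not change any user's engagement, but in the Treatment group the least-engaged users lose their experiment cookie before the analysis, so only the remaining users are compared. `population` and `survivors` are illustrative names, and the engagement values and threshold are assumptions.

```python
from statistics import mean

# Hypothetical illustration: each user has a fixed engagement level (e.g., sessions
# per week). Control and Treatment receive identical populations and the Treatment
# changes nobody's engagement, so the true per-user delta is zero.
population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 100


def survivors(users, min_engagement):
    """Users still observed at analysis time; those below the threshold lost their cookie."""
    return [u for u in users if u >= min_engagement]


control_observed = survivors(population, min_engagement=1)    # nobody drops out
treatment_observed = survivors(population, min_engagement=4)  # least-engaged users drop out

true_delta = mean(population) - mean(population)
observed_delta = mean(treatment_observed) - mean(control_observed)
print(f"true delta:     {true_delta:+.2f}")      # +0.00
print(f"observed delta: {observed_delta:+.2f}")  # +1.50
```

The Treatment appears to raise engagement only because it lost its least-engaged users. The smaller number of observed Treatment users (700 versus 1,000) is a visible symptom worth checking in any long-term experiment.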

While there is clearly value in evaluating long-term trends, experimenters running long-term experiments must be cautious, as results may be due to the above pitfalls more than the true delta between the Treatment and Control.  We hope our real examples and analyses will sensitize readers to the issues and encourage the development of new methodologies for this important problem.



Pavel Dmitriev, Brian Frasca, Somit Gupta, Ron Kohavi and Garnet Vaz, "Pitfalls of long-term online controlled experiments," 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, 2016, pp. 1367-1376.
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7840744&isnumber=7840573

@inproceedings{7840744,
  author={Pavel Dmitriev and Brian Frasca and Somit Gupta and Ron Kohavi and Garnet Vaz},
  booktitle={2016 IEEE International Conference on Big Data (Big Data)},
  title={Pitfalls of long-term online controlled experiments},
  year={2016},
  pages={1367-1376}
}
Quick link to this page: http://bit.ly/expLongTerm