A Recipe for Replications
By Mark Brandt
Replications are in the (social psychological) news… again. And rightfully so. After a long hiatus, the publication and discussion of replication studies in the pages of our journals have become more commonplace. Rather than serving as fodder for discussion around the water cooler (read: coffee maker) and in the hallways of the latest SPSP conference (read: coffee hour), replications are now appearing in peer-reviewed publications. And this is likely to continue. Journals both specific to social psychology and of more general interest have shown that they are willing to publish replication studies. But, just like original studies, not all replications are high quality studies. And, just like original studies, no single replication will provide us with definitive answers. So how shall we evaluate replication projects? That is a question that we have been thinking about recently.
We set out to provide an initial answer (ungated) with colleagues from several sub-disciplines of psychology who bring different perspectives and experiences to the current replication debate. Our “replication recipe” provides initial guidelines for conducting high quality and convincing replication studies – but the debate is likely far from over. So far, we have developed four main ingredients for a replication study. We discuss these in this blog, and we call on researchers to debate, discuss, and update these guidelines regularly. Just like we do for original studies.
#1. Carefully defining the effects and methods that the researcher intends to replicate.
What is the most basic ingredient for any recipe? Determining the effects, methods, effect sizes, etc. that you plan to replicate is a necessary first step, as it determines how you best include the remaining ingredients. Many times this is straightforward; other times it is not. For example, sometimes the effect that is most important for your particular theoretical or practical interest is a simple effect within a 2 X 2 interaction. Other times it may be difficult to calculate the original effect size that you need to conduct a power analysis (see Ingredient #3).
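When a paper reports only test statistics, the original effect size can often be recovered from them. As a minimal sketch (the t value and group sizes below are hypothetical, not from any study discussed here), Cohen's d for an independent-samples t-test can be approximated from the reported t statistic and group sizes:

```python
import math

def cohens_d_from_t(t, n1, n2):
    """Approximate Cohen's d for an independent-samples t-test
    from the reported t statistic and the two group sizes."""
    return t * math.sqrt(1 / n1 + 1 / n2)

# Hypothetical reported result: t(58) = 2.50 with 30 participants per group
d = cohens_d_from_t(2.50, 30, 30)
print(round(d, 3))  # → 0.645
```

This estimate can then feed directly into the power analysis for the replication (Ingredient #3).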
But not all replications were created equal. It is important to remember that resources are limited, and not every effect is worth replicating: some lack theoretical or practical importance. Before embarking on a replication study (or any study for that matter!) it is important to consider whether the results could impact theory or practice.
#2. Following exactly the methods of the original study.
Step 1 is done. Now, gather the necessary materials. Ideally these will be the original materials from the original study. Often these can be obtained by contacting the original authors and, as we improve our reporting of original studies, we can simply start downloading materials from authors’ websites or other repositories (e.g., the Open Science Framework). Ideally it would also be possible to recruit from a similar subject population and conduct the study in a highly similar lab environment. However, some of these ideals are difficult or impossible to achieve, and studies may need to be adjusted to the local conditions and context (e.g., study the local football team instead of the local voetbal team). Nonetheless, the methods of the original study should be followed as closely as possible, with deviations from the original study highlighted and justified.
It is unlikely that a single researcher will be able to conduct the 100% perfect replication study that proves conclusively that an effect is “real” (or not). As multiple replications of important results are conducted in different labs, the most accurate assessment of the effect will come from a meta-analysis of these multiple replication studies and parsing the differences between the studies to discover relevant moderators.
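The meta-analytic pooling mentioned above can be sketched very simply. The snippet below is a minimal fixed-effect (inverse-variance weighted) pooled estimate, with hypothetical effect sizes and variances standing in for three replication attempts; a real synthesis would typically use a random-effects model and dedicated software, and would examine between-study moderators:

```python
def fixed_effect_meta(estimates, variances):
    """Fixed-effect meta-analysis: weight each study's estimate
    by the inverse of its sampling variance, then pool."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    standard_error = (1 / sum(weights)) ** 0.5
    return pooled, standard_error

# Hypothetical Cohen's d estimates from three replication labs
pooled, se = fixed_effect_meta([0.40, 0.25, 0.10], [0.04, 0.02, 0.05])
print(round(pooled, 3), round(se, 3))  # → 0.258 0.103
```

The pooled estimate, not any single study, is the best available answer about the size of the effect.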
#3. Having high statistical power.
Questions about statistical power for both original and replication studies have been raised on blogs, in articles, and, of course, on Twitter. Low powered replications will produce imprecise estimates with very wide confidence intervals, and it is impossible to conclude much from a low powered replication study (or any study for that matter). Therefore we recommend that authors conduct high-powered replications, with at least 80% power, but ideally higher. There are also other ways to think about power that can be helpful for planning a replication study. For example, rather than determining the sample size necessary to reject the null given a particular effect size, methodologists have developed methods for determining the sample size necessary for a sufficiently precise effect-size estimate (see here and here for interesting discussions about power and replication studies). We realize that this ideal may be tough to obtain, but there are creative solutions. For example, recruit the help of another lab to collect additional data, use the Open Science Framework to crowdsource replications across many labs (see e.g., the ManyLabs initiative, or the CREP), or formally propose and carry out a registered replication report.
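To make the 80% recommendation concrete, here is a minimal power-analysis sketch using the standard normal approximation for a two-sided, two-group comparison (the effect size of d = 0.5 is an illustrative assumption, not a benchmark from the paper). Note that the normal approximation slightly understates the exact t-test requirement; dedicated tools such as G*Power or statsmodels give marginally larger numbers:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided
    independent-samples comparison, via the normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ≈ 1.96 for alpha = .05
    z_beta = z.inv_cdf(power)           # ≈ 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.5))  # → 63 per group (illustrative effect size)
```

Note how quickly requirements grow for small effects: halving d roughly quadruples the required n, which is why crowdsourced, multi-lab collection is often the realistic route to adequate power.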
#4. Making complete details about the replication available.
As a community, we may have had visceral responses to others replicating our work. We shouldn’t. Replications can be controversial because they call an important effect into question and “failed” replications can have consequences on psychological theories (as they should) and people’s careers (as they shouldn’t, in our opinion). Because replications are consequential and because many people likely have a stake in the outcome, we think it is best practice to be open about one’s methods, materials, analyses, results, and data. This will allow everyone who has a stake in the replication study to look at the data, conduct confirmatory and exploratory analyses, and come to their own conclusions about the replication study.
We think that following these four ingredients will help make for high quality replication studies. They help researchers clearly situate their replication study with respect to the original study and the relevant theories/models, and they help researchers focus on details that will make their replication study as convincing as possible. To help people follow the replication recipe we developed a list of 36 questions that researchers should answer in the course of planning, conducting, analyzing, and writing up a replication study. These questions are available in Table 1 of our paper (ungated), but also as a type of pre-registration on the Open Science Framework.
A Few Additional Thoughts
Recently a lot of discussion surrounding replications has focused on the role of moderators. We also believe that some replication failures will be the result of unanticipated moderators (or perhaps measurement error). But, before we embark on a replication project, why not give each other the benefit of the doubt? As we have noted, it is extremely difficult to achieve the ideal of an exact replication in one single study. A priori, we suggest that researchers think of potential moderators that may play a role when comparing the original and replication study.
The important ones will be theoretically relevant and will alter the theory by helping establish meaningful boundary conditions. To the extent these boundary conditions can be identified and tested, they can help social psychologists focus on where an effect occurs and where it does not. Then again, keeping better track of completely atheoretical differences (e.g., the effect disappears because you used bold instead of underlined text) can also show whether an effect is important and real or something more fleeting.
It is this back and forth of progressively and iteratively refining our understanding of an effect by establishing where it occurs, when, and with what measures that will help us best understand our effects and the mechanisms underlying them. Treated in this way, replications are a theoretical endeavor that will improve not only the precision of our effect sizes, but also the precision of our theories.
Mark Brandt is an assistant professor in the Social Psychology Department at Tilburg University. His research focuses on causes and consequences of moral and political belief systems. More information about his research can be found at https://sites.google.com/site/brandtmj/. He also does that Twitter thing @MBrandt05
Hans IJzerman is also an assistant professor in the Social Psychology Department at Tilburg University. His research focuses on the cognitive processes underlying close relationships – in all their different manifestations (from embodiment to culture). More information about his work can be found at http://h.ijzerman.googlepages.com.