Equivalence Testing for Psychological Research: A Tutorial

Published on Jun 1, 2018
DOI: 10.1177/2515245918770963
Daniel Lakens (TU/e: Eindhoven University of Technology)
Anne M. Scheel (TU/e: Eindhoven University of Technology)
Peder M. Isager (TU/e: Eindhoven University of Technology)
Abstract
Psychologists must be able to test both for the presence of an effect and for the absence of an effect. In addition to testing against zero, researchers can use the two one-sided tests (TOST) procedure to test for equivalence and reject the presence of a smallest effect size of interest (SESOI). The TOST procedure can be used to determine if an observed effect is surprisingly small, given that a true effect at least as extreme as the SESOI exists. We explain a range of approaches to determine the SESOI in psychological science and provide detailed examples of how equivalence tests should be performed and reported. Equivalence tests are an important extension of the statistical tools psychologists currently use and enable researchers to falsify predictions about the presence, and declare the absence, of meaningful effects.
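The TOST logic described above can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes two independent groups summarized by means, standard deviations and sample sizes, uses Welch's t-test for each one-sided test, and takes the equivalence bounds in raw scale units (for example, minus and plus a chosen SESOI). Equivalence is declared when both one-sided tests reject at the chosen alpha, which is the same as requiring the larger of the two p-values to fall below alpha. To stay within the standard library, the Student t CDF is computed from the regularized incomplete beta function via a continued fraction. The function names `t_cdf` and `tost_welch` and all numbers in the usage lines are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function (modified Lentz method)."""
    tiny = 1e-300  # guards against division by zero inside the recurrence
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """CDF of Student's t distribution; df may be non-integer (as with Welch's df)."""
    x = df / (df + t * t)
    tail = 0.5 * _betai(df / 2.0, 0.5, x)  # one-sided tail P(T > |t|)
    return 1.0 - tail if t > 0 else tail


def tost_welch(m1, sd1, n1, m2, sd2, n2, low_eqbound, high_eqbound, alpha=0.05):
    """Two one-sided Welch t-tests on the raw mean difference m1 - m2."""
    if not low_eqbound < high_eqbound:
        raise ValueError("low_eqbound must be smaller than high_eqbound")
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # Welch-Satterthwaite
    diff = m1 - m2
    # Test 1, H0: diff <= low_eqbound. A small p means diff is reliably above the lower bound.
    t_lower = (diff - low_eqbound) / se
    p_lower = 1.0 - t_cdf(t_lower, df)
    # Test 2, H0: diff >= high_eqbound. A small p means diff is reliably below the upper bound.
    t_upper = (diff - high_eqbound) / se
    p_upper = t_cdf(t_upper, df)
    p_tost = max(p_lower, p_upper)  # both tests must reject for equivalence
    return {"diff": diff, "df": df, "t_lower": t_lower, "p_lower": p_lower,
            "t_upper": t_upper, "p_upper": p_upper, "p_tost": p_tost,
            "equivalent": p_tost < alpha}


# Hypothetical data: SD 1.2 in both groups, SESOI of 0.5 raw scale points.
large = tost_welch(5.0, 1.2, 100, 5.1, 1.2, 100, low_eqbound=-0.5, high_eqbound=0.5)
small = tost_welch(5.0, 1.2, 10, 5.0, 1.2, 10, low_eqbound=-0.5, high_eqbound=0.5)
print(large["equivalent"], round(large["p_tost"], 4))
print(small["equivalent"], round(small["p_tost"], 4))
```

With 100 participants per group, both one-sided tests are significant at alpha = .05, so the effect can be declared smaller than the SESOI. With only 10 per group and identical means, the ordinary test against zero is clearly non-significant, yet the TOST cannot reject the SESOI either. This illustrates why a non-significant result on its own is not evidence for the absence of a meaningful effect.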