The COVID-19 outbreak has led to an exponential increase in publications and preprints about the virus, its causes, consequences, and possible cures. COVID-19 research has been conducted under high time pressure and has been subject to financial and societal interests. Doing research under such pressure may influence the scrutiny with which researchers perform and write up their studies. Either researchers become more diligent because of the high-stakes nature of the research, or the time pressure leads to cutting corners and lower-quality output.
In this study, we conducted a natural experiment to compare the prevalence of incorrectly reported statistics in a stratified random sample of COVID-19 preprints and a matched sample of non-COVID-19 preprints.
Our results show that the overall prevalence of incorrectly reported statistics is 9-10%, but frequentist as well as Bayesian hypothesis tests show no difference in the number of statistical inconsistencies between COVID-19 and non-COVID-19 preprints.
Taken together with previous research, our results suggest that the danger of hastily conducting and writing up research lies primarily in the risk of conducting methodologically inferior studies, and perhaps not in the statistical reporting quality.
We investigated whether statistical reporting inconsistencies could be avoided if journals implement the tool statcheck in the peer review process.
In a preregistered study covering over 7000 articles, we compared the inconsistency rates between two journals that implemented statcheck in their peer review process (Psychological Science and Journal of Experimental Social Psychology) with two matched control journals (Journal of Experimental Psychology: General and Journal of Personality and Social Psychology, respectively), before and after statcheck was implemented.
Preregistered multilevel logistic regression analyses showed that the decrease in both inconsistencies and decision inconsistencies around p = .05 was considerably steeper in statcheck journals than in control journals, offering support for the notion that statcheck can be a useful tool for journals to avoid statistical reporting inconsistencies in published articles.
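To make the two outcome measures concrete, here is a minimal sketch of the kind of check involved: recompute the p-value from a reported test statistic and compare it with the reported p-value. This is not statcheck's actual code. It is a simplified illustration that handles only two-tailed z-tests and only allows for rounding of the reported p-value, and the function names (two_tailed_p_from_z, check_reported_p) are hypothetical.

```python
import math


def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_reported_p(z: float, reported_p: float, decimals: int, alpha: float = 0.05) -> dict:
    """Compare a reported p-value with the one recomputed from z.

    Simplifying assumption: only rounding of the reported p-value is
    tolerated (half a unit in the last reported decimal); rounding of the
    test statistic itself is ignored.
    """
    recomputed = two_tailed_p_from_z(z)
    tolerance = 0.5 * 10 ** (-decimals) + 1e-12  # small slack for float error
    inconsistent = abs(recomputed - reported_p) > tolerance
    # A decision inconsistency is an inconsistency that also flips the
    # significance decision at the chosen alpha level.
    decision_inconsistent = inconsistent and (
        (recomputed < alpha) != (reported_p < alpha)
    )
    return {
        "recomputed_p": recomputed,
        "inconsistent": inconsistent,
        "decision_inconsistent": decision_inconsistent,
    }


if __name__ == "__main__":
    # Hypothetical reported results, made up for illustration only.
    examples = [
        (2.20, 0.028, 3),  # reported p matches the recomputed value
        (2.50, 0.02, 2),   # mismatch, but significant either way
        (1.80, 0.04, 2),   # mismatch that flips the significance decision
    ]
    for z, p, decimals in examples:
        print(z, p, check_reported_p(z, p, decimals))
```

The three made-up examples cover the three possible outcomes: a consistent result, an inconsistency that leaves the conclusion intact, and a decision inconsistency around p = .05.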
I am happy to announce that our paper “Reproducibility of individual effect sizes in meta-analyses in psychology” was published in PLoS One (first-authored by Esther Maassen). In this study, we assessed 500 primary effect sizes from 33 psychology meta-analyses. Reproducibility was problematic in 45% of the cases (see the figure below for different causes). We strongly recommend that meta-analysts share their data and code.
I am very happy to announce that my paper “Practical tools and strategies for researchers to increase replicability” was listed as a Top Download for the journal Developmental Medicine & Child Neurology.
The paper gives an overview of concrete actions researchers can take to improve the openness, replicability, and overall robustness of their work.
I hope that the high number of downloads indicates that many researchers were able to cherry-pick open practices that worked for their situation.
I wrote an invited review for Developmental Medicine & Child Neurology about “Practical tools and strategies for researchers to increase replicability”.
Problems with replicability have been widely discussed over the last few years, especially in psychology. By now, a lot of promising solutions have been proposed, but my sense is that researchers are sometimes a bit overwhelmed by all the possibilities.
My goal in this review was to make a list of some of the current recommendations that can be easily implemented. Not every solution is feasible for every project, so my advice is: copy best practices from other fields, see what works on a case-by-case basis, and improve your research step by step.
In a new paper, we ran statcheck on a bunch of experimental philosophy papers. Inconsistency rates are lower than in psychology, and evidential value seems high. Good news for the philosophers! See the full paper here.
We analyzed 131 meta-analyses in intelligence research to investigate effect sizes, power, and patterns of bias. We find a typical effect of r = .26 and a median sample size of 60.
The median power seems low (see figure below), and we find evidence for small-study effects, possibly indicating overestimated effects. We don’t find evidence for a US effect, a decline effect, an early-extremes effect, or citation bias.
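For a rough sense of what these numbers imply, here is a back-of-the-envelope power calculation. It is not the analysis from the paper, which estimated power within each meta-analysis. It is only a sketch, assuming a single two-sided test of a correlation at alpha = .05, the Fisher z approximation, and that the median of 60 is the number of participants in one study. The function name correlation_power is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test of H0: rho = 0.

    Uses the Fisher z transformation, under which atanh(r) is roughly
    normal with standard error 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("n must be larger than 3 for the Fisher z approximation")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the assumed true effect.
    noncentrality = atanh(r) * sqrt(n - 3)
    # Probability of landing in either rejection region.
    return std_normal.cdf(noncentrality - z_crit) + std_normal.cdf(-noncentrality - z_crit)


if __name__ == "__main__":
    # Typical effect and median sample size reported in the post.
    print(round(correlation_power(0.26, 60), 2))
```

Under these assumptions, a study with 60 participants has only about a coin-flip chance of detecting an effect of r = .26, which fits the impression that power is low.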
Comments are very welcome and can be posted on the PubPeer page https://pubpeer.com/publications/9F209A983618EFF9EBED07FDC7A7AC.