New Preprint: The Effects of p-Hacking and Publication Bias

We know that p-hacking and publication bias are bad. But how bad? And in what ways could they affect our conclusions?

Proud to report that my PhD student Esther Maassen published a new preprint: “The Impact of Publication Bias and Single and Combined p-Hacking Practices on Effect Size and Heterogeneity Estimates in Meta-Analysis”.

It’s a mouthful, but this nicely reflects the level of nuance in her conclusions. Some key findings include:

  • Publication bias severely biases effect size estimates
  • Not all p-hacking strategies are equally detrimental to effect size estimation. For instance, even though optional stopping may be terrible for your Type I error rate, it adds relatively little bias to effect size estimates (see the simulation sketch after this list). Selective outcome reporting and selectively dropping specific types of participants, on the other hand, introduce substantial bias.
  • Heterogeneity estimates were also affected by p-hacking, sometimes in surprising ways. It turns out heterogeneity is a complex concept!
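
To make the optional stopping point concrete, here is a toy simulation of my own (not taken from the preprint; the sample sizes, batch size, and stopping rule are illustrative assumptions). Many two-group studies with a true effect of zero are run; each study tests after every batch of added observations and stops at the first significant result. You can then compare the empirical Type I error rate with the nominal 5% and the mean effect size estimate with the true effect of zero.

    # Toy simulation of optional stopping (illustrative assumptions, not the
    # preprint's design): the true effect is zero, each study tests after every
    # batch of added observations and stops at the first significant result.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2024)
    n_studies = 5000                      # number of simulated studies
    n_start, n_step, n_max = 20, 10, 100  # per-group sample sizes (assumed)

    false_positives = 0
    effect_estimates = []
    for _ in range(n_studies):
        x = rng.normal(0, 1, n_max)  # "treatment" group, true effect = 0
        y = rng.normal(0, 1, n_max)  # "control" group
        for n in range(n_start, n_max + 1, n_step):
            p = stats.ttest_ind(x[:n], y[:n]).pvalue
            if p < .05:              # optional stopping: stop at the first hit
                false_positives += 1
                break
        # Cohen's d at the sample size where the study stopped
        sd_pooled = np.sqrt((x[:n].var(ddof=1) + y[:n].var(ddof=1)) / 2)
        effect_estimates.append((x[:n].mean() - y[:n].mean()) / sd_pooled)

    print(f"Type I error rate: {false_positives / n_studies:.3f} (nominal: 0.05)")
    print(f"Mean effect size estimate: {np.mean(effect_estimates):.3f} (true effect: 0)")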

Her work includes a custom Shiny app, where users can see the impact of publication bias and p-hacking in their own scenarios: https://emaassen.shinyapps.io/phacking/.

Take-away: we need systemic change to promote open and robust scientific practices that avoid publication bias and p-hacking.

Blog: Should meta-scientists hold themselves to higher standards?

If your job effectively consists of telling other researchers how to do theirs, what happens to your credibility if you drop the ball in your own research? What if you don’t always practice what you preach, or if you make mistakes?

For the monthly Meta-Research Center blog, I wrote about these dilemmas. Should meta-researchers be held to higher standards to be taken seriously? In the end, I concluded that good science isn’t about being perfect; it’s about being transparent, adaptable, and striving to do better.

Read the full blog here: Should meta-scientists hold themselves to higher standards? — Meta-Research Center

New Preprint on Reporting Errors in COVID-19 Research (a Registered Report)

The COVID-19 outbreak has led to an exponential increase in publications and preprints about the virus, its causes, consequences, and possible cures. COVID-19 research has been conducted under high time pressure and has been subject to financial and societal interests. Doing research under such pressure may influence the scrutiny with which researchers perform and write up their studies: either researchers become more diligent because of the high-stakes nature of the research, or the time pressure may lead to cutting corners and lower-quality output.

In this study, we conducted a natural experiment to compare the prevalence of incorrectly reported statistics in a stratified random sample of COVID-19 preprints and a matched sample of non-COVID-19 preprints.

Our results show an overall prevalence of incorrectly reported statistics of 9-10%, but both frequentist and Bayesian hypothesis tests indicate no difference in the number of statistical inconsistencies between COVID-19 and non-COVID-19 preprints.
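
For readers curious what such a comparison can look like, here is an illustrative sketch with made-up counts (not the preprint’s data; the preprint uses Bayes factors, whereas this sketch pairs a frequentist chi-square test with a simple Beta-Binomial posterior comparison):

    # Illustrative comparison of two inconsistency rates (hypothetical counts,
    # not the preprint's data or its exact analysis).
    import numpy as np
    from scipy import stats

    covid_incons, covid_total = 45, 500  # inconsistent / total (made up)
    other_incons, other_total = 50, 500

    # Frequentist: chi-square test on the 2x2 table of (in)consistent counts
    table = [[covid_incons, covid_total - covid_incons],
             [other_incons, other_total - other_incons]]
    chi2, p, _, _ = stats.chi2_contingency(table)
    print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

    # Bayesian flavor: Beta(1, 1) priors on both rates, then the posterior
    # probability that the COVID-19 rate exceeds the non-COVID-19 rate
    rng = np.random.default_rng(1)
    covid_rate = rng.beta(1 + covid_incons, 1 + covid_total - covid_incons, 100_000)
    other_rate = rng.beta(1 + other_incons, 1 + other_total - other_incons, 100_000)
    print(f"P(COVID-19 rate > other rate | data) = {(covid_rate > other_rate).mean():.2f}")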

Taken together with previous research, our results suggest that the danger of hastily conducting and writing up research lies primarily in the risk of conducting methodologically inferior studies, and perhaps not in the statistical reporting quality.

You can find the full preprint here: https://psyarxiv.com/asbfd/.

Seed Funding for COVID-19 Project

I am happy to announce that Robbie van Aert, Jelte Wicherts, and I received seed funding from the Herbert Simon Research Institute for our project to screen COVID-19 preprints for statistical inconsistencies.

Inconsistencies can distort conclusions, but even small inconsistencies negatively affect the reproducibility of a paper (i.e., where did a number come from?). Statistical reproducibility is a basic requirement for any scientific paper.

We plan to check a random sample of COVID-19 preprints from medRxiv and bioRxiv for several types of statistical inconsistencies. For example: does a percentage match the accompanying fraction? Do the TP/TN/FP/FN rates match the reported sensitivity of a test?
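
As an illustration, here is a minimal sketch of two such checks; the helper functions and tolerances are my own assumptions, not the project’s actual coding protocol:

    # Minimal sketch of two consistency checks (assumed helpers, not the
    # project's actual coding protocol).

    def percentage_matches(numerator, denominator, reported_pct, tol=0.05):
        """Does a reported percentage match its fraction, allowing for rounding
        to one decimal (tolerance in percentage points)?"""
        recomputed = 100 * numerator / denominator
        return abs(recomputed - reported_pct) <= tol

    def sensitivity_matches(tp, fn, reported_sens, tol=0.005):
        """Does a reported sensitivity match TP / (TP + FN), allowing for
        rounding to two decimals?"""
        recomputed = tp / (tp + fn)
        return abs(recomputed - reported_sens) <= tol

    # Hypothetical examples: "45/150 patients (30.0%)" and
    # "sensitivity = 0.90 (TP = 90, FN = 10)"
    print(percentage_matches(45, 150, 30.0))  # True: 45/150 = 30.0%
    print(sensitivity_matches(90, 10, 0.90))  # True: 90/(90+10) = 0.90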

We have three main objectives:

  1. Post short reports with detected statistical inconsistencies underneath each preprint
  2. Assess the prevalence of statistical inconsistencies in COVID-19 preprints
  3. Compare the inconsistency rate in COVID-19 preprints with the inconsistency rate in similar preprints on other topics

We hypothesize that high time pressure may have led to a higher prevalence of statistical inconsistencies in COVID-19 preprints than in preprints on less time-sensitive topics.

We thank our colleagues at the Meta-Research Center for their feedback and help in developing the coding protocol.

See the full proposal here.

New Paper: Reproducibility of Individual Effect Sizes in Psychological Meta-Analyses

I am happy to announce that our paper “Reproducibility of individual effect sizes in meta-analyses in psychology” was published in PLoS One (first-authored by Esther Maassen). In this study, we assessed 500 primary effect sizes from 33 psychology meta-analyses. Reproducibility was problematic in 45% of the cases (see the figure below for the different causes). We strongly recommend that meta-analysts share their data and code.

[Figure: breakdown of the causes of reproducibility problems]
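
To give a flavor of what such a reproducibility check involves, here is a minimal sketch with hypothetical numbers (not the paper’s actual protocol or data): recompute a primary study’s standardized mean difference from its reported summary statistics and compare it with the effect size used in the meta-analysis.

    # Minimal sketch of an effect size reproducibility check (hypothetical
    # numbers, not the paper's protocol or data).
    import math

    def cohens_d(m1, m2, sd1, sd2, n1, n2):
        """Cohen's d from group means, SDs, and sample sizes (pooled SD)."""
        pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
        return (m1 - m2) / pooled_sd

    # Suppose the meta-analysis lists d = 0.50 for a primary study reporting
    # M1 = 5.2, M2 = 4.7, SD1 = SD2 = 1.0, n1 = n2 = 50.
    reported_d = 0.50
    reproduced_d = cohens_d(5.2, 4.7, 1.0, 1.0, 50, 50)
    print(f"reproduced d = {reproduced_d:.2f}, reported d = {reported_d:.2f}")
    print("reproducible" if abs(reproduced_d - reported_d) < 0.01 else "discrepancy")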

Top Downloaded Paper

I am very happy to announce that my paper “Practical tools and strategies for researchers to increase replicability” was listed as a Top Download for the journal Developmental Medicine & Child Neurology.

The paper provides an overview of concrete actions researchers can take to improve the openness, replicability, and overall robustness of their work.

I hope that the high number of downloads indicates that many researchers were able to cherry-pick open practices that worked for their situation.

Read the full paper (open access) here.

[Image: Wiley certificate for the top downloaded paper]

METAxDATA Meeting at QUEST, Berlin

Last month, the QUEST Center in Berlin organized the first METAxDATA meeting on building automated screening tools for data-driven meta-research. On the first night of the meeting, 13 researchers gave lightning talks about their tools. The clip below features my lightning talk (under two minutes) about statcheck.
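
For those unfamiliar with statcheck: its core idea is to extract APA-style test results from a text, recompute the p-value from the reported test statistic and degrees of freedom, and flag results where the recomputed and reported p-values do not match. Below is a minimal Python sketch of that idea for t-tests only (statcheck itself is an R package that covers many more cases; the regular expression and rounding tolerance here are simplified assumptions).

    # Minimal sketch of statcheck's core idea (Python illustration, not the
    # actual R implementation): recompute p from a reported t-test result and
    # check whether it matches the reported p-value, allowing for rounding.
    import re
    from scipy import stats

    def check_t_test(sentence):
        """True/False for a (mis)match; None if no 't(df) = x, p = y' is found."""
        m = re.search(r"t\((\d+)\)\s*=\s*(-?\d+\.?\d*),\s*p\s*=\s*(\d*\.\d+)", sentence)
        if m is None:
            return None
        df, t_value = int(m.group(1)), float(m.group(2))
        reported_p = float(m.group(3))
        decimals = len(m.group(3).split(".")[1])         # decimals reported for p
        recomputed_p = 2 * stats.t.sf(abs(t_value), df)  # two-sided p-value
        return abs(recomputed_p - reported_p) <= 0.5 * 10 ** -decimals

    print(check_t_test("t(28) = 2.20, p = .036"))  # consistent -> True
    print(check_t_test("t(28) = 2.20, p = .080"))  # inconsistent -> False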

All lightning talks were recorded and can be found here.

Open Software for Open Science

At the Solid Science Workshop in Bordeaux (September 6-7, 2018), I gave a workshop about free software to facilitate solid research practices. During this workshop, we collaboratively worked on a list of resources/software/tools that can be used to improve different stages of the research process.

Check out the list, share it with colleagues, or add your own resources to it here: https://bit.ly/opensciencesoftware.

The slides of the workshop can be found here: https://osf.io/s8wpz/.

[Image: the empirical cycle]