Talking Text & Data Mining at the European Commission

“The right to read is the right to mine”. That was the motto of yesterday’s meeting at the European Commission, where we discussed how new European copyright laws would affect text and data mining (TDM) research.

The new proposal would seriously impede the use of TDM for businesses; effectively they would not have the right to mine content they already have legal access to, which is of course very strange.

The proposal does include an exemption for non-commercial research organizations (which includes universities and, with that, my work), but this is still not sufficient. For one, it would prevent scientists from commercializing any breakthroughs based on TDM research. On top of that, an increasing number of scientists seek collaboration with businesses (for example, to increase the chances of getting a Horizon 2020 grant).

For updates on this legislation, and more info on the TDM restrictions, see the website, including an open letter, of the European Alliance for Research Excellence (EARE).


Nature Comment: Share Analysis Plans and Results

Nature published a series of comments all focused on ways to fix our statistics. In my comment, I argue that the main problem is the flexibility in data analysis, combined with incentives to find significant results. A possible solution would be to preregister analysis plans, and to share data.

Read the entire piece here, for the following set of solutions:

  • Jeff Leek: Adjust for human cognition
  • Blake McShane, Andrew Gelman, David Gal, Christian Robert, and Jennifer Tackett: Abandon statistical significance
  • David Colquhoun: State false-positive risk, too
  • Michèle Nuijten: Share analysis plans and results
  • Steven Goodman: Change norms from within



Illustration by David Parkins


New Preprint: statcheck’s Validity is High

In our new preprint we investigated the validity of statcheck. Our main conclusions were:

  • statcheck’s sensitivity, specificity, and overall accuracy are very high. The specific numbers depended on several choices & assumptions, but ranged from:
    • sensitivity: 85.3% – 100%
    • specificity: 96.0% – 100%
    • accuracy: 96.2% – 99.9%
  • The prevalence of statistical corrections (e.g., Bonferroni, or Greenhouse-Geisser) seems to be higher than we initially estimated
  • But: the presence of these corrections doesn’t explain the high prevalence of reporting inconsistencies in psychology

We conclude that statcheck’s validity is high enough to recommend it as a tool in peer review, self-checks, or meta-research.
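For readers less familiar with these three measures, here is a minimal sketch of how they are conventionally computed from a confusion matrix. It assumes that a "positive" means statcheck flags a result as inconsistent and that manual coding serves as the reference standard. The function name and the counts are hypothetical and are not taken from the preprint.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts.

    tp: flagged by the tool and truly inconsistent
    fp: flagged by the tool but actually consistent
    tn: not flagged and actually consistent
    fn: not flagged but truly inconsistent
    """
    sensitivity = tp / (tp + fn)                 # share of true inconsistencies that were flagged
    specificity = tn / (tn + fp)                 # share of consistent results left unflagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all results classified correctly
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only (not the preprint's data)
metrics = classification_metrics(tp=90, fp=20, tn=880, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because inconsistencies are much rarer than consistent results, accuracy is dominated by the true negatives, which is why it helps to report sensitivity and specificity separately.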


statcheck Runner-Up for Sentinel Award

Publons announced the winner of the Sentinel Award for outstanding advocacy, innovation or contribution to scholarly peer review, and I am proud to announce that statcheck was crowned runner-up!

I am honored that the judges considered statcheck a useful contribution to the peer review system. In the end, one of the things I hope to achieve is that all Psychology journals will consider it standard practice to quickly “statcheck” a paper for statistical inconsistencies to avoid publishing them.

A very warm congratulations to the winner of the award: Irene Hames. Irene spent most of her career improving the quality of peer review and it is great that her work is recognized in this way! Also congratulations to the rest of the Sentinel Award nominees: Retraction Watch, American Geophysical Union, ORCiD, F1000Research, The Committee on Publication Ethics (COPE), Kyle Martin and Gareth Fraser.

For more information about the award, the winner, and the finalists, see this page.

The Guardian’s Science Weekly Podcast feat. statcheck

This week the Guardian’s Science Weekly podcast focuses on statistical malpractice and fraud in science. We talk about the role of statcheck in detecting statistical inconsistencies, and discuss the causes and implications of seemingly innocent rounding errors.

This podcast also offers fascinating insights from consultant anaesthetist John Carlisle about the detection of data fabrication, and president of the Royal Statistical Society David Spiegelhalter about the dangers of statistical malpractice.
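To give a rough idea of what a statistical inconsistency check involves, here is a deliberately simplified sketch: recompute the p-value from the reported test statistic and see whether it matches the reported p-value after rounding. This is not statcheck's actual code (statcheck is an R package and covers more test types than this). The sketch handles only a two-tailed z-test, and the function names are my own.

```python
import math


def two_tailed_p_from_z(z):
    """Two-tailed p-value for a z statistic under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_consistent(z, reported_p, decimals=2):
    """Check whether the reported p-value matches the recomputed one.

    Both values are rounded to the number of decimals used in the report,
    so a difference that is purely due to rounding is not counted as an error.
    """
    recomputed = two_tailed_p_from_z(z)
    return round(recomputed, decimals) == round(reported_p, decimals)


# Hypothetical reported results, for illustration only
ok = is_consistent(z=1.96, reported_p=0.05)   # recomputed p is about .05
bad = is_consistent(z=2.20, reported_p=0.01)  # recomputed p is about .03
print(ok, bad)
```

A fuller check would also take into account that the test statistic itself was rounded before it was reported, and would handle one-tailed tests.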

statcheck Shortlisted for Publons Sentinel Award!

Proud to announce that I’ve been shortlisted for the Publons Sentinel Award for my work on statcheck. The Sentinel Award is an award for outstanding advocacy, innovation or contribution to scholarly peer review.

At this point, statcheck is used in the peer review process of two major psychology journals (Psychological Science and the Journal of Experimental Social Psychology), and an increasing number of journals are recommending using statcheck on your own manuscript before submitting it.

For more information about the award and the other great candidates, see this page.


New Preprint: Data Sharing & Statistical Inconsistencies

We just published the preprint of our new study “Journal Data Sharing Policies and Statistical Reporting Inconsistencies in Psychology”.

In this paper, we ran three independent studies to investigate if data sharing is related to fewer statistical inconsistencies in a paper. Overall, we found no relationship between data sharing and reporting inconsistencies. However, we did find that journal policies on data sharing are extremely effective in promoting data sharing (see the Figure below).


We argue that open data is essential in improving the quality of psychological science, and we discuss ways to detect and reduce reporting inconsistencies in the literature.