I am an Assistant Professor in the Department of Methodology and Statistics at the School of Social and Behavioral Sciences of Tilburg University, the Netherlands.
My research focuses on meta-science, including topics such as replication, publication bias, statistical errors, and questionable research practices. I am currently interested in the idea of looking at details (e.g., statistical reporting errors) to uncover bigger problems (e.g., the overall robustness of a conclusion). I received an NWO Veni grant to further expand this idea.
& other insights I brought home from the Open Science Retreat
“I wanted to create an event that I actually wanted to go to.” Those were the opening words of Heidi Seibold at the start of the Open Science Retreat.
It was Sunday afternoon, and around 50 people from a wide variety of backgrounds had just arrived in the dunes of Schoorl, The Netherlands. For five days, we would stay on the grounds of the Mennonite Church, surrounded by woods and dunes, to talk about Open Science.
But not just to talk, Heidi continued to explain. In her dream event, there would be a balance of three parts: conversations, actually getting work done, but also recharging. The goal was to leave with more energy than you came with.
I was very intrigued by this proposal, because most conferences I’ve gone to have been mainly about networking and promoting your own work. A handful also involved actually getting work done, but I can’t say I ever went home from a conference (even the fun ones!) recharged – more like the opposite.
Not your standard conference
It became clear pretty much immediately that this would not be your standard conference. Within an hour of arriving, we were already split up into teams for “Old Dutch Games” (including pin the tail on the donkey, an obstacle course, and tug-of-war). We were also notified that those assigned to Sleeping House 1 (me) had their bathroom in another building. Outside. Oh, and also, we would be taking turns doing the dishes and sweeping the floors.
Interestingly, Mel Imming, the main organizer of this edition of the event, later confided to me that the organizers could’ve easily afforded the package deal where the venue would take care of the dishes. She deliberately went for the do-it-yourself option. And it was the smartest choice ever.
Imagine the standard interaction at your run-of-the-mill conference. You aimlessly wander through a hallway, end up at a random standing table with people you’ve never seen before. Every conversation starts with: “So… what is your research about?” After 20 identical conversations, nothing sticks anymore.
Now imagine you’re with a handful of people, figuring out how the industrial dishwasher works and drying forks together, while finding out how they ended up at this event and what their particular ideas are about Open Science (or what kinds of side jobs they had as teenagers).
I learned that making connections the second way is much easier, more memorable, and more fun. And also that drying forks takes a very long time.
The three magic ingredients for high productivity
If you think that the whole event was kumbaya and nothing got done – think again. By the end of the three mornings scheduled for group work, six main ideas were transformed into viable and concrete sets of materials, guidelines, reports, and even a short film.
I think this high productivity at the Open Science Retreat was due to three main things that don’t often happen elsewhere: you were 1) locked[1] in a room with 2) a small but highly diverse team and 3) a very tight deadline before the plenary show-and-tell.
As a researcher, this is usually not what my work looks like. Collaboration typically takes the form of having periodic 1-hour meetings and sending a lot of commented drafts back and forth. Multiple such collaborations usually co-exist, which means that the action points in each meeting or draft end up at the bottom of an ever-growing and unfocused to-do list. In contrast, being in the same room with all relevant people and having a dedicated timeslot of three hours to work (rather than discuss-and-put-on-list) gave an incredible productivity and motivation boost.
The second magic ingredient for this high productivity was the diversity of the teams. Participants were diverse in age, career stage, field, and job type (we had scientists, funders, open science coordinators, people who left academia to found a startup, information specialists, data stewards, and more), which in turn meant a wide diversity in perspectives and skills.
To give you an idea of what this diversity meant in practice: in my group, we thought about ways to hold journals accountable when Data Availability Statements (DAS) don’t match actual data availability (the DAS gap[2]). Even though I proposed this topic, my ideas on how to achieve this were quite limited (see Fig. 1). However, a group of people with expertise in data storage and citation best practices, knowledge of recent funder reports on the topic, statistical know-how, programming skills, a network in the editing world, and on-point project management skills managed to crunch the numbers on the DAS gap per journal, write an email template to contact and confront journals, compile recommendations for avoiding DAS gaps, make a guide for people wanting to flag a DAS gap, and produce a full report describing our activities.[3]
I can guarantee you that even if I had an army of clones, we’d still be staring angrily at a DAS gap with nothing to show for it.
Figure 1. My initial ideas for our group work on holding journals accountable for void data sharing statements. Credits to our group’s meme-master — you know who you are.
Finally, having just nine hours to start, execute, and wrap up our project helped tremendously with focus. It forced us to build in repeated moments of reflection where we collectively reconsidered whether the road we were on was still viable within the timeframe or whether we needed to change direction. In my own work, the lack of any real deadline except my retirement[4] easily distracts me into pursuing all kinds of irrelevant side quests and losing sight of the main goals. Apparently, all I need is nine hours and no escape route.
The North Sea, a Disney duet, and the future of Open Science
When we arrived, large printed schedules were waiting for us next to the registration desk. And on them was… nothing. It was up to us to fill in the time slots with activities we deemed useful, interesting, fun, or all of the above. Mornings were for group work, but you could also propose a new project, have a coffee with someone, take that work call you couldn’t get out of, read a book, go for a run, or – if you really couldn’t resist – present something.
This flexibility in the schedule meant that you could match your energy to the type of activity. Were you in the mood to brainstorm about a complete overhaul of the peer review system (always), or for a mentoring conversation with someone who chose a different career path? Did you want to take a break alone, play the meme-your-open-science card game, or curate your own playlist, strap on a unicorn headband, and lose yourself at the silent disco?[5]
Besides the fact that I had SO much fun, there is also a more serious point to all these activities. It’s easier to connect and work together if you’ve already seen another side of a person. It’s easier to reach out afterwards if you’ve had an open-hearted conversation in the dunes about what it meant for work-life balance to become a parent, if you’ve collectively dipped in a VERY cold North Sea, or sung a heartfelt rendition of a Disney duet during karaoke.
I strongly believe that this type of community building is key in inspiring any type of change in a rigid system. Most people invested in Open Science are in it for the greater good. They work hard – frequently in their own time – to create a system that benefits all, but ironically not always the individual. To prevent such people from burning out, becoming disillusioned, or being weeded out by a system that favors p-hacked Nature publications, we need to create a community where these people feel they belong, where they can work together, and where they can recharge.
I’m sure I don’t only speak for myself when I say that I went home inspired, recharged, and a little bit sad that it was over.
[4] And I’ve heard there is no such thing if you’re a professor…
[5] It turned out that I was in the mood to do EVERYTHING, which meant that I went from a three-hour jam session, to karaoke, to the silent disco, and then got up the next morning at 6:45 AM to be in time for the daily morning yoga sessions.
It’s a mouthful, but this nicely reflects the level of nuance in her conclusions. Some key findings include:
Publication bias is very bad for effect size estimation
Not all p-hacking strategies are equally detrimental to effect size estimation. For instance, even though optional stopping may be terrible for your Type I error rate, it does not add much bias to effect size estimation (see the sketch below). Selective outcome reporting and optional dropping of specific types of participants, on the other hand, are really, really bad.
Heterogeneity was also impacted by p-hacking, but sometimes in surprising ways. Turns out: heterogeneity is a complex concept!
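To make the optional stopping point concrete, here is a minimal toy simulation in R. Everything about it (the batch size, the maximum sample size, the effect sizes) is my own illustrative assumption, not a setup taken from the dissertation: peeking at the data after every batch inflates the false positive rate well beyond the nominal 5%, while the average effect size estimate stays close to the truth.

```r
# Toy simulation of optional stopping: run a one-sample t-test after every
# batch of 10 observations and stop as soon as p < .05.
set.seed(1)

one_study <- function(d, n_max = 100, batch = 10) {
  x <- rnorm(n_max, mean = d)  # observations with true effect size d
  for (n in seq(batch, n_max, by = batch)) {
    if (t.test(x[1:n])$p.value < .05) {
      return(c(sig = 1, est = mean(x[1:n])))  # stop early, keep the estimate
    }
  }
  c(sig = 0, est = mean(x))  # never significant: keep the full-sample estimate
}

# Under a true null (d = 0), the Type I error rate balloons far above .05:
null_res <- replicate(5000, one_study(d = 0))
mean(null_res["sig", ])  # roughly .20 with these settings

# With a true effect (d = 0.3), the average estimate remains close to 0.3:
eff_res <- replicate(5000, one_study(d = 0.3))
mean(eff_res["est", ])
```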
Her work includes a custom Shiny app, where users can see the impact of publication bias and p-hacking in their own scenarios: https://emaassen.shinyapps.io/phacking/.
Take-away: we need systemic change to promote open and robust scientific practices that avoid publication bias and p-hacking.
If your job effectively consists of telling other researchers how to do their job, what happens to your credibility if you drop the ball in your own research? What if you don’t always practice what you preach, or if you make mistakes?
For the monthly Meta-Research Center blog, I wrote about these dilemmas. Should meta-researchers be held to higher standards to be taken seriously? In the end, I concluded that good science isn’t about being perfect, it’s about being transparent, adaptable, and striving to do better.
I’m very pleased to announce that Cas Goos and Dennis Peng are joining the Meta-Science team under my supervision.
Cas will work on improving statistical reproducibility in psychology: what are journals doing already? And is it working? This is a project together with Jelte Wicherts.
Dennis will look at the statistical validity of intervention studies in clinical psychology. What different analyses are used? Is there room for opportunistic use of degrees of freedom in these analyses? How can it be better? This is a project together with Paul Lodder and Jelte Wicherts.
I’m excited to see the progress on both projects. Welcome to the team!
The Meta-Research Center is looking for a new PhD candidate!
In this project, you will look at the statistical validity of psychological intervention studies. You will work under the direct supervision of Dr. Paul Lodder, Prof. Jelte Wicherts, and myself.
The position is perfect for a student interested in clinical psychology, statistics, methodology, and meta-research.
Details about the position and application can be found here: https://tiu.nu/21539.
The COVID-19 outbreak has led to an exponential increase in publications and preprints about the virus, its causes, consequences, and possible cures. COVID-19 research has been conducted under high time pressure and has been subject to financial and societal interests. Doing research under such pressure may influence the scrutiny with which researchers perform and write up their studies. Either researchers become more diligent because of the high-stakes nature of the research, or time pressure leads to cutting corners and lower-quality output.
In this study, we conducted a natural experiment to compare the prevalence of incorrectly reported statistics in a stratified random sample of COVID-19 preprints and a matched sample of non-COVID-19 preprints.
Our results show that the overall prevalence of incorrectly reported statistics is 9-10%, but both frequentist and Bayesian hypothesis tests indicate no difference in the number of statistical inconsistencies between COVID-19 and non-COVID-19 preprints.
Taken together with previous research, our results suggest that the danger of hastily conducting and writing up research lies primarily in methodologically inferior studies, and perhaps not in the quality of statistical reporting.
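For readers who wonder what “incorrectly reported statistics” means in practice: the core of such a check is recomputing the p-value from the reported test statistic and degrees of freedom and comparing it with the reported p-value. Below is a deliberately simplified sketch in R with made-up example values; the real checks (e.g., in statcheck) also account for rounding of the test statistic itself.

```r
# Simplified consistency check for a reported t-test result such as
# "t(28) = 2.20, p = .04" (example values are made up).
check_t <- function(t, df, p_reported, digits = 2) {
  p_computed <- 2 * pt(abs(t), df = df, lower.tail = FALSE)  # two-sided p
  list(p_computed = p_computed,
       consistent = round(p_computed, digits) == p_reported)
}

check_t(t = 2.20, df = 28, p_reported = .04)  # p_computed ~ .036: consistent
check_t(t = 2.20, df = 28, p_reported = .30)  # inconsistent, and the reported
                                              # value would also flip the
                                              # significance decision
```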
We investigated whether statistical reporting inconsistencies can be avoided when journals implement the tool statcheck in the peer review process.
In a preregistered study covering over 7,000 articles, we compared inconsistency rates in two journals that implemented statcheck in their peer review process (Psychological Science and Journal of Experimental Social Psychology) with those in two matched control journals (Journal of Experimental Psychology: General and Journal of Personality and Social Psychology, respectively), before and after statcheck was implemented.
Preregistered multilevel logistic regression analyses showed that the decrease in both overall inconsistencies and decision inconsistencies around p = .05 was considerably steeper in statcheck journals than in control journals, offering support for the notion that statcheck can be a useful tool for journals to avoid statistical reporting inconsistencies in published articles.
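For those who want to try this on their own manuscripts: statcheck is available as an R package on CRAN, and the basic check runs on any piece of text containing APA-formatted results. A minimal example (the result string is made up):

```r
# install.packages("statcheck")
library(statcheck)

# statcheck() extracts APA-formatted results from text, recomputes the
# p-values, and flags inconsistencies.
res <- statcheck("We found a significant effect, t(28) = 2.20, p = .03.")
res
# The recomputed two-sided p-value is about .036, so the reported p = .03 is
# flagged as an inconsistency; because both values stay below .05, it is not
# a decision inconsistency.
```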
In December 2020, Willem Sleegers and I were awarded the Young eScientist Award from the Netherlands eScience Center for our proposal to improve statcheck’s searching algorithm. Today marks the start of our collaboration with the eScience Center and we are very excited to get started!
In this project, we plan to extend statcheck’s search algorithm with natural language processing, so that it can recognize more statistics than just the ones reported perfectly in APA style (a current restriction). We hope that this extension will expand statcheck’s functionality beyond psychology, so that statistical errors in, e.g., biomedical and economics papers can also be detected and corrected.
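To give a feel for that restriction, here is a toy version of APA-style extraction, using my own simplified regular expression (not statcheck’s actual pattern): it catches a perfectly formatted result but is blind to the same statistics phrased slightly differently.

```r
# Toy APA-style extraction: match results of the form "t(df) = value, p = value"
# (a deliberately simplified regex, not the one statcheck actually uses).
apa_t <- "t\\(\\d+\\)\\s*=\\s*-?\\d+\\.\\d+,\\s*p\\s*[<=>]\\s*\\.\\d+"

texts <- c(
  "As expected, t(28) = 2.20, p = .03.",          # perfectly APA-formatted
  "The t statistic was 2.20 (df = 28, p = .03)."  # same numbers, missed
)

regmatches(texts, regexpr(apa_t, texts))
# Only the first sentence yields a match; the second is invisible to the
# pattern. Closing that gap is what the natural language processing
# extension is meant to do.
```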
More information about the award can be found here.