When the Diederik Stapel scandal was first in the news last year, I thought it might be interesting to play around with his data and Benford's Law.

Benford's Law is a statistical *artefact* that describes how the frequencies of the first digits in all sorts of large, natural data sets are not, as you'd perhaps expect, distributed evenly. There are not just as many numbers starting with '1' as there are numbers starting with '6' or '9'. In fact, the distribution follows a particular pattern, with 1 being the most often observed first digit (around 30%) and 9 the least (around 5%). This finding is confirmed in all kinds of data sets and the phenomenon is occasionally used to check for fraud. The idea is that data that is made up or manipulated won't have the right distribution (here's an example in the Economist, earlier this week).
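For reference, the pattern is usually written as P(d) = log10(1 + 1/d) for a first digit d from 1 to 9. Here is a minimal Python sketch of those expected shares; the function name `benford_expected` is just my own label for illustration.

```python
import math


def benford_expected():
    """Return the expected share of each first digit 1-9 under Benford's Law."""
    # P(d) = log10(1 + 1/d); the nine shares add up to 1.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Print the expected share per first digit as a percentage.
for digit, share in benford_expected().items():
    print(f"{digit}: {share:.1%}")
```

The shares fall steadily from the digit 1 down to the digit 9, which is the shape an observed bar graph of first digits gets compared against.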

I had seen a couple of examples of the application of Benford's Law to spot scientific fraud and I thought it could be interesting to use the Stapel case to see how it would work in experimental social science. I didn't spend too much time on it. I took Stapel's CV and started tracking down his most recent publications (none of which had been retracted at that point). Of his publications in 2011, I managed to get hold of the following 10 via the university library:

1. Stapel, D.A., & Lindenberg, S. (2011). Coping with chaos: How disordered contexts promote stereotyping and discrimination. Science, 332, 251-253.

2. Lammers, J., Stoker, J.I., Jordan, J., Pollmann, M.M.H., & Stapel, D.A. (2011). Power increases infidelity among men and women. Psychological Science, 22, 1191-1197.

3. Stapel, D.A., & Van der Linde, L.A.J.G. (2011). What drives self-affirmation effects?: On the importance of differentiating value affirmation and attribute affirmation. Journal of Personality and Social Psychology, 101, 34-45.

4. Johnson, C.S., & Stapel, D.A. (2011). Happiness as alchemy: Positive mood and responses to social comparisons. Motivation and Emotion, 35, 165-180.

5. Stapel, D.A., & Noordewier, M.K. (2011). The mental roots of system justification: System threat, need for structure, and stereotyping. Social Cognition, 29, 238-254.

6. Van Doorn, J., & Stapel, D.A. (2011). When and How Beauty Sells: Priming, Conditioning, and Persuasion Processes. Journal of Consumer Research, published online June 1, 2011.

7. Lammers, J., & Stapel, D.A. (2011). Racist biases in legal decisions are reduced by a justice focus. European Journal of Social Psychology, 41, 375-387.

8. Lindenberg, S.M., Joly, J.F., & Stapel, D.A. (2011). The norm-activating power of celebrity: The dynamics of success and influence. Social Psychology Quarterly, 74, 98-120.

9. Johnson, C.S., & Stapel, D.A. (2011). Reflection vs. Self-reflection: Sources of self-esteem boost determine behavioral outcomes. Social Psychology, 42, 144-151.

10. Lammers, J., & Stapel, D.A. (2011). Power increases dehumanization. Group Processes & Intergroup Relations, 14, 113-126.

My method of data collection was pretty crude. I took the results section(s) of these papers (and only the results sections) and simply recorded every number I encountered. The only exceptions were p-values (because they were often reported inexactly, p < 0.05 etc.) and numbers of participants (I can't really remember why I chose not to include these...). Things I did count included F statistics, t values, means, SDs, path coefficients, correlation coefficients, Cronbach's alphas, etc. I tried to avoid double counting numbers that were presented in a table but also referred to in the text, but I worked pretty quickly and didn't actually read the texts, so I probably overlooked many instances.

I ended up with a data set of 1107 numbers, had Excel *extract* their first digits and made a bar graph of their frequencies. It certainly didn't look like the distribution predicted by Benford's Law, but I didn't quite know what to make of it. One thing I had noticed while collecting the data, for instance, was that Stapel used a lot of 7-point Likert scales in his 'research'. It seems likely that this influences which first digits turn up. I wasn't sure if a data set based on similar but non-fraudulent papers would actually follow Benford's Law. I thought about collecting data on other, presumably non-fraudulent research, but regular work got in the way and the project ended up in a drawer.
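The first-digit tally can also be done outside Excel. Below is a rough Python sketch of the same idea, assuming the numbers are typed in as text exactly as they appear in a paper. The helper names and the sample values are hypothetical and are not taken from any of the papers above.

```python
from collections import Counter


def first_digit(reported):
    """Return the first non-zero digit of a number written as text, or None."""
    for ch in reported:
        if ch in "123456789":
            return int(ch)
    return None  # e.g. "0" or "0.00" has no leading non-zero digit


def first_digit_shares(values):
    """Return the observed share of each first digit 1-9 among the values."""
    counts = Counter(d for d in map(first_digit, values) if d is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no values with a non-zero digit")
    return {d: counts[d] / total for d in range(1, 10)}


# Hypothetical sample values, only to show the mechanics.
sample = ["4.52", "0.83", "12.7", "-1.96", "0.07", "3.10"]
shares = first_digit_shares(sample)
```

Working on the text rather than on floating-point values means signs and leading zeros are simply skipped, so 0.07 counts as a 7 and -1.96 as a 1.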

It stayed there until about a month ago, when the committees installed by the three universities where Stapel worked during his career (Amsterdam, Groningen and Tilburg) presented the results of their investigation into his work. Not all of Stapel's articles were based on fabricated data. The report includes handy lists of all of Stapel's publications with an indication of whether the committee had 'established fraud' or not. Out of the 10 articles published in 2011 that I had collected, 7 were deemed to be fraudulent (1, 3, 4, 5, 6, 7, 8) and the remaining 3 were apparently not (2, 9 and 10). So now I could make the comparison between Stapel's real and fake data and check the distribution of the first digits in both data sets (with 186 and 921 numbers respectively).

That worked pretty well, I thought. I'm especially surprised by how well the distribution of the first digits from the non-fraudulent papers seems to follow Benford's Law (as I said, only 186 digits, from 3 papers). There appear to be more 5's than predicted, but maybe that's a consequence of using all those 7-point Likert scales. The first digits from the fraudulent papers clearly don't follow Benford's Law, seemingly confirming the committees' conclusion that they were made up. This is, of course, just a very simple analysis. There are more than a few possible catches. Perhaps the data from one kind of research were easier to fudge than the data from other kinds, and all the graph shows is the difference between research with, say, lots of Likert scales and other research. And I doubt there is a big future for Benford's Law in spotting scientific fraud. As soon as potential fraudsters know their results will be tested this way, they can simply make up their numbers so that they fit. But, for now, I thought this was a pretty neat example of the application of Benford's Law in the experimental social sciences.
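I only eyeballed the bar graphs, but one common way to put a number on "doesn't follow Benford's Law" is a chi-square goodness-of-fit statistic. The sketch below is that swapped-in technique, not something I ran on the Stapel numbers; the function name is my own and the counts in the usage lines are made up. It also assumes the numbers are independent, which numbers taken from the same paper probably are not, so treat the result as a rough guide at best.

```python
import math

# Critical value of the chi-square distribution with 8 degrees of freedom
# (9 digits minus 1) at the 5% significance level.
CRITICAL_5_PERCENT_DF8 = 15.507


def benford_chi_square(counts):
    """Chi-square statistic of observed first-digit counts against Benford's Law.

    counts maps each first digit 1-9 to how often it was observed;
    missing digits are treated as zero.
    """
    n = sum(counts.get(d, 0) for d in range(1, 10))
    if n == 0:
        raise ValueError("no observations")
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


# Made-up counts: one set close to Benford's Law, one perfectly even.
benford_like = {1: 301, 2: 176, 3: 125, 4: 97, 5: 79, 6: 67, 7: 58, 8: 51, 9: 46}
even = {d: 100 for d in range(1, 10)}
print(benford_chi_square(benford_like) > CRITICAL_5_PERCENT_DF8)  # False
print(benford_chi_square(even) > CRITICAL_5_PERCENT_DF8)  # True
```

A statistic above the critical value means the counts deviate from Benford's Law by more than chance alone would usually produce. With only 186 numbers in the smaller data set, the test would have limited power to pick up modest deviations.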