Friday, March 01, 2019

What Do First-Year Economics Students Know About Inequality?

I dedicate one of the lectures in the first-year Microeconomics course I teach to talking about inequality. It starts out in the context of the labour market, covering things like wage differentials, the education premium and discrimination, but subsequently I also talk about more general aspects of income and wealth inequality like the Gini coefficient and Lorenz Curves.

For the last few years I have been using Norton and Ariely (2011) as an exercise to get the students to think about (wealth) inequality. It's a very simple experiment with fun results. I ask the students to guess the wealth distribution (for the UK, where we are) by getting them to say what share of the total wealth they think is owned by the poorest 20%, by the second poorest 20% etc. In a similar fashion I also ask what their ideal distribution would be.

Typically, the results are very similar to those in the original Norton and Ariely paper (and my experience in this respect is pretty much the same as that of Barnes, Easton and Leupp Hanig (2018), who write about how they have also used this experiment in the classroom). Students seem to underestimate the existing inequality by a considerable margin. Below I've created Lorenz Curves (using this website) for the average distribution as guessed by the students (left) and the actual distribution (right, based on data from the Equality Trust). The ideal distribution from the students is even more equal.
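For anyone who wants to reproduce this kind of figure without relying on a website, here is a minimal Python sketch that turns a set of quintile shares into a Lorenz curve and a Gini coefficient. The shares below are placeholder values for illustration, not the actual Equality Trust figures or my students' averages.

```python
import numpy as np
import matplotlib.pyplot as plt

# Quintile shares of total wealth, poorest to richest.
# Placeholder values for illustration only.
shares = np.array([0.01, 0.05, 0.12, 0.22, 0.60])

# Lorenz curve: cumulative wealth share against cumulative population
# share, starting from the origin.
cum_pop = np.linspace(0, 1, len(shares) + 1)
cum_wealth = np.insert(np.cumsum(shares), 0, 0.0)

# Gini coefficient: 1 minus twice the area under the Lorenz curve
# (trapezoidal rule).
area = np.sum((cum_wealth[1:] + cum_wealth[:-1]) / 2 * np.diff(cum_pop))
gini = 1 - 2 * area
print(f"Gini coefficient: {gini:.3f}")

plt.plot(cum_pop, cum_wealth, marker="o", label="Lorenz curve")
plt.plot([0, 1], [0, 1], linestyle="--", label="Perfect equality")
plt.xlabel("Cumulative share of households")
plt.ylabel("Cumulative share of wealth")
plt.legend()
plt.show()
```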

I really like this exercise but there are a couple of problems with it. For one, my students seem to have some trouble working with quintiles in this context. I often got back distributions where the share for a lower quintile was higher than that for a higher quintile. Following Norton and Ariely I usually reordered the percentages so that they made logical sense. But still, it suggests that students are not necessarily thinking about the question correctly.

Eriksson and Simpson (2011) (pdf) have another point of criticism with regard to the original paper. Their hypothesis is that the big effect found using the Norton and Ariely method is largely caused by an anchoring effect. By asking participants to describe the distribution as five percentages that need to add up to 100, the equal division of 20% each might function as an anchor and steer answers in a more equal direction. Eriksson and Simpson run a version where they simply ask for average amounts: 'What is [should be] the average household wealth, in dollars, among the 20% richest households in the United States?' (and similarly for the other quintiles). The effect of this small change in how the question is phrased is startling. The distributions are much closer to the actual ones.

Inspired by this I decided to try and run the Eriksson and Simpson version in my class this year. I was a bit worried that by asking for amounts, instead of percentage shares, I would get lots of nonsensical answers. And my students indeed seemed to have strange ideas about typical household wealth: guesses of the overall average wealth ran from £3,100 to £42,000,000. But if you look at the distribution of this wealth, my students, on average at least, seem to have a very good grasp of the wealth inequality in the UK. The figure below compares the average distribution as guessed by the students using the Eriksson and Simpson method (left) with the actual distribution (right, same as above). And they are remarkably similar.
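To compare the two methods, the amounts from the Eriksson and Simpson version first have to be converted into shares. A small sketch of that step, using made-up example guesses rather than my students' actual answers:

```python
# Average household wealth per quintile (in pounds), poorest to richest.
# Made-up example guesses, not class data.
avg_wealth = [2_000, 20_000, 80_000, 200_000, 700_000]

# Because the quintiles are equal-sized, shares of the total follow
# directly from the averages.
total = sum(avg_wealth)
shares = [100 * w / total for w in avg_wealth]

labels = ["poorest 20%", "2nd 20%", "middle 20%", "4th 20%", "richest 20%"]
for label, share in zip(labels, shares):
    print(f"{label}: {share:.1f}% of total wealth")
```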

I guess there is no point going back to Norton and Ariely now. As suggested by Eriksson and Simpson, their finding seems pretty much driven by how the question is asked. A bit of a shame: 'students vastly underestimate how unequal the UK is' is a much more interesting starting point in the classroom than 'students pretty accurately guess how unequal the UK is'.

Thursday, December 13, 2018

Conditional Non-Takers


One of my favourite papers ever is Fischbacher, Gächter and Fehr's 2001 paper where they investigate conditional cooperation. Using a simple but clever design around the standard public good game they are able to divide their participants into a couple of different types with regard to how they react to how much the other people in the game cooperate. There is a not insignificant percentage of free-riders - around 30% - but a plurality of their subjects (50%) can be classified as conditional cooperators: they are willing to cooperate as long as the other players in the game also cooperate. There is another 14% of players whose behaviour can be described as 'hump-shaped' (matching cooperative behaviour when others' contributions are low but free-riding more and more as total contributions by others increase) and a small percentage of participants whose behaviour can't easily be classified ('others').
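As a rough illustration of how such a classification can work: the sketch below takes a strategy-method schedule (the player's own contribution for each possible average contribution of the others, 0 to 20) and assigns one of the four types using simplified rules. The exact criteria in Fischbacher, Gächter and Fehr are a bit more involved; this is just my own minimal version.

```python
import numpy as np
from scipy.stats import spearmanr

def classify(schedule):
    """Crudely classify a strategy-method contribution schedule.

    schedule[i] is the own contribution chosen when the others'
    average contribution is i (0..20). The rules below are a
    simplified sketch, not the exact criteria from the paper.
    """
    schedule = np.asarray(schedule)
    others = np.arange(len(schedule))

    if np.all(schedule == 0):
        return "free rider"

    rho, p = spearmanr(others, schedule)
    if rho > 0 and p < 0.001:
        return "conditional cooperator"

    peak = int(schedule.argmax())
    if 1 < peak < len(schedule) - 2:
        up = spearmanr(others[:peak + 1], schedule[:peak + 1])[0]
        down = spearmanr(others[peak:], schedule[peak:])[0]
        if up > 0 and down < 0:
            return "hump-shaped"

    return "other"

print(classify([0] * 21))                                   # free rider
print(classify(list(range(21))))                            # conditional cooperator
print(classify(list(range(11)) + list(range(9, -1, -1))))   # hump-shaped
```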

Lately I have been working on some projects involving public good games and framing: seeing how the way the decision to contribute to the public good is presented might influence contribution rates. One often-used example of this kind of framing is the difference between a Give- and a Take-frame. In the former you start with an endowment and your decision is how much of it to put in the public good. In the latter the money starts out in the public good and your decision is how much to take out. The two situations are mathematically the same and as such shouldn't lead to different behaviours, but they often do.

It made me wonder if this Give- and Take-frame difference might have an influence on the conditional cooperation as introduced by Fischbacher, Gächter and Fehr, and especially whether it would change the distribution of types. Being told how much other people contributed and then being asked how much to contribute might have a different effect on behaviour than being told how much other people had taken out and then being asked how much you wanted to take out. I hadn't thought about it hard enough to have formulated a particular hypothesis about the direction this effect would work in, but I did know I wanted to use the concept of 'conditional non-taking' in the title. But, as you might have guessed from the tweet quoted above, when I started to do some reading on the topic it turned out the idea had already been studied.

The main question Martinsson, Medhin and Persson study in their 2016 paper Framing and Minimum Levels in Public Good Provision is the effect of forced minimum contribution levels in Give- and Take-frames but they do so using the methods of Fischbacher, Gächter and Fehr and in their results section they investigate the conditional cooperative behaviour as well. The graph below (figure 1 from the paper) summarizes the average effect: average contributions are slightly lower for Take-decisions but there is conditional cooperation in both the Give and the Take frame. In both conditions the relationship between how much the other people in the group are contributing and how much the average participant would like to contribute is positive.

They also divide their participants into the four different behavioural types. (By the way, they run their study as a lab-in-the-field experiment in Ethiopia, which makes it extra interesting.) The table below is based on their Tables 6 and 7. The numbers look pretty similar for both conditions and are, according to a chi-square test, not statistically significantly different.
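For anyone who wants to run the same kind of check on their own data, a chi-square test on a 2x4 table of type counts is a one-liner with scipy. The counts below are invented for illustration; the real ones are in Tables 6 and 7 of the paper.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: Give frame, Take frame. Columns: conditional cooperators,
# free riders, hump-shaped, others. Invented counts for illustration.
counts = np.array([[40, 12, 8, 10],
                   [38, 15, 9, 8]])

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```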

So, without doing any actual work myself, I still have an answer for the question that I had. Does presenting a public good game in a Give- or Take-frame influence the distribution of conditional cooperative types? No, not really.

Friday, August 03, 2018

Swimming With Sharks

That's a quote from Joris Luyendijk's Swimming with Sharks, and a photo of it has been on my phone since I read the book a couple of years ago, as a reminder to maybe do something with it. Testing this sounds like an interesting research idea. But somebody must have already looked at this, right?

Thursday, July 26, 2018

Are Intuitive Decisions Expected To Be More Cooperative?

This is the first in a potentially endless series of blog posts where I write about a research idea I came up with, only to realise, after reading some of the related literature, that it had already been answered by someone else. Maybe not in the way I would have done it myself, but close enough for me to decide not to pursue it any further and instead just dump my thoughts about it here.

There is fairly convincing evidence that intuitive decisions are often more cooperative than decisions that are the result of long deliberation. David G. Rand calls this the Social Heuristics Hypothesis (SHH). The idea is that we learn, by interacting with others, that being cooperative is generally pay-off maximizing in the long run, and that we internalize this willingness to cooperate. In situations where cooperating isn't the best strategy, such as a one-shot interaction with a stranger, we will therefore still be more likely to cooperate when we decide intuitively. Only when we are allowed to deliberate do we see that this particular situation might be the exception to the rule.

Intuitive decision making can be forced in different ways, using cognitive load for instance, but the method used most often in the literature is that of time constraints: forcing participants to make their decision in a short amount of time. One of the specific predictions of the SHH is that the effect - faster decisions are more cooperative - will be found in one-shot situations but not in repeated interactions, where cooperating may actually be beneficial in the long run. And that indeed seems to be the case.

One thing I wondered after reading more and more about the SHH is whether people have figured out the next step yet. As in: if intuitive, fast decisions are often more cooperative, are decisions that are made faster, or under time constraints, expected to be more cooperative? As it turns out, I wasn't the only one who wondered this.

The title of Evans and van de Calseyde (2017) is a bit of a giveaway that they were also interested in this question: The Effects of Observed Decision Time on Expectations of Extremity and Cooperation (PDF here). They run a number of different experiments around the public good game in which participants have to guess how much a (hypothetical) other participant contributed after being told how fást the other player had made the decision. If participants are aware of the SHH, faster decisions should be expected to be more cooperative. This is not what they find.

They can make the comparison five times on the basis of the four experiments they run. They find no effect in two instances. The opposite effect - faster decisions are actually expected to be léss cooperative - in another two. And that faster decisions are expected to be more cooperative just once. This last finding was in the one experiment where participants were also shown photos of the decision-makers they were paired with and the authors argue that this may have had an influence on their expectations.

One reason why Evans and van de Calseyde do not find that faster decisions are expected to be more cooperative may be what they tell their participants about these decisions. As far as I understand it, the person the participant is to form an expectation about chose for themselves how quickly to decide. Participants are simply told: the other person made their decision in X seconds and this is below/above the average. This may matter. This kind of self-paced decision time, as argued by Evans and Rand (2018), may not say a lot about how intuitively the decision was made but rather about how easy it was to make, which can also be interpreted as how strong the decision maker's opinion on the topic is. This is actually consistent with the main finding of Evans and van de Calseyde (2017), found in all four experiments, namely that faster decisions are expected to be more extreme; that is, contributions further away from the middle of the available distribution.

So maybe there ís room for an experiment where participants are told that their partner was forced to make a decision in a short amount of time, instead of these self-paced decision times, and in that case we will find that the (forced) faster decisions are expected to be more cooperative. I haven't found a paper yet that does that. I am not entirely sure, though, that such a subtle difference will have a big effect.

(UPDATE: having read Evans and van de Calseyde a bit more carefully, they actually do run a version with externally constrained time pressure. It's one of the treatments in experiment 2. They find no effect on expectations.)

On the other hand, there is already evidence that people believe faster decisions come across as more cooperative. Jordan, Hoffman, Nowak and Rand (2016) show that participants take into account whether their decision comes across as calculating or non-calculating if they know this will be observed (or not) by someone they will interact with next. If they know the speed of their decision will be told to the person they will subsequently play a trust game with, they will make a faster cooperative decision nów than if the speed of their decision won't be communicated.

Saturday, January 10, 2015

Perspectives on Human Cooperation Workshop

Spent all of yesterday at the anthropology department of UCL for the second edition of a workshop on Perspectives on Human Cooperation. It was a great conference. Lots of interesting talks by people from various academic disciplines - anthropologists, economists, psychologists, philosophers, behavioural biologists etc. - but also from people out there in the real world; Rory Sutherland from Ogilvy opened and Michael Sanders from the Behavioural Insights Team gave the last talk. From my experimental economics perspective my favourite talks were the following:

University of Bristol's Sarah Smith analysed data from an online fundraising site - JustGiving I think it was. Here people can announce that they will do stuff for charity and ask friends and family to donate money. This all happens in public, so people can react to or be influenced by what's happened before. Smith showed firstly that one large donation - defined as more than 2x the average - has a positive effect on later donations: contributions made after it will be, on average, higher than the contributions before it. She also introduced an interesting evolutionary-psychological twist by looking at gender and attractiveness. The positive effect of one large donation is especially strong for male donors making donations to attractive female fundraisers.

Ruth Mace talked about some field experiments in Northern Ireland investigating in- and out-group cooperation. They asked people to donate to either Protestant, Catholic or neutral charities. They not only looked at whether people gave more to their 'own' group but also at the influence of the level of sectarian tension. The higher this 'threat index', the lower the contributions to the out-group, but it didn't have an effect on contributions to the in-group.

Daniel Richardson from UCL gave a fun talk about his mass participation research. One of the experiments he described was a large-scale public goods game. On the basis of their behaviour they could identify four types of players: warm glow altruists (who always contribute), free riders (who never contribute), tit-for-tat-ers (who do what the rest of the group did last round) and foresight-ers (who do the opposite of what the rest of the group did last round). The majority seem to be tit-for-tat players, but the foresight players are pretty important for the rest of the group: once cooperation starts to decrease after a few rounds, they encourage cooperation by going against the trend and inspire the tit-for-tat players (and, once cooperation has increased again, do pretty well financially by free-riding themselves).
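Out of curiosity, here is a minimal simulation of those four types in a repeated public goods setting. The type mix, parameters and exact decision rules are my own guesses, just to show the kind of dynamics described, not Richardson's actual model.

```python
ENDOWMENT = 10   # tokens available per round
ROUNDS = 15
# Made-up type mix: mostly tit-for-tat, a few of each other type.
TYPES = ["warm glow"] * 3 + ["free rider"] * 4 + ["tit-for-tat"] * 10 + ["foresight"] * 3

def contribution(player_type, last_avg):
    """Contribution rule for each type, based on last round's group average."""
    if player_type == "warm glow":
        return ENDOWMENT            # always contribute everything
    if player_type == "free rider":
        return 0                    # never contribute
    if player_type == "tit-for-tat":
        return last_avg             # copy what the group did last round
    return ENDOWMENT - last_avg     # foresight: do the opposite

last_avg = ENDOWMENT / 2            # arbitrary starting belief
for r in range(ROUNDS):
    contribs = [contribution(t, last_avg) for t in TYPES]
    last_avg = sum(contribs) / len(TYPES)
    print(f"round {r + 1:2d}: average contribution = {last_avg:.2f}")
```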

Nichola Raihani's talk about third-party punishers - who observe (and reward or punish the players in) some exchange and who are in turn observed by another level of bystanders who can reward their behaviour - was also pretty interesting, but I will have to read the actual paper first to know exactly what was going on, because I missed or forgot some of the details.

Tuesday, October 15, 2013

Why Things Cost $19.95 (Reprise)

A couple of years ago I blogged about research that tried to come up with an explanation for why many prices end in '99' or '95' rather than being a round number. The study found that participants guessed the wholesale price of a plasma-screen TV to be lower if the retail price was $5,000 than when it was $4,888 or $5,012, suggesting that round numbers trigger a wider frame of reference than more exact numbers. I was reminded of this research after I read about a new study that looks at the difference between round and non-round offers in negotiation situations. This study finds that second movers make greater counteroffer adjustments to round offers than to precise ones. The authors argue - and provide some evidence for this - that 'precise numerical expressions imply a greater level of knowledge than round expressions and are therefore assumed by recipients to be more informative of the true value of the good being negotiated'.

Thursday, December 20, 2012

Diederik Stapel and Benford's Law

When the Diederik Stapel scandal was first in the news last year, I thought it might be interesting to play around with his data and Benford's Law. Benford's Law is a statistical regularity describing how the frequencies of the first digits in all sorts of large, natural datasets are not, as you'd perhaps expect, distributed evenly. There are not just as many numbers starting with '1' as there are numbers starting with '6' or '9'. In fact, the distribution follows a particular pattern, with 1 being the most often observed first digit (around 30%) and 9 the least (around 5%). This finding is confirmed in all kinds of data sets and the phenomenon is occasionally used to check for fraud, the idea being that data that are made up or manipulated won't have the right distribution (here's an example in the Economist from earlier this week).
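The expected frequencies are easy to compute: the probability that the leading digit is d equals log10(1 + 1/d). A quick sketch:

```python
import math

# Benford's Law: P(first digit = d) = log10(1 + 1/d)
for d in range(1, 10):
    print(f"digit {d}: {math.log10(1 + 1 / d):.1%}")
```

This gives roughly 30.1% for a leading 1 and 4.6% for a leading 9, the figures mentioned above.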

I had seen a couple of examples of the application of Benford's Law to spot scientific fraud and I thought it could be interesting to use the Stapel case to see how it would work in the experimental social sciences. I didn't spend too much time on it. I took Stapel's CV and started tracking down his most recent publications (none of which had been retracted at that point). Of his publications in 2011 I managed to get hold of the following 10 via the university library:

1. Stapel, D.A., & Lindenberg, S. (2011). Coping with chaos: How disordered contexts promote stereotyping and discrimination. Science, 332, 251-253.
2. Lammers, J., Stoker, J.I., Jordan, J., Pollmann, M.M.H., & Stapel, D.A. (2011). Power increases infidelity among men and women. Psychological Science, 22, 1191-1197.
3. Stapel, D.A., & Van der Linde, L.A.J.G. (2011). What drives self-affirmation effects?: On the importance of differentiating value affirmation and attribute affirmation. Journal of Personality and Social Psychology, 101, 34-45.
4. Johnson, C.S., & Stapel, D.A. (2011). Happiness as alchemy: Positive mood and responses to social comparisons. Motivation and Emotion, 35, 165-180.
5. Stapel, D.A., & Noordewier, M.K. (2011). The mental roots of system justification: System threat, need for structure, and stereotyping. Social Cognition, 29, 238-254.
6. Van Doorn, J., & Stapel, D.A. (2011). When and How Beauty Sells: Priming, Conditioning, and Persuasion Processes. Journal of Consumer Research, published online June 1, 2011.
7. Lammers, J., & Stapel, D.A. (2011). Racist biases in legal decisions are reduced by a justice focus. European Journal of Social Psychology, 41, 375-387.
8. Lindenberg, S.M., Joly, J.F., & Stapel, D.A. (2011). The norm-activating power of celebrity: The dynamics of success and influence. Social Psychology Quarterly, 74, 98-120.
9. Johnson, C.S., & Stapel, D.A. (2011). Reflection vs. Self-reflection: Sources of self-esteem boost determine behavioral outcomes. Social Psychology, 42, 144-151.
10. Lammers, J., & Stapel, D.A. (2011). Power increases dehumanization. Group Processes & Intergroup Relations, 14, 113-126.

My way of data collection was pretty crude. I took the results section(s) of these papers (and only the results sections) and simply scored every number I encountered. The only distinction I made is that I didn't count p-values (because they were often reported inexactly, p < 0.05 etc.) or numbers of participants (I can't really remember why I chose not to include these...). Things I did count included F statistics, t values, means, SDs, path coefficients, correlation coefficients, Cronbach's alphas etc. I tried to avoid double counting certain numbers - numbers that were presented in a table but also referred to in the text - but I worked pretty quickly and didn't actually read the texts, so I probably overlooked many instances. I ended up with a data set of 1107 numbers, had Excel extract their first digits and made a bar graph with their frequencies. It certainly didn't look like the distribution predicted by Benford's Law, but I didn't quite know what to make of it. One thing I had noticed while collecting the data, for instance, was that Stapel used a lot of 7-point Likert scales in his 'research'. It doesn't seem unlikely that that influences the kind of first digits used. I wasn't sure whether a data set based on similar but non-fraudulent papers would actually follow Benford's Law. I thought about collecting data on other, presumably non-fraudulent research, but regular work got in the way and the project ended up in a drawer.
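These days I would let Python do the scoring instead of Excel. A rough sketch of the digit extraction, applied to a made-up snippet of a results section (in the real exercise I also skipped p-values and participant numbers by hand, which a simple regex like this doesn't do):

```python
import re
from collections import Counter

def first_digits(text):
    """Return the first non-zero digit of every number found in text."""
    digits = []
    for number in re.findall(r"\d+\.?\d*", text):
        stripped = number.lstrip("0.")   # so 0.82 is scored as an 8
        if stripped:
            digits.append(int(stripped[0]))
    return digits

# Made-up example line from a results section.
sample = "F(1, 98) = 12.43, M = 4.7, SD = 0.82, r = .31"
counts = Counter(first_digits(sample))
for d in range(1, 10):
    print(d, counts.get(d, 0))
```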

It stayed there until about a month ago, when the committees installed by the three universities where Stapel worked during his career - Amsterdam, Groningen and Tilburg - presented the results of their investigation into his work. Not all of Stapel's articles were based on fabricated data. The report includes handy lists of all of Stapel's publications with an indication of whether the committee had 'established fraud' or not. Out of the 10 articles published in 2011 that I had collected, 7 were deemed to be fraudulent (1, 3, 4, 5, 6, 7, 8) and the remaining 3 apparently were not (2, 9 and 10). So now I could make the comparison between Stapel's real and fake data and check the distribution of the first digits in both data sets (with 186 and 921 numbers respectively).


That worked pretty well, I thought. I'm especially surprised by how well the distribution of the first digits from the non-fraudulent papers seems to follow Benford's Law (as said, only 186 digits, from 3 papers). There appear to be more 5's than predicted, but maybe that's a consequence of all those 7-point Likert scales. The first digits from the fraudulent papers clearly don't follow Benford's Law, seemingly confirming the committees' conclusion that they were made up. This is, of course, just a very simple analysis. There are more than a few possible catches. Perhaps the data of a particular kind of research were easier to fudge than those of other kinds, and all the graph shows is the difference between research with, say, lots of Likert scales and other research. And I doubt there is a big future for Benford's Law in spotting scientific fraud: as soon as potential fraudsters know their results will be tested this way, they can simply make up their numbers so that they fit. But, for now, I thought this was a pretty neat example of the application of Benford's Law in the experimental social sciences.
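For anyone who wants to go one step further than eyeballing a bar chart, a chi-square goodness-of-fit test against the Benford frequencies would do. The observed counts below are placeholders rather than my actual Stapel data.

```python
import numpy as np
from scipy.stats import chisquare

# Observed first-digit counts for digits 1..9 (placeholder numbers).
observed = np.array([300, 170, 120, 100, 110, 80, 75, 70, 60])

# Expected counts under Benford's Law for the same total.
benford = np.log10(1 + 1 / np.arange(1, 10))
expected = benford * observed.sum()

chi2, p = chisquare(observed, f_exp=expected)
print(f"chi2 = {chi2:.1f}, p = {p:.4f}")
```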