Sunday, April 3, 2011

How NOT to regress murder rates on religious belief

This post on reddit's r/atheism did a linear regression of murder rates on "importance of religion" figures (both sets of data from wikipedia).

The poster there also looked at IHDI (inequality-adjusted human development index) and its effect on the relationship.

The poster found a weak (and statistically insignificant) relationship between importance of religion and murder, but after adjusting for IHDI the sign changed (though the relationship remained weak).

But much about the analysis - and hence the conclusions - is wrong or suspect.

(I'd normally have replied on reddit, but since this discussion is relatively long for a comment and involves figures, it's better written up elsewhere. Further, since this sort of analysis is the very raison d'être of my benighted blog, it goes here.)

While I usually work in R these days, I'm going to do the calculations for this in a spreadsheet, like the original - so that those looking at the original poster's spreadsheet can follow along.

First, I noticed that the murder rates are highly skewed. Since the relationships are fairly weak, this skewness applies to both the conditional and unconditional distributions of murder rate. This instantly invalidates all the significance testing, so any conclusions about the significance or otherwise of the relationships go out the window.
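As a rough illustration of the skewness point, here is a minimal Python sketch. It uses synthetic lognormal numbers as a stand-in for murder rates (they are not the Wikipedia figures), and the helper name `sample_skewness` is made up for this example. It simply compares the moment-based skewness before and after taking logs.

```python
import math
import random

def sample_skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (about 0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Synthetic, right-skewed stand-in for murder rates (NOT the real data).
rng = random.Random(0)
rates = [rng.lognormvariate(1.0, 1.0) for _ in range(150)]
log_rates = [math.log(r) for r in rates]

skew_raw = sample_skewness(rates)
skew_log = sample_skewness(log_rates)
print(f"skewness of raw rates: {skew_raw:.2f}")
print(f"skewness of log rates: {skew_log:.2f}")
```

On data like this the raw values are strongly right-skewed while the logged values are roughly symmetric, which is the kind of check worth making before trusting normal-theory significance tests.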

Second, the relationship with importance of religious belief is not monotonic, let alone linear. Any conclusion about the direction of the relationship is meaningless without taking this into account. (In what follows I am going to look at "religion is unimportant" percentages rather than "religion is important" - they mostly add to 100%, or nearly so. I do this for a particular reason, though the other figures should give similar conclusions.)

Third, some of the "religion is unimportant" figures are for countries where religious belief is compulsory or effectively so. Let's take Indonesia as an example. In Indonesia, you must choose one of a small number of religions. Lack of religious belief is not allowed. So some countries are "jammed up" against the origin, and the extremely high religious belief figures are highly suspect. Seriously, everyone in some countries thinks religion is important? Absolutely everyone? (This is one reason why for most of my analyses these days I use Wikipedia's "irreligion" figures instead, as in my previous post.)

The "jamming up against zero" issue tends to make relationships curve there, so I transformed that variable too. The usual transform for percentages is the logit, but those few suspect "0%" figures make that impossible. I could regularize the logit transform, which usually works quite well, but in this case I just took square roots. (In a previous analysis of irreligion figures of this type I tried a cube-root transformation, since for low percentages it spreads the figures out better and behaves more like a logit. For this analysis either works fairly well, but I figured the square root would be better understood.)
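To make the choice of transform concrete, here is a small Python sketch comparing the plain logit, one possible regularized logit, the square root and the cube root on a few made-up percentages. The percentages, the function names and the offset `eps` are all assumptions for illustration. In particular, adding a small constant to numerator and denominator is only one common way to regularize a logit, not necessarily the one meant above.

```python
import math

def logit(p):
    """Plain logit of a proportion p; fails at p = 0 and p = 1."""
    return math.log(p / (1.0 - p))

def regularized_logit(p, eps=0.005):
    """Logit with numerator and denominator nudged away from zero.
    eps is an arbitrary illustrative value, not one used in the post."""
    return math.log((p + eps) / (1.0 - p + eps))

# Hypothetical percentages, chosen only to show behaviour near zero.
percentages = [0.0, 1.0, 4.0, 16.0, 49.0]

print(" pct    logit  reg.logit   sqrt  cube root")
for pct in percentages:
    p = pct / 100.0
    try:
        plain = f"{logit(p):8.3f}"
    except ValueError:  # math.log(0.0) raises ValueError
        plain = "     n/a"
    print(f"{pct:4.0f} {plain} {regularized_logit(p):10.3f} "
          f"{math.sqrt(pct):6.3f} {pct ** (1.0 / 3.0):10.3f}")
```

The plain logit has no value at 0%, while the other three transforms all handle it. The cube root stretches the low end more than the square root does, which is why it behaves more like a logit there.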

Since pictures speak much more clearly, let's look at a picture.
I have split the unimportance of religion data into four ranges - first, high figures (in blue - there's a large gap that makes a convenient breakpoint), then medium (teal) and low (green) figures, and finally the 0% figures (red-brown) which I regard as suspect:

(I got the data from Wikipedia again myself and cleaned it a little, as there were some errors in the data that had to be fixed but which shouldn't have affected the original poster's figures.)

We see that the 0% figures are inconsistent with the trend in the low figures, and the low figures show a distinctly different pattern from the higher two groups. The upper two groups are reasonably consistent, however - we could probably use a single straight line to describe both. (But on the untransformed scale for religious unimportance, there is a stronger suggestion of changing slope.)
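The danger of fitting one line through groups that follow different patterns can be sketched in a few lines of Python. The numbers below are invented purely to rise in a "low" range and fall in an "upper" range. They are not the data in the plot, and `ols_slope` is just a hypothetical helper for the usual least-squares slope.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Invented (x, y) values: y rises over the low range, then falls.
low_x, low_y = [1, 2, 3], [0.5, 1.0, 1.5]
upper_x, upper_y = [4, 5, 6, 7, 8], [1.4, 1.0, 0.6, 0.2, -0.2]

slope_low = ols_slope(low_x, low_y)
slope_upper = ols_slope(upper_x, upper_y)
slope_pooled = ols_slope(low_x + upper_x, low_y + upper_y)

print(f"low range slope:   {slope_low:+.3f}")
print(f"upper range slope: {slope_upper:+.3f}")
print(f"pooled slope:      {slope_pooled:+.3f}")
```

Here the pooled line has a shallow negative slope that describes neither range well, so its sign says little about the direction of the relationship in any part of the data.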

The log of murder rate is also not monotonic in IHDI, though the change is less spectacular. (The relationship between IHDI and "religion is unimportant" percentage is strong and close to linear over a fair portion of the range - but again, not clearly monotonic over the whole range.)

All of these issues make the conclusions of the original analysis nonsense.

What can we see? The least religious countries do indeed have a lower murder rate. The question remains as to whether this effect remains after considering IHDI - but here's the final concern, though it's not a statistical issue:

Since IHDI is strongly associated with religious belief, if IHDI is substantively caused by religious belief, IHDI could be mediating the relationship between the other two variables. If religion is causative, it might be "acting through" IHDI to reduce murder rates. So we have to be cautious about concluding it isn't causative if it becomes insignificant after adjusting for IHDI, at least without some rather in-depth analysis (and even then with heavy caveats).
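A toy simulation shows how this can happen. In the Python sketch below everything is synthetic: `x` drives a mediator `m`, and `m` alone drives `y`, so `x` is a genuine cause of `y` but acts entirely through `m`. The variable names and coefficients are arbitrary assumptions and make no claim about religion, IHDI or murder rates.

```python
import random

def cross_sum(a, b):
    """Sum of products of deviations from the means of a and b."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))

rng = random.Random(1)
n = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]           # the cause
m = [0.9 * xi + rng.gauss(0.0, 0.5) for xi in x]      # mediator, driven by x
y = [-1.0 * mi + rng.gauss(0.0, 0.5) for mi in m]     # outcome, driven by m only

sxx, smm, sxm = cross_sum(x, x), cross_sum(m, m), cross_sum(x, m)
sxy, smy = cross_sum(x, y), cross_sum(m, y)

# Regression of y on x alone.
slope_x_alone = sxy / sxx

# Regression of y on x and m together (two-predictor normal equations).
det = sxx * smm - sxm ** 2
slope_x_adjusted = (smm * sxy - sxm * smy) / det
slope_m_adjusted = (sxx * smy - sxm * sxy) / det

print(f"slope of x, unadjusted:     {slope_x_alone:+.3f}")
print(f"slope of x, adjusted for m: {slope_x_adjusted:+.3f}")
print(f"slope of m, adjusted for x: {slope_m_adjusted:+.3f}")
```

The unadjusted slope for `x` is clearly negative, while the adjusted one is close to zero, even though `x` is causal by construction. Adjusting for a mediator removes the part of the effect that flows through it, so a vanishing coefficient after adjustment does not on its own rule out causation.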

I plan to do a more in-depth analysis of these figures in R at some point, which will take proper account of the nonlinearity via additive models.