Thursday, May 29, 2008

Relationships between percentages are not linear (usually)

Friendly Atheist had some graphs relating to biblical literalism and some other variables, which in turn came from several posts by razib at Gene Expression (e.g.).

I objected in comments to the fact that the relationships had been fitted by straight lines when the relationships would necessarily be curved, because percentages are bounded by 0% and 100%. Sure enough, the curvature we would expect to see was plainly there in some of the graphs.

Now, leaving aside a whole lot of other issues (some of which are addressed in the Friendly Atheist thread, some of which aren't), one way to make the relationships easier to see is to transform the percentages so as to stretch out the values close to the boundaries. One popular transform for proportions is the logit transform (the log-odds, log(p/(1-p))).
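A minimal Python sketch of the logit and its inverse (the function names are just illustrative). It shows the stretching: going from 1% to 2% moves the logit much further than going from 50% to 51%.

```python
import math

def logit(p):
    """Log-odds of a proportion p, which must lie strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is only defined for 0 < p < 1")
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Same-sized steps in percentage terms: small change in the middle,
# large change near the boundaries.
for pct in (1, 2, 5, 50, 51, 95, 98, 99):
    p = pct / 100
    print(f"{pct:3d}%  logit = {logit(p):+.3f}")
```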

Let's see its effect on the graph of the relationship between percentage of people with postgrad education and percentage who take the bible literally.

Original:


The R² value is the square of the linear correlation between the two variables. But the relationship is strongly nonlinear! There's little value in this number.

As I said over in the Friendly Atheist comment thread:
These variables should NOT be having straight lines fitted to their relationships, unless someone really thinks percentages can go outside 0-100!

Look at the fitted “percentage with postgrad education” for the “Church of God In Christ”. It’s NEGATIVE! That makes no sense at all.

At the very least, a functional relationship that obeys the a priori facts about the situation (those fractions being bounded to [0,1], for example) should be used. In the first graph, the IQs also have a lower bound of 0, but we’re so many s.d.s from zero that it doesn’t matter quite so much for that variable (there’s still the issue that there’s no a priori reason to expect linearity, though).

It really does matter for the percentages, because they approach their limits in this data. Notice that the relationship in the points is curved in the second graph? That’s because the boundaries force it to be curved. Why is a straight line being fitted to a relationship that is plainly not linear (and worse, pretty obviously won’t be, before we even see data)?

The linear equation, the R-squared and so on are all nonsense - worse than useless! (Indeed, since neither variable is necessarily thought to be causative, why use a technique - regression, whether linear or not - that treats one variable as the predictor and the other as the response?)

The original post even refers to the "Roman Catholic" point as an outlier. It isn't! It only looks like an outlier if you're crazy enough to fit a straight line. If you look at it as a curved relationship, it fits in just fine.

Instead of using a linear correlation, we could measure a nonparametric correlation - one that measures the monotonic association between the two variables. That is, something that measures the extent to which one variable tends to increase (or decrease) as the other increases, whether or not it does so along a straight line. There are many such quantities - two common ones are Kendall's measure of concordance (Kendall's tau) and the rank correlation (Spearman's rho).
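Both rank-based measures are short enough to write out directly. A minimal Python sketch, not the software used for the figures here: Kendall's version is the simple tau-a, which assumes no tied values, and Spearman's rho is computed as the ordinary linear correlation of the ranks, with tied values given averaged ranks.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Ordinary (linear) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """Ranks 1..n; tied values get the average of the ranks they span."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho: the linear correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / number of pairs.
    Assumes no ties; with ties, tau-b is the usual choice."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    n = len(x)
    return s / (n * (n - 1) / 2)

# A perfectly monotonic but curved relationship: y = 1/x.
x = list(range(1, 11))
y = [1 / k for k in x]
print(pearson(x, y), spearman_rho(x, y), kendall_tau(x, y))
```

For a strictly decreasing but curved relationship like this one, both rank measures are exactly -1, while the linear correlation is noticeably further from -1, which is the same pattern as the -0.86 versus -0.92 below.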

Because there is a strong relationship - just not a linear one - the monotonic association is higher than the linear correlation for this data. The linear correlation is -0.86, while the Spearman measure of the monotonic association is -0.92.

However, I think the main issue is simply a better display, so let's return to the graph of the suggested transformation.

Transformed:


I did this one in a bit of a hurry, so it's a little rough, but it gives the idea. In order that we can see the relationship more clearly I have omitted the labels for the individual points, but they could be included. Notice that the "Catholic" point (the one to the right of the "10%" tickmark on the vertical axis) is clearly not an unusual point - it fits the pattern nicely.

Note that since the logit transformation is monotonic, this transformation doesn't alter the nonparametric correlations at all. So, for example, the Spearman measure of monotonic association is still -0.92.
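A quick check of that invariance, sketched on made-up, noise-free proportions (not the survey data) constructed to have a curved, decreasing relationship that happens to be exactly linear on the logit-logit scale. The rank correlation is the same on both scales; the linear correlation is not.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    # Assumes no ties, which holds for this made-up data.
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

# Hypothetical proportions; y is a curved, decreasing function of x.
x = [0.02, 0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80]
y = [inv_logit(1.0 - 1.2 * logit(p)) for p in x]
lx = [logit(p) for p in x]
ly = [logit(q) for q in y]

pearson_raw, pearson_logit = pearson(x, y), pearson(lx, ly)
spearman_raw, spearman_logit = spearman(x, y), spearman(lx, ly)
print(f"Pearson:  raw {pearson_raw:.3f}, logit {pearson_logit:.3f}")
print(f"Spearman: raw {spearman_raw:.3f}, logit {spearman_logit:.3f}")
```

Because the logit never changes the order of the values, the ranks (and so Spearman's rho and Kendall's tau) are untouched; only the spacing changes, and that is all the linear correlation responds to.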

The linear correlation on this transformed scale, however, does change, because the relationship is now almost linear (the correlation is now -0.90). I still don't advocate drawing a line on the plot, however (though a line is now a pretty good description); if a relationship must be drawn in, any of a number of standard nonparametric smoothers could be used. I think we can see the relationship just fine on the second plot.

As to whether the relationship means much of anything, that's another issue, but at least we can now clearly see it, without some distracting straight line (and equation, and r-squared value) on the graph, mis-relating the raw percentages.

I think both graphs provide valuable information. Ideally, I'd display both side by side and, since the table of data is small, give that as well.

Update:
Here's a plot with an added smooth, done on the original raw percentage scale. This one was generated by an old version of S-PLUS, but R can also generate stuff like this (as can numerous other packages). The smooth here is just the default spline smooth, though the supersmoother was about as good, and the loess smoother would probably work fine if I tweaked its parameters a bit (the default is too local).



This smooth (and the others I mentioned), done on the original scale, doesn't recognize the inherent restrictions I discuss above. I think a better way to smooth would be to transform the data (as in the second plot above), smooth that, and then, if desired, take that smooth back to the original scale. Of course, it's no longer estimating a mean, but that's not such a huge deal - we're just trying to describe a relationship.
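Here is a minimal sketch of the transform, smooth, back-transform recipe. It is not the S-PLUS spline used for the plots: to keep it self-contained it swaps in a simple Gaussian-kernel (Nadaraya-Watson) smoother, and the bandwidth and the data are made up purely for illustration.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def kernel_smooth(xs, ys, grid, bandwidth):
    """Nadaraya-Watson smoother with a Gaussian kernel: at each grid point,
    a weighted average of ys, with weights decaying with distance in x."""
    fitted = []
    for g in grid:
        w = [math.exp(-0.5 * ((x - g) / bandwidth) ** 2) for x in xs]
        fitted.append(sum(wi * yi for wi, yi in zip(w, ys)) / sum(w))
    return fitted

def smooth_on_logit_scale(px, py, grid_p, bandwidth=0.5):
    """Move both proportions to the logit scale, smooth there, then map the
    smooth back to proportions. bandwidth is in logit units (arbitrary here)."""
    lx = [logit(p) for p in px]
    ly = [logit(q) for q in py]
    lgrid = [logit(g) for g in grid_p]
    return [inv_logit(v) for v in kernel_smooth(lx, ly, lgrid, bandwidth)]

# Made-up proportions, only to show the mechanics.
px = [0.03, 0.08, 0.12, 0.25, 0.40, 0.55]
py = [0.62, 0.55, 0.41, 0.30, 0.18, 0.12]
grid = [0.05, 0.10, 0.20, 0.40]
for g, v in zip(grid, smooth_on_logit_scale(px, py, grid)):
    print(f"x = {g:.2f}  smoothed y = {v:.3f}")
```

Since the back-transformed curve passes through the inverse logit, it can never leave (0, 1), which is exactly the restriction a raw-scale smooth ignores. Keep the grid within the range of the data; far outside it the kernel weights underflow and, in any case, the smooth shouldn't be extrapolated.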

Further Update:
Here's a graph of what happens when you smooth on the transformed scale and transform back to the original (percentage) scale. The blue curve is the smooth curve shown above, while the more strongly bent green curve is the smooth done on the logit-logit scale and then brought back. On the transformed scale the default spline smooth was somewhat curved (though much less so than a spline smooth on the original scale), and of course, when we come back, it's definitely curved on the percentage scale.



We shouldn't extrapolate any of these relationships outside the range of the data, but at least within the range of the data, the smooth curves above are not implausible descriptions of how the variables are related to one another.

Also fixed first two links.

Monday, May 26, 2008

Who's in the lead?

Percentages don't always do what you expect

I was reading an article at Dailykos (with the ironic title "Montana's huge black population gives Obama the edge"), which is mostly debunking the myth that Obama has a problem with blue-collar whites. (I am not normally given to reading Dailykos, but I do pop in occasionally.)

At one point in the article I was reminded of Simpson's Paradox, and I thought I'd discuss it here, since it lets me talk about elections and about simple mathematics at the same time. Note that the issue doesn't occur in the story to my knowledge - it just served to remind me of it.

Imagine the following situation. Let's say you're looking at how much likely voters approve of a particular candidate - let's imagine people are asked about exactly one candidate, namely whether they agree with a statement something like "the candidate would make a good president".

[NB: These numbers are completely made up! They're not real, but are just there to illustrate a point.]

Imagine we ended up with the following table (of percentages):


Percentage of people surveyed that agree that the candidate would make a good president:

               Agree (%)
McCain (R)        38
Clinton (D)       41


Note that in this case, different people are being asked about Clinton and McCain; this would not usually be the case with a typical survey. Also, these figures don't have to add to 100% - all candidates might have 80% approval, or all might have 30%.

So on these figures Clinton is slightly leading McCain in approval (maybe not outside the margin of error, but let's ignore that issue). Let's say you want to figure out whether McCain's biggest problem is with males or females. Fortunately, it turns out that this information is available.

Percentage of people surveyed that agree that the candidate would make a good president, by sex:

               Men (%)   Women (%)
McCain (R)       46         36
Clinton (D)      43         34


Hang on a minute! McCain leads Clinton on men and on women!? Did we make a mistake somewhere?

Actually, no, this is possible. Look at these numbers (counts of people):

Number of people surveyed that agree that the candidate would make a good president:

                         Men   Women   Total
McCain    Approved        80     189     269
          Number asked   174     526     700

Clinton   Approved       233      55     288
          Number asked   540     160     700


If you check those numbers out (unless I made a mistake somewhere), they give the percentages I quote above (to the nearest whole percent).
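Checking the made-up counts with a few lines of Python (nothing here beyond the table itself):

```python
# The made-up counts from the table: (number approving, number asked).
counts = {
    "McCain":  {"Men": (80, 174),  "Women": (189, 526)},
    "Clinton": {"Men": (233, 540), "Women": (55, 160)},
}

def summarize(table):
    """Rounded approval percentages by group, plus the overall percentage."""
    result = {}
    for cand, groups in table.items():
        by_group = {g: round(100 * a / n) for g, (a, n) in groups.items()}
        approved = sum(a for a, _ in groups.values())
        asked = sum(n for _, n in groups.values())
        result[cand] = (by_group, round(100 * approved / asked))
    return result

for cand, (by_group, overall) in summarize(counts).items():
    print(cand, by_group, "overall:", overall)
# McCain {'Men': 46, 'Women': 36} overall: 38
# Clinton {'Men': 43, 'Women': 34} overall: 41
```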

This is called Simpson's Paradox. It's worth knowing about if you're comparing rates or percentages.

What's going on? Well, as it turns out, in our sample women were less approving overall than men, and most of the people asked about McCain were women, while most of those asked about Clinton were men.

So which figure matters? Well, actually, if the proportions within each of the sub-samples are representative of the voting population, you'd need to weight each figure by the corresponding proportion in the population ... and then McCain would tend to come out looking better.
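Here is that weighting as a sketch, using the made-up counts above and an assumed 50/50 split of men and women in the voting population (the real split would have to come from somewhere else).

```python
counts = {
    "McCain":  {"Men": (80, 174),  "Women": (189, 526)},
    "Clinton": {"Men": (233, 540), "Women": (55, 160)},
}

# Assumed population shares, for illustration only.
population_share = {"Men": 0.5, "Women": 0.5}

def weighted_approval(groups, shares):
    """Approval (%) with each group's rate weighted by its population share,
    rather than by how many of that group happened to be asked."""
    return 100 * sum(shares[g] * a / n for g, (a, n) in groups.items())

for cand, groups in counts.items():
    print(cand, f"{weighted_approval(groups, population_share):.1f}")
# McCain 41.0
# Clinton 38.8
```

Since McCain's approval is higher among men and among women, any weighting that applies the same population shares to both candidates puts McCain ahead; the raw overall figures differ only because the two sub-samples had very different mixes of men and women.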

Saturday, May 24, 2008

Fame! Fortune! ... well, neither, really

... but on the latest Monday Math Madness - they used one of my solutions. (solutions not linked directly so as not to spoil it for anyone)

A chat with God

I had a chat with God like you said. He told me you suck.

Friday, May 23, 2008

My take on...

My take on Ray Comfort's banana shtick:

Jesus said of the false prophets, "By their fruits shall ye know them."

Thursday, May 22, 2008

A couple of relevant quotes.

"What kind of peace do we seek? Not a Pax Americana enforced on the world by American weapons of war. Not the peace of the grave or the security of the slave. I am talking about genuine peace, the kind of peace that makes life on earth worth living, the kind that enables men and nations to grow and to hope and to build a better life for their children - not merely peace for Americans but peace for all men and women - not merely peace in our time but peace for all time." - JFK


"Let us never negotiate out of fear. But let us never fear to negotiate." - JFK


Contrasting these sentiments with the current Republican lackwit talking points is left as an exercise for the reader.

Tuesday, May 20, 2008

Celebrating the intellect

The underlying message of the Intelligent Design movement, and most especially of Expelled, is "Scientists are too dumb to understand science". It is an attack on science, masquerading as an attack on elitism (and various other attacks). It is an eerie echo of the usual attack on experts by sociopolitical elites: "Don't listen to smart people. They don't understand you."

In every case, they're asking you to stop thinking.


Viva eggheads!

We need smart people. We need experts. We need people who are prepared to devote years of their lives to studying a subject. These days, human knowledge is vast, and increasingly complex.

And why is it that it takes a lot of work to become an expert these days? Why is our knowledge vast and growing? Because we build on the work of other experts. In the 1930s, vast numbers of people - many of them children - died or were crippled by polio.

Where are they now? Why don't we have thousands of crippled children? Science. Experts. Eggheads.

Smallpox used to kill thousands every year and disfigure thousands more. Where are all the new victims of smallpox?

In the early decades of the last century, and all the years that preceded it, simple infections killed millions. If it weren't for antibiotics, I'd be dead, bedridden or at best so sickly as to be barely able to work. So would many of my friends and family. If it weren't for other forms of medical treatment, almost everyone I know would be dead, or at best, suffering. One of my heroes is in the Carnival banner over on the right.

We fight an ongoing war with bacteria, which (via evolution) gradually become able to deal with antibiotics. Experts - scientists - work to keep a step ahead, saving countless lives.

Without spectacles, I'd be so blind as to be useless for most tasks (these days, I could even choose to have laser surgery).

We need to celebrate intellectualism - we need some tiny bastion against the permanent Carnival of the Stupid that those who would seek to impose their will on ours would like us to have.


What would be the ultimate consequences of this growth industry of anti-intellectualism, this mantra of whatever you want to be true is true?

Stuff like teachers who know the toothpick trick getting fired for practising wizardry. Gee, I hope something that stupid never happens.


The deadly danger of anti-knowledge rhetoric

Currently, when people decry "elites", the position they are taking is, in reality, anti-expert rhetoric. It amounts to not even listening to the opinions of people who have devoted their working lives to the study of a particular subject.

Now, I'm not suggesting that they will have all the answers. But experts are experts because they know their area, and it would pay to understand why they're saying what they're saying. Nor am I suggesting that their opinions can never be challenged: if you learn what they know, then you can take them on in their own arena. Who knows, maybe you'll even come up with a good idea. But more likely, everything you think of will have already been thought of and either debunked or already tried. The experts will be able to tell you who has tried it, when, and how it turned out, or they will know who debunked it, how they debunked it, and what you'll have to overcome to get it to work.

To just dismiss almost universal opinions of the knowledgeable as "elitist" is a ridiculous ploy. It's appealing to the lowest common denominator - "you're just as good as they are" is the implication. Well, that's true - human lives are all valuable. But the additional implication - "a completely ignorant opinion is just as good as any expert's" - is ludicrous. The hidden claim is that "smart people are stupid". Well, that is stupid.

The subtext of the dismissal of expert opinion as elitist is "believe what you want - it's just as true as reality". That's not only ludicrous, it's dangerous.

What does it get us?

  • Global Warming denialists
  • Overfishing of crucial marine resources
  • Creationism in classrooms
  • The hell-hole that Iraq has been turned into
  • The post-Katrina mess
  • Anti-vaxers playing footsie with polio... and risking everyone else's health
  • Flagrantly stupid energy policies that will lead to much worse problems down the track, like higher energy prices as the fuel runs out even faster, and no reserves to deal with temporary shortages.



To exploit this kind of anti-expert rhetoric for cheap (and temporary) advantage... that deserves our most vehement contempt. Politicians - at least ones worthy of our vote - should be leading by example, be encouraging us to be smart, be discussing difficult issues and trusting us to realize that they're not going to be solved by platitudes and ignorant policies. They should not be exploiting ignorance, and we should not be rewarding them for treating us like morons. They should be denounced, loudly. If the media won't do it, then we must.

Beware the wealthy and influential masquerading as populists. They're liars, and their favourite lie is to slander someone smart and hardworking with the label "elite". Don't trust them, because they think you're stupid and can be exploited if they just push a few buttons.

God of the non-gaps

Evolved and Rational's latest post is another delightful snark, with which I am nevertheless going to take issue.

Perhaps uncharacteristically, Evolved is being far too kind this time, I'm afraid.

She says: "Behe's theology is one where god resides in the gaps of human knowledge [...] his criteria for detecting design is merely what has not been explained by science at the time. "

I think the first part of that quote is what Behe was aiming for. Unfortunately, he missed - Behe's examples of gaps, his cases of "irreducible complexity", his examples of non-evolution, are wrong.

On HIV, Behe was wrong. He was thoroughly spanked by erv (erv is now at scienceblogs), pursued at Panda's Thumb (e.g.), and eventually forced to admit that he was wrong. A sucky admission, but still, he finally noticed he'd been pwned.
On the bacterial flagellum, Behe was wrong. Epically wrong.
On the immune system and much else besides, Behe was wrong.

These are just examples, there are lots of others.

I'm not aware of a single instance where Behe managed to pick an actual legitimate gap.

[It's not like it's hard. As the cryptotheist evolution-denialists are so fond of pointing out, just like building a drystone wall, every time a gap is filled in, there are more (but much tinier) gaps around it - so these days there are plenty of little gaps to go around, if you have even the slightest clue about where to look. Of course the big problem for the cryptotheist is that to actually find the genuine gaps you have to know enough science to be able to read a few papers - that is, to display a basic competence one might expect from a beginning science postgrad.]

While trying to construct a God-of-the-gaps (a trivial exercise for even the barely competent), Behe - doubtless inspired by God - constructed a God-of-the-non-gaps, a God-of-the-reducibly-complex, a God-of-the-already-explained-by-science.

Behe's work is a masterpiece of epic fail. Perhaps God is trying to tell you something, Michael. Like how much you suck at science. (As Abbie pointed out, you also suck at Google.)

(I won't even bother with Dembski. His work is more incompetent, contentless and pathetic than that of Behe. And he's an unrepentant - and incompetent - thief. Why pussyfoot about, let's just go the whole hog. Dembski just generally sucks.)

Friday, May 16, 2008

Sex and Drugs and guys who wear Prada with their dresses...

The Pope recently came out and said that sex can be like a drug.

I find it kind of amusing, to be honest, but I guess I should be used to it by now - religious leaders talking at length about things of which they presumably have no experience, as though they knew exactly what they were talking about (what God thinks about playing video games, what God thinks about pollution, what God thinks about who should be President...).

The Pope's statement brings to mind a vision of some really great parties at the Vatican involving snorting cocaine off trays on the heads of dwarves while they fiddle with... but you know, like the parties they're always having. Apparently.

But rather than a long diatribe, I'm going to leave you with a little bit of Blackadder, from season 1 (the season where Edmund Blackadder and Percy are in the last days of pre-Tudor England). Richard Curtis says it better than I could.

Percy: You know, they do say that the Infanta’s eyes are more beautiful than the famous Stone of Galveston.
Blackadder: Mm? ... What?
Percy: The famous Stone of Galveston, My Lord.
Blackadder: And what’s that, exactly?
Percy: Well, it’s a famous blue stone, and it comes... from Galveston.
Blackadder: I see. And what about it?
Percy: Well, My Lord, the Infanta’s eyes are bluer than it, for a start.
Blackadder: I see. And have you ever seen this stone?
Percy: No, not as such, My Lord, but I know a couple of people who have, and they say it’s very, very blue indeed.
Blackadder: And have these people seen the Infanta’s eyes?
Percy: No, I shouldn’t think so, My Lord.
Blackadder: And neither have you, presumably.
Percy: No, My Lord.
Blackadder: So, what you’re telling me, Percy, is that something you have never seen is slightly less blue than something else... you have never seen.

Thursday, May 15, 2008

Possible banner for the Carnival of Elitist Bastards

Since I have an anti-"anti-elitism" post set for the Carnival of Elitist Bastards (and this too), I thought I'd have a go at a banner. I stole one of the banners here and modified it. Because, you know, a bastard elitist egghead like me won't admit that there's something he couldn't do if he wanted.

So, anyway, it's a bit rough, but here we go:



(Edit: I just noticed a minor issue, which I am not going to fix. So there.)

US elections

So Edwards has declared for Obama, and polls show both Obama and Clinton with substantial leads over McCain (47-40 for Obama, 46-41 for Clinton). If you look at the rate at which superdelegates are declaring for Obama, Clinton could win every race 70-30 (or even higher) and she would still be falling behind in total delegates faster than before. And looking at the poll results, one of her main arguments for remaining in the race (that she would do better against McCain) doesn't hold water.

To be honest, I think a Clinton presidency would, in many respects, be a reasonably good one. I don't think that's enough. A curate's egg presidency? - when there's another choice?

Of course that's just my opinion - it's not like I can vote. I got my chance at that months ago.

Tuesday, May 13, 2008

Selecting a random uniform positive integer

This post is edited together from some comments I made at Pharyngula, where some commenters were discussing randomly selecting a positive integer (with equal probability on every positive integer), in a discussion of creationists' abuse of mathematics.

The problem is, you can't do it.

It is simply not possible to actually select an integer at random where each one has the same probability of being chosen.

In order to be able to construct something which could have a uniform distribution over a countably infinite set, you'd have to drop Kolmogorov's axiom of countable additivity.

Such a thing is possible to do, but when people deal with countably infinite sample spaces they do use the axiom, so whatever "probability" you end up with by dropping it will not be like the probability that people use.

(In measure-theory terms, it would be put something like: the union of a countably infinite number of disjoint measurable sets, all having equal measure, cannot have measure 1.)

What does that mean? It means you either have to give up probabilities adding to 1, or you have to give up equal probability on your countably infinite set.
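Written out for the positive integers, with countable additivity giving the middle equality, the dilemma is:

```latex
P(\{k\}) = c \ \text{for every } k \in \{1, 2, 3, \dots\}
\quad\Longrightarrow\quad
P(\{1, 2, 3, \dots\}) = \sum_{k=1}^{\infty} P(\{k\}) = \sum_{k=1}^{\infty} c =
\begin{cases}
  0,      & c = 0, \\
  \infty, & c > 0,
\end{cases}
```

and neither 0 nor infinity is 1.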

A probabilist called de Finetti (a fascist, as it happens, but that has nothing to do with whether his work was any good) claimed it was possible to have a notion that was effectively a uniform distribution over a countably infinite set by dropping the aforementioned axiom.

That is, in effect, he dropped the "probabilities add to 1" part that would otherwise be a consequence of the axiom: he said each individual probability was zero. The uncomfortable consequence is that adding up the probabilities of the individual numbers gives 0, not 1, even though some number is supposed to be chosen.

Other probabilists have tried making each probability > 0. The uncomfortable consequence there, of course, is that the probabilities of the individual numbers add up to infinity.

No actual mechanism exists for producing a random number from these somewhat unintuitive constructs. Any process that gives you a number will inevitably be weighted toward the small numbers (that is, it will give small numbers generally higher probability than sufficiently large numbers).
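To make "weighted toward the small numbers" concrete, here is one hypothetical mechanism (my choice of example; the geometric distribution is just one of many possibilities): keep flipping a biased coin and report how many flips it took to get the first head. Every positive integer can come out, but each one is less likely than the one before, and that decrease is what lets the probabilities add up to 1.

```python
import random

def geometric_positive_integer(p, rng):
    """Number of trials up to and including the first success, where each
    trial succeeds with probability p. P(k) = (1 - p)**(k - 1) * p for k >= 1."""
    k = 1
    while rng.random() >= p:  # failure: try again
        k += 1
    return k

def pmf(k, p):
    """Probability of drawing exactly k."""
    return (1 - p) ** (k - 1) * p

rng = random.Random(2008)
draws = [geometric_positive_integer(0.1, rng) for _ in range(10_000)]
print("largest draw:", max(draws))
print("fraction of draws <= 10:", sum(d <= 10 for d in draws) / len(draws))
print("exact P(k <= 10):", sum(pmf(k, 0.1) for k in range(1, 11)))
```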

Now, if you're only using such a construct to represent something like a state of knowledge**, and don't need to actually observe a value from it, it is sometimes possible to work with such things. That's effectively what is being done when Bayesians work with flat priors over the positive integers (they refer to things like that as "improper priors"): with appropriate normalization, you can get proper posteriors to come out the other end.

** Some people regard the uniform over the positive integers as representing a complete lack of knowledge, but to my mind it actually represents the dramatically strong assumption that no matter how large a value you consider, you effectively assume that essentially all values are larger than it.
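A minimal sketch of the improper-prior idea, under a hypothetical setup of my own choosing (not one from the Pharyngula discussion): values are drawn uniformly from 1..N, like serial numbers, and N gets a flat "prior" with weight 1 on every positive integer. With two observations the posterior is proportional to 1/N² and can be normalized; with only one it is proportional to 1/N, whose sum diverges, so the posterior stays improper. The infinite sums are truncated at a chosen upper limit, which is an approximation.

```python
def unnormalized_posterior(n, observations):
    """Flat (improper) prior weight 1 for every positive integer n, times the
    likelihood of the observations if each was drawn uniformly from 1..n."""
    if n < max(observations):
        return 0.0
    return n ** -len(observations)

def posterior_mass(observations, upper):
    """Unnormalized posterior mass on 1..upper (the infinite sum, truncated)."""
    return sum(unnormalized_posterior(n, observations) for n in range(1, upper + 1))

for obs in ([37, 52], [52]):
    masses = [posterior_mass(obs, u) for u in (10**3, 10**4, 10**5)]
    print(obs, [f"{m:.5f}" for m in masses])

# With two observations the mass settles down, so it can be normalized:
# approximate P(N <= 104 | data) by truncating the denominator at 10**5.
obs = [37, 52]
print(posterior_mass(obs, 104) / posterior_mass(obs, 10**5))
```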

A fair number of people object to this unusual kind of construct, insisting that subjective probability and actual realizable probability ought to have the same features. They also like to point to the paradoxes that come along with not doing so, which I am not currently familiar enough with to discuss.

Anyway, the short version of all that is "You can't actually select an element from a countably infinite set (such as the positive integers, all the integers, the rationals, or the differences of the square roots of non-negative integers) with equal probability on all elements."

Indeed, the restriction is stronger than that. Not only do the probabilities need to generally decrease as the values become arbitrarily large, they can't decrease too slowly. For example, they can't decrease in proportion to 1/k, because the sum of 1/k over all the positive integers is infinite (though any finite number of values are exempt from following the general decrease).
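Numerically, the difference between 1/k and a faster decrease such as 1/k² is easy to see: the partial sums of 1/k keep growing (roughly like ln n + 0.5772), so no constant can scale them to total 1, while the sums of 1/k² converge to π²/6, so probabilities 6/(π²k²) do form a valid distribution. A minimal sketch:

```python
import math

def partial_sum(f, n):
    """f(1) + f(2) + ... + f(n)."""
    return sum(f(k) for k in range(1, n + 1))

for n in (10, 1_000, 100_000):
    h = partial_sum(lambda k: 1 / k, n)
    s = partial_sum(lambda k: 1 / k**2, n)
    print(f"n={n:>7}: sum 1/k = {h:.4f} (ln n + 0.5772 = {math.log(n) + 0.5772:.4f}), "
          f"sum 1/k^2 = {s:.6f} (pi^2/6 = {math.pi**2 / 6:.6f})")

def p_inverse_square(k):
    """A valid distribution on the positive integers: P(k) = 6 / (pi^2 k^2)."""
    return 6 / (math.pi ** 2 * k ** 2)
```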

Sunday, May 11, 2008

Got another one

I don't usually have enough time to invest a few hours in solving the Monday Math Madness puzzles on Wild About Math and Blinkdagger, but twice so far (out of the five puzzles posted) I was able to see how to do the puzzle essentially as soon as I read it, and so sent in an entry.

As with the first time, my entry was again correct, but still no banana, unfortunately. (It was essentially to prove that 1 and 9 were the only squares with no even digits.)
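A brute-force search is no proof, but it makes a nice sanity check. Here is a sketch that searches the squares up to 10^8, with the short argument for why nothing beyond 9 can work given in the comments.

```python
def has_no_even_digits(n):
    """True if every decimal digit of n is odd."""
    return all(d in "13579" for d in str(n))

def squares_without_even_digits(limit):
    """All perfect squares k*k <= limit whose decimal digits are all odd."""
    found = []
    k = 1
    while k * k <= limit:
        if has_no_even_digits(k * k):
            found.append(k * k)
        k += 1
    return found

# Why only 1 and 9: an odd units digit means k is odd. Writing k = 10a + b,
# k*k = 100a^2 + 20ab + b^2, so its tens digit is (2ab + b*b // 10) mod 10,
# and b*b // 10 is 0, 0, 2, 4 or 8 for b = 1, 3, 5, 7, 9. The tens digit is
# therefore even, so any odd square of 10 or more has an even digit.
print(squares_without_even_digits(10**8))  # [1, 9]
```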

I'm there in the list of people giving correct answers, under the longer version of my nom-de-blog.

Saturday, May 3, 2008

How to be worse than useless

"I'll pray for you."

It's the ultimate cop out.

"I'm going to do nothing to help you, but saying it outright would make me feel like crap. So you know what I'm going to do? I'm going to pretend to talk to an invisible magic sky-daddy so I can feel like I'm doing something. And after you deal with your problems all on your own, I get to claim credit! Isn't life grand?"

This is stupid even by twisted theo-logic - if the invisible magic sky-daddy really knows everything, and has the power to change things, he left things the way they are for a reason. What are you praying for? Are you smarter than him or not? If his amazing and mysterious plans are so amazing and mysterious, you better stop fucking with the grand plan right now, bozo.

Or are you appealing to his compassion? If he so lacks compassion that he won't lift a finger to help until you pray, he's a protection-racketeer. "An offer you can't refuse". Godfather indeed.


Atheists don't have the luxury of the cop-out. Atheists either help someone or they have to deal with the fact that they didn't help. Guess what - a lot of atheists do help people. To be sure, some of us don't, but we don't go around pretending we did, and we don't go around claiming credit if things turned out okay anyway.

Prayer is a great de-motivator. "I've prayed, it's in God's hands now."
Now there's a call for inaction!

Prayer, in some circumstances, causes harm. Prayer can make people feel miserable and inadequate - if it fails, it must be your fault, and if you are brave and strong in the face of terrible odds - well you don't get to claim credit, sorry.

Atheists know there's nobody but us, and if we want things to be better, we have to do something.

National day of prayer? What a fucked-in-the-head idea.

Friday, May 2, 2008

the symbolic and the practical

Well, the new Australian government seems to be step-by-step and piece-by-piece dealing with almost everything about the previous government that made me ashamed.

The latest is the removal of discrimination against gay and lesbian couples in legislation, by giving them the same entitlements as hetero couples (the pieces of legislation in question already don't discriminate on marital status, so this part is relatively straightforward). They're not going to legalize same-sex marriage, sadly, but at least the enshrined financial discrimination by government is coming to an end.

In a way, I can accept that for the moment - I'd rather see this partial improvement get through than a more completely equitable move fail. It may be that, like the call for a republic, it's a fight for a later time, possibly even a later government. I'm still sad for the gay and lesbian couples that want to get married now.

Apparently the government is also about to deliver a $21B (AUD) budget surplus, nearly 1.75% of GDP. That's huge, more than a thousand dollars per capita. Assuming they're still funding the election promises, I worry about what's being cut. But at least it should help them deal with our current schizo economy, which can't decide between the internal inflationary effects of full employment and heavy consumer spending on the one hand, and the market downturn brought on by the US financial crisis on the other. The surplus gives them the breathing space to move fairly rapidly from the anti-inflation measure it's designed to be to spending more to help out the economy, should the economy turn extremely sour later.

They're doing better economically than the arch-conservative government they replaced, but recent Labor governments have been bigger economic reformers than the supposed conservatives anyway, while still advancing a more equitable social agenda, a less gung-ho foreign policy and more responsible environmental policies. So far, so good.

But I'm still worried about where the $21B surplus came from. I guess I'll find out soon, when the budget is released.