As some of you may have noticed, our first paper has caught the eye of New Scientist (in fact they have written about us before). This is pretty cool considering that we effectively had a null result in our paper – concluding that the Universe seems to be relatively normal. The real excitement is due to the ‘people power’ that the Zoo has harnessed.
I think you need a subscription to see the full article, and however much I’d love to give out Chris’ account details, I am sure that would break certain rules! However, if you have been able to take a look at the full piece then you may be a little curious about the comments from Prof Michael Longo towards the end:
It turned out people have a preference when picking orientation: despite the mirroring, 52 per cent of the galaxies were still described as anticlockwise. “Rather than the universe being odd, it might be that people are odd,” says Land. The team has submitted the findings to Monthly Notices of the Royal Astronomical Society (www.arxiv.org/0803.3247).
Longo, however, is unconvinced. The mirroring analysis was only carried out for 5 per cent of the galaxies studied and he believes this sample is too small to justify rejecting the original excess that users spotted, which corroborated the existence of the axis. “[Land and colleagues] have done an impressive job of organising the Galaxy Zoo project, but I believe their analysis is flawed,” he says.
It is really thanks to him that this part of the Zoo, looking into the spins of galaxies as opposed to just the morphology, took place (see here for more on the motivation of this part of the study). And he raises an interesting point in the article that I thought it’d be worth responding to…
The point was raised that, because we only did the bias study on about 5% of the Galaxy Zoo sample, we cannot really comment at all on the level of bias in our full dataset. Indeed, this sounds quite reasonable. We can see how the classifications for this random ~5% of the data behave when we flip the images, but how do we know that the full sample wouldn’t behave differently?
Well, I think there are two important points to be aware of. Firstly, with statistics we were able to confidently detect a bias in the classifications of these ~50,000 galaxies. The analysis we performed is discussed in some detail on this blog. It is a bit technical, but not only did we detect a bias effect, with a method called resampling we also established the uncertainty in this result – the probability that the effect could just appear by chance.
For this, the data was split into further subsets, and by looking at how the results varied between these groups we could estimate the overall uncertainty in our results. For example – if it turns out that removing a few pieces of information causes the results to vary wildly, then this means that you have a huge uncertainty and cannot make strong final conclusions about the full dataset. In our case this method actually returned relatively small errors, because even between subsets of the data the results did not vary much. When we formally computed the uncertainty (we used the jackknife method, to be specific) we were able to detect the bias at the ‘3-sigma level’.
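For the curious, here is a toy sketch of the jackknife idea in Python. The data are simulated with a made-up 52% anticlockwise preference, and the number of subsets is arbitrary – this illustrates the method, not our actual analysis:

```python
import random

random.seed(42)

# Hypothetical per-galaxy classifications: 1 = anticlockwise, 0 = clockwise.
# Made-up numbers with a built-in 52% anticlockwise preference, purely to
# illustrate the method -- not the real Galaxy Zoo data.
n = 50_000
data = [1 if random.random() < 0.52 else 0 for _ in range(n)]

# Split the sample into k subsets and measure the anticlockwise fraction
# in each one.
k = 20
subsets = [data[i::k] for i in range(k)]
fractions = [sum(s) / len(s) for s in subsets]

# Jackknife: recompute the average fraction leaving out one subset at a
# time, then use the spread of those leave-one-out averages to estimate
# the uncertainty on the overall result.
overall = sum(fractions) / k
loo = [(sum(fractions) - f) / (k - 1) for f in fractions]
loo_mean = sum(loo) / k
variance = (k - 1) / k * sum((m - loo_mean) ** 2 for m in loo)
sigma = variance ** 0.5

# How many standard deviations is the measured fraction away from the
# unbiased expectation of 50%?
z = (overall - 0.5) / sigma
print(f"anticlockwise fraction = {overall:.3f} +/- {sigma:.3f} ({z:.1f} sigma)")
```

The nice feature is that if the subsets disagreed wildly with each other, the leave-one-out averages would scatter and `sigma` would blow up – exactly the “vary wildly” situation described above.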
This kind of lingo is used a lot by scientists, and what we mean by ‘sigma’ is one standard deviation. This is a measure of how much numbers can be expected to vary by chance. Consider for example that you toss a fair coin a thousand times, and you want to know how many times you can expect to return a heads. Well, obviously you’d expect 500 heads – but not necessarily exactly 500, as there will be some natural variance in the results. In this example it actually turns out that the number of heads roughly obeys a Normal Distribution with a mean of 500 and a standard deviation of ~16. This means that if we repeated the experiment a number of times we would expect 68.3% of the results to find the number of heads to be within 1 standard deviation of 500 (between 484-516), 95.4% within ‘2 sigma’ (468-532), and 99.7% within ‘3 sigma’ (452-548). If your experiment returned 450 heads from 1,000 tosses of the coin, then this would be unexpected at the ‘3-sigma level’ and would be highly unlikely – thus indicating that the dice is probably biased.
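As a sanity check, the coin-toss arithmetic above can be reproduced in a few lines of Python (the figures are the same as in the text; nothing here is specific to Galaxy Zoo):

```python
import math

# A fair coin tossed n times gives a number of heads that is
# Binomial(n, 1/2), which for large n is close to a Normal distribution
# with mean n/2 and standard deviation sqrt(n)/2.
n = 1000
mean = n / 2                       # 500 heads expected
sigma = math.sqrt(n * 0.5 * 0.5)   # ~15.8, the "~16" quoted above

for k in (1, 2, 3):
    print(f"{k}-sigma range: {mean - k * sigma:.0f} "
          f"to {mean + k * sigma:.0f} heads")

# 450 heads sits (500 - 450) / 15.8 = ~3.2 standard deviations below
# the mean, i.e. just outside the 99.7% (3-sigma) range.
z = (mean - 450) / sigma
```

Running this prints the 1-, 2- and 3-sigma ranges quoted in the text, and confirms that 450 heads lies a little more than 3 standard deviations from the expected 500.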
Well, similarly we found that our original and our flipped classifications were inconsistent at more than 3 standard deviations – and this means we can be sure at the 99.7% level that there is a bias effect in our study. This is what we mean by confidently!
But what about the full dataset? Well, this is the second point – the bias-study galaxies were selected completely at random from the full dataset in order to get a representative sample. We have conclusively shown that there is a bias in the way people classify galaxies, and hence the same effect should be present in the full sample. We cannot be 100% sure that the full sample would show exactly the same bias effect, but we can be over 99.73% (3 sigma) sure. In other words, for the bias effect to be a statistical fluctuation due to reanalyzing just 5% of the data, we would have to be very lucky (not quite LOTTO lucky, but more than BINGO lucky!). But once the bias effect is taken into account, the axis (or more specifically the excess of anti-clockwise galaxies) disappears. Alas!
My statistical analysis skills are such that I have to take the word of professionals on the figures regarding bias, but the process described is such that it seems reasonable to share your confidence in the results. One point however, still intrigues me, and that is the cause of this observer bias.
As one not as steeped in the deeper concepts of Cosmology, the idea of the universe being slightly unbalanced did not seem as jarring a concept as the idea of consistent observer bias, but it would seem that my perception of both the Universe & People is in need of revision.
I realise that this is rather crossing into a different field, but it would be fascinating to see an examination into just why & how this bias happens.
Send him a link to here. Seems, like us, he needed some refreshing reminders on statistics. Oh, and I love the picture of you on your website.
Statistics has a logic of its own that is way beyond my ken. That said, simply from my own experience of GZ I know there was a lot of room for error. I frequently hit the wrong button. Initially I went back to correct these mis-hits until I found out corrections didn’t register on the system. Also, I suspect some members may have tried to read spirality into images that were too blurred to safely evaluate. I tried to avoid this by avoiding categorisation unless I could clearly see the pattern. If I am right, this leaves a specific mystery for further research – NOT why observers register one particular direction of spiral in preference to another, but why, when faced with a spirality which is on the borderline of observability, observers then favour a particular guess as to its direction. Testable, I would have thought.
“…thus indicating that the dice is probably biased”
I thought that you were tossing a coin !!
“…thus indicating that the dice is probably biased”
And “dice” is a plural so either “the die is” or “the dice are”.
Sorry to seem pedantic !
Sounds like it’s time to bring in a developmental psych researcher to figure out why we prefer to see anti-clockwise spirals…
Yes. If you know anyone, let us know!
Or what about whether it’s just that anticlock is the middle button. In future, perhaps 50% of newbies could get them swapped around?
The statistics do appear impressive, but I remain uneasy about the methodological flaws in the bias study: the original, unadjusted images remained available, which, when combined with unclear instructions, resulted in some GZooers looking at the unadjusted image when they made their type selection.
I look forward to GZoo2 when this matter can be revisited with the benefit of the experiences to date.