In the eye of the beholder?
Hey guys and girls,
So, as you probably know, the last month or so of Galaxy Zoo has been dedicated to testing whether we have any bias in our classifications (and if you want to know why we are interested in looking at the rotation of galaxies then please have a read here). By ‘bias’ we basically mean some systematic error in the way people classify (you can get a good explanation in Jordan’s post), and this is different from just random general scatter of results. For example, we know that when a galaxy is faint or small then people are more likely to think it is an elliptical galaxy – and this particular morphology bias is something that Steven must compensate for in his work.
It has been really exciting to work on the rotation classifications of Galaxy Zoo, and as many of you know, early on in the project we realised that people were classifying more galaxies as anti-clockwise (see the Telegraph article for example). Specifically, if we take those galaxies that are well classified (i.e. more than 80% of people agree) then we find we have an anti-clockwise:clockwise ratio of about 52:48. This may not sound particularly significant, but as you increase the number of galaxies that you have in your sample (as more of you lovely people classify for us) then this ratio becomes more significant, and is highly unlikely to arise by chance for the ~35,000 galaxies that we have. [For those of you who like probability, the number of anti-clockwise galaxies that we expect is distributed according to a Binomial probability distribution. And if we assume that the ratio is really 50:50, then out of a total of N galaxies we expect N/2 to be anti-clockwise, with a standard deviation of sqrt(N/4).]
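To make that significance claim concrete, here is a minimal sketch of the binomial calculation just described. The ~35,000 sample size and the 52:48 split come from the text above; the exact counts are our back-of-the-envelope reconstruction, not the team's actual numbers.

```python
import math

# Approximate figures from the post: ~35,000 well-classified galaxies
# split roughly 52:48 anti-clockwise to clockwise.
n = 35000
n_acw = round(0.52 * n)  # anti-clockwise count implied by the 52:48 ratio

# Under the null hypothesis of a true 50:50 sky, the anti-clockwise
# count is Binomial(n, 0.5), so:
expected = n / 2             # N/2
sigma = math.sqrt(n / 4)     # sqrt(N/4)

z = (n_acw - expected) / sigma
print(f"excess = {n_acw - expected:.0f} galaxies, significance = {z:.1f} sigma")
```

With these rough numbers the excess comes out at several standard deviations, which is why a 52:48 split that looks innocuous in a small sample becomes hard to dismiss at this scale.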
In the plot below we show the relative excess of clockwise votes (for users that classified more than about 300 galaxies) – this is the number of clockwise votes minus the number of anti-clockwise votes, divided by the sum of the two. For example, this number would be 1 if a user always clicks clockwise, and zero if they click both clockwise and anti-clockwise equally.
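The per-user statistic just described can be written as a one-line function (a sketch; the function and variable names are ours, not from the Galaxy Zoo code):

```python
def relative_excess(n_cw: int, n_acw: int) -> float:
    """(clockwise - anti-clockwise) / (clockwise + anti-clockwise).

    Returns +1 for a user who only ever clicks clockwise,
    -1 for one who only clicks anti-clockwise, and 0 when the
    two buttons are clicked equally often.
    """
    return (n_cw - n_acw) / (n_cw + n_acw)

print(relative_excess(300, 300))  # a perfectly balanced user
print(relative_excess(10, 0))     # an always-clockwise user
```

A user with the reported site-wide tendency, say 48 clockwise to 52 anti-clockwise clicks, lands slightly below zero, which is exactly the pattern described in the plot.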
This graph confirms that everyone is generally clicking anti-clockwise more often, because we see that the mean tends to lie below the zero line. But this plot cannot distinguish between an intrinsic excess of anti-clockwise galaxies on the sky and human bias, and it is obviously very important for our rotation results that we get a handle on this, as we could not announce our possible anti-clockwise excess result to the scientific community without doing these bias checks. So the basic idea is to look at the votes for a galaxy before and after a galaxy image is flipped. For example, if 6 out of 10 people thought it was originally clockwise, then after flipping we expect about 6 out of 10 people to now think it is rotating anti-clockwise (if there is no rotation bias).
Since the end of November many of the images in Galaxy Zoo have been flipped for this purpose (and we’ve been monitoring the status here), and we now think that we have enough data to measure the levels of bias. This week Anze has flown over from Berkeley (in California) especially to crunch the numbers with Kate (in Oxford); it is quite a job – with over 7 million classifications to go through! And during our analysis some rather subtle points arose… as with most science, things don’t go exactly to plan!
So we basically wanted to compare the classifications for a galaxy before and after flipping, but we quickly realised that people’s behaviour in the last month or so is very different from the earlier datasets (see Anze’s post for an explanation of how we reduce the data). For example, recently people have been more likely to click the ‘Star/Don’t know’ button. This might be because we have lots of new users, recruited through our latest publicity drive. Or maybe lots of old members have come back after receiving the newsletters. Either way, it meant we couldn’t simply compare before and after votes. Also, annoyingly, the original unflipped images are no longer on the site, so getting a handle on this behaviour change was a bit tricky (note that one of the first rules of scientific experiments is to have a control test, but a miscommunication amongst team members meant that in this case our control sample got left out!). Fortunately, though, we are able to use the monochrome images that are currently on the site as a comparison (as we observe that being in black and white does not change how people choose between anti-clockwise and clockwise).
So we want to know what the average votes per button are for the average galaxy in Galaxy Zoo. This is where we encountered our second problem – our bias sample does not cover all of the Galaxy Zoo galaxies, but just 10% of them, and this 10% was not selected at random. In particular, we know that we have more anti-clockwise galaxies in the bias sample (on the site at the moment). Therefore we needed to carefully undo what we did when we selected this subsample, so as to construct an effectively random subsample of our full database. Then we could look at the average weights.
In the figure we show the average fraction of votes that a galaxy gets for clockwise (class=2) and anti-clockwise (class=3). We show the result for the original classifications in black (before December), for the monochrome images in red, and for the flipped images in green. We also show the 1 standard deviation error bars from sampling.
So what we see is that the class=3 points are always higher than the class=2 points, and crucially this is true even after we flip a galaxy image! Looking at the red points, we find that before flipping there is a 6.0% chance of hitting anti-clock and a 5.5% chance of hitting clock for our sample. Then after flipping (green) there is a 5.9% chance of hitting anti-clock and a 5.6% chance of hitting clock. So the point is that those numbers stay the same (within 1 standard deviation) when they should actually reverse if there is no bias. It is easier to think in terms of the ratio of fractions:
anti/(anti+clock)=0.522 before flipping
anti/(anti+clock)=0.512 after flipping.
And if we had:
a) no bias and no excess then these should both be 0.5.
b) no bias and a real excess then one should be the opposite of the other (i.e. 0.52 and then 0.48)
c) a bias and no excess then we would expect them to stay the same and not equal 0.5.
But what we actually find is that 0.522 is 5 standard deviations away from 0.5, 0.512 is 3 standard deviations away from 0.5, and 0.522 & 0.512 are within 1 standard deviation of each other. So you see we appear to be convincingly in situation (c).
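As a rough sketch of where those standard-deviation figures come from: treating the anti-clockwise count as binomial in the total number of rotation votes, the ratio anti/(anti+clock) has a standard deviation of 0.5/sqrt(n). The post quotes only the ratios, so the vote totals below are our assumption, chosen to reproduce roughly the quoted significances:

```python
import math

def sigma_from_half(ratio: float, n_votes: int) -> float:
    """How many standard deviations a vote ratio lies from 0.5,
    assuming the anti-clockwise count is Binomial(n_votes, 0.5),
    so the ratio's standard deviation is 0.5 / sqrt(n_votes)."""
    return abs(ratio - 0.5) / (0.5 / math.sqrt(n_votes))

# Hypothetical totals of clockwise + anti-clockwise votes; ~13,000
# makes 0.522 come out near 5 sigma and 0.512 near 3 sigma,
# matching the numbers quoted in the post.
n_before = 13000
n_after = 13000

print(f"before flip: {sigma_from_half(0.522, n_before):.1f} sigma from 0.5")
print(f"after flip:  {sigma_from_half(0.512, n_after):.1f} sigma from 0.5")
```

The key comparison is then the one made in the text: both ratios sit well away from 0.5, but within about one standard deviation of each other, which is the signature of case (c).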
So what next? Well – it is fantastic that we have been able to get a handle on the bias, even if it did turn out to be affecting our results. Only with Galaxy Zoo, which has so many contributors, were we able to detect the bias (and it may turn out to be an inherent bias in the way people see galaxies, which would be an interesting psychology result). Without so many classifications the excess result would have always remained uncertain. And while we no longer think we have an overall excess of anti-clockwise galaxies (which we never expected in the first place!) we can still do a lot of interesting work and pursue our original scientific aims, as explained here and here.
Thanks guys! And keep up the good work. Current classifications remain useful, and we hope to give you some more images next week (possibly returning to the full catalogue!).
Cheers, Kate & Anze
28 responses to “In the eye of the beholder?”
It is too bad things didn’t go how you planned, BUT I do applaud you for being highly creative and finding a way to complete such an important study.
I do wonder, and I might have missed this somewhere: how much thought went into the interface design for the website, and couldn’t that cause the bias? I feel in my case there are a couple of errors I make that come down to the interface design.
Thank you for such an amazing project to help out with.
We will be considering the site design carefully for GZ2. For example, making the buttons more separate so it is harder to go to click one and accidentally click another. The classifications will also be more detailed, with multiple questions, so single wrong clicks will be less of an issue.
I’m having a problem with the reversed images. I often click through to magnify the image to check what I’m seeing. For the reversed images, the orientation switches back when one clicks through the Galaxy Ref. That then raises the problem of classifying the direction of spiral.
For example, this image is reversed from the main classification page. I called it a clockwise spiral, since that’s the direction it shows in the magnified image, though it’s reversed on the main page.
I should add that I couldn’t tell if there was interaction between the large spiral and the smaller galaxy just below and to the left of center (or right of center, depending on which image one views).
Sorry if it isn’t clear, but you should always classify the galaxies as they appear on the Galaxy Zoo page. Don’t worry about it, though – we can cope with a few mix-ups.
The front page of galaxyzoo.org says that you should “… continue to classify the Galaxy Zoo image as normal (and not use the SDSS one)”. As the post above explains, we have been reversing the images for a reason!
Thanks RBH, and everyone else, for your efforts!
Regarding the S/DK button being clicked more often: I know I certainly changed my modus operandi, as I was just going with what I could see on the GZ page, as requested, and generally not analysing in more depth by using the SDSS pages. Many images are indistinct, and to avoid misclassifying I felt it better to click S/DK than distort my results with a guess. It certainly helped with speed of analysis, and I felt less pressure to get it defined one way or the other.
However, I was better satisfied when I felt I was contributing more than just a bias check, so I look forward to GZ2 with the possibility of finding something new or odd, so I can satisfy my yen and contribute my little bit to science, and learn more also.
I echo Cameron’s sentiment regarding thanks for letting us in on a fascinating project.
Okey dokey. But you’d still be ahead to consult with someone who knows something about human perception and the design of interfaces in classification tasks.
Just as a side note, the inability of single-layer neural nets to accurately classify spiral stimuli was one reason for their near-abandonment in research on modeling human perception through the 1970s into the 1980s, before backpropagation was invented for multi-layer neural nets. Interestingly, humans share that inability. Hence the desirability of consulting someone who knows something about human perception at a professional level.
I have wondered about the interface as a possible source of bias as well. Is there any way that the “neutral” button could be in the middle for any set of choices, i.e. the EO/unclear button between the CW and ACW buttons, and Star/DK button between the spiral and ellipse.
What would have been fascinating – were it possible; I doubt it is in GZ – is a breakdown by various backgrounds.
Learning to read an analog clock is so very basic in western acculturation, and of course a clock’s hands move… clockwise. Imagining from its image which direction a galaxy is rotating in is “more difficult” if the arms sweep out counter-clockwise.
There could be a handedness issue; in a clockwise spiral image an arm starts at the core and sweeps out and away from it to the left if the arm is above the core, and the converse if below. The appearance is reversed in counter-clockwise cases. This left/right, up/down handedness and symmetry aspect becomes potentially interesting if there is in fact evident bias in selections.
Gender-based issues? – indirectly perhaps, via handedness? If that were the case, say, then a bias in the gender of survey respondents would perhaps be reflected here.
Just some thoughts. 🙂
I would like to add some comments about my experience working with the bias study set. I did the classifying in groups of 105 usually (100, plus 5 to cover any miscounts on my part in keeping track of where I was). You said that there was a higher number for Star/DK. My experience was that I saw between 2 and 5 percent stars, satellite tracks, and just bad exposures, and another 1/2 to 1 percent that were a true DK (a large fuzzy blob that took up most of the picture, or some other shape that I could not fit into one of the other 5 categories). This 1/2 to 1 percent represents a true change on my part.
The 2 to 5 percent of stars et cetera seems to be higher than with the full set of images. Also it seemed to me that I saw the same stars, tracks and bad exposures again and again. Which brings up the question: was the random sample presented to me from the bias data set really random?
If that question doesn’t give you nightmares, I’ve got more.
Over the last 2 to 3 weeks I have done from a couple of hundred up to a high of 3 thousand per day, to get you in a position to start the number crunching. I decided it was time to stop for a while when I found myself thinking “how did I classify that one the last time I saw it?”
I believe I classified a lot more edge/DK in the bias study than in the original. Also a significant number of these were of the variety of “I know it’s a spiral (concentrated central location, structure/striation) but I can’t tell which way it’s turning.” I did very few of these during the original.
This may have been due to more of a drive on my part to put them into the CW or CCW buckets on the first go at them, or not.
I suggest if possible to look at images that changed classification from phase I to Phase II.
Thanks for all the comments. I’ll try to answer a few specific points:
‘How much thought went into the Interface design for the website, and couldn’t that cause the bias?’
Oh, the wisdom of hindsight! Indeed we now realise that we didn’t put as much thought into it as perhaps we should have. But originally we were not expecting an ‘anti-clockwise excess’ result, and so never thought that we were going to have to test these subtle biases. I am pretty sure that the site design will completely explain the result – people are clicking the AC button just a few percent more, and from users comments it seems this is just because it is in the middle. As Michelle points out – the neutral button should probably be in the middle.
‘However, I was better satisfied when I felt I was contributing more than just a bias check, so I look forward to GZ2 with the possibility of finding something new or odd, so I can satisfy my yen and contribute my little bit to science, and learn more also.’
We appreciate this sentiment, and understand that all you guys signed up for an Astronomy project, and not a psychology one! Therefore we’ll try to move quickly on GZII.
‘Okey dokey. But you’d still be ahead to consult with someone who knows something about human perception and the design of interfaces in classification tasks’
Good point – for GZ II we will indeed consult a little wider. And the team now has a lot more experience too – aware of GZI weak points!
‘Gender-based issues? – indirectly perhaps, via handedness? ‘
If it isn’t because of the buttons, then indeed it’d be fascinating to see if there is a correlation with country, right/left handedness, gender, etc. But this wouldn’t be a project for GZ, as you’d want to design the experiment very differently (and probably not be an astrophysicist!).
‘Also it seemed to me that I saw the same stars, tracks and bad exposures again and again. Which brings upthe question: Was the random sample presented to me from the bias data set really random?’
No no – that is what we said in our post about the second problem we had: the bias sample WASN’T random. We know that there were in fact more S/DKs in it. It was also a much smaller sample, so it makes sense that the chance of repeats was higher (sorry about that). But to analyse the bias results we then take an effectively random selection of the non-random sample! And yes – it does give us nightmares!
I assume that this has something to do with our reading and writing from left to right.
You should check if there are participants who are used to reading from right to left, and whether they differ in their classification results.
Another method would be to give, let’s say, about 1000 to 5000 spiral galaxies as a sample for re-classification.
The sequence should be randomly distributed. If a user makes a clear mistake, then it should be given to them several more times for re-classification after some time.
Users who deviate too much should be taken out, on the assumption that there are some special causes which lead them to the wrong classification.
I’ve noticed on the forum that there seems to be some feeling that this result in some way mars the whole GZ project! So I’d just like to point out a couple of things:
Firstly – we still have lots of other cosmology (this strange excess thing was never part of our original cosmology motivation, see here for example). We can still look for an ‘axis’ about which galaxies rotate, as well as examine structure formation theories by looking at neighbour-neighbour correlations (as mentioned here).
Secondly – the cosmology side of GZ was an afterthought anyway; it is really the morphology classifications (rather than rotation) that this project set out to do (i.e. spiral vs. elliptical), and that is all still completely on track!! Indeed – with your help we have been able to classify the morphology of a million galaxies! And these results haven’t been written about much on the blog yet…
Think of this ‘anti-clockwise excess’ as a rather bizarre temporary diversion from the original project… which is doing both astronomy and cosmology for sure!
I can’t resist disagreeing with Kate, here. Of course, we’d need to check to be sure but if she’s right when she says
“I am pretty sure that the site design will completely explain the result – people are clicking the AC button just a few percent more, and from users comments it seems this is just because it is in the middle.”
then I don’t see how this could have explained the original Longo results. Of course, they could have just been coincidence but still…
How does NGC4622 fit into this whole CW/ACW discussion?
Story on MSNBC here:
And additional info on the Hubble news center at:
There may exist several causes:
1. just human mistakes, like touching the wrong button.
This should equal out, if it happens randomly only.
2. There are at least 2 different kinds of photos of spiral galaxies:
a. the very well defined, which in general are easy to classify, with one exception:
Some of them are not parallel to the computer screen, but have an angle in space. And that may cause some problems with the classification.
b. the poorly defined, faint spiral galaxies. For them there could be a probability, that they are judged randomly right or left rotating.
To clarify both problems, two other buttons should be added:
– faint spiral galaxy
– awkwardly oriented spiral galaxy
or, if this looks to be too difficult, just a button with which one can give a judgement of how sure one is of classifying it correctly (1, 2, 3 or something like that).
That should help to improve the statistics, because then these cases could be taken out and looked at separately.
There also exist some peculiar differences in human behaviour:
Some people exchange always right and left and this has nothing to do with intelligence.
This one could find out, if somebody deviates significantly from the average, but is consistent in his own reference frame.
I want to suggest anyway, to add some more buttons:
Now and then there can be seen very interesting structures of galaxies or exotic looking photos, for which no classification button exists.
By adding some more buttons, it would be possible to investigate these special cases. Otherwise they will get lost.
I think the bias study provided important, interesting knowledge. Prior to the bias study, we didn’t know for sure if the distribution of clockwise and anticlockwise galaxies was random or not. Now we know for sure that it is random, and current cosmology models do not need to be revised (at least not yet…). I think that all of us who are classifying galaxies can be proud that we provided the data used to make this discovery! I’m also looking forward to new discoveries that may yet be made with the Galaxy Zoo data.
I am a Cognitive Neuroscience PhD student in Britain, and this anti-clockwise bias has really caught my attention (even though affordance is not my research area). For what it is worth, I am convinced that human bias is the cause (as are the zookeepers). My argument is based on the following:
Although spiralling anti-clockwise, such spiral arms can easily be considered visually clockwise if you trace the arms outwards from the centre. Research has shown that right-handers are faster and more error-free when tracing a clockwise motion with their right hand than left-handers, who are better at tracing anti-clockwise motion (with their left hand). This shows a preference in right-handers (the majority of the population) for clockwise motion. Coupled with this, performing mental rotation judgments is significantly faster if the rotation is required to go clockwise than if it goes anti-clockwise, again showing a preference for clockwise transitions. Now here is the rub: stimuli that afford fluid action such as this are later given preferential status/treatment over and above stimuli that do not afford fluid action. This has been studied by consumer psychologists, and as a product of this research, in adverts you will never see the “actor” handle the marketed item awkwardly, as the viewer unconsciously perceives this non-fluid action and the item will be liked less by the consumer. Also, in situations of uncertainty we naturally rely on our preferences and prior experience to guide us towards a decision. Taken together, it is possible that as the majority of “Zoo’ers” will be right-handed, they will have a natural inclination towards clockwise-looking arms (even though they are spiralling anti-clockwise) as they will afford fluid perception (as discussed). This is likely to result in preferential treatment of clockwise spiral arms (anti-clockwise galaxies), leading to faster reaction times and fewer errors. Critically, regarding the bias, when a user is uncertain of the direction of motion, it may be that their preferential visual-clockwise tendencies steer them towards an anti-clockwise decision.
This reliance on preferences is the basis of all marketing, and is analogous to you going to a chiller in a new town to buy a drink, and there being 20 cans from a company you have never heard of before, and a Coca-Cola. I bet you will grab the coke! In the galaxy zoo, during uncertainty perhaps an anti-clockwise selection is that coke.
Of course, this is all conjecture, as there is surprisingly little research done on clockwise/anti-clockwise affordance in cognition. Even if you disagree, I hope this has found some interest among the zoo.
All the best, and thanks for a great site!
Thanks for your interest, Jim! 🙂
So (no, folks, I don’t know . . .), black and white have no effect on our perception of clockwise/anticlockwise. I’d be interested to know how they affect perceptions of star/don’t know versus elliptical, though, as I am sure they affected mine.
I’m left-handed and a visual learner, for what that’s worth to Jim Grange 🙂
There’s no need for me to trace the galactic arms in either direction. I believe my bias– if any– toward classifying ACW versus other results is due to my seeing an “S” shape in the images. The less clear the image, the more a tendency to look for an S-shape should affect my selection.
Also for what it’s worth, I have noticed that some of the B/W images can be easier to classify. Colors tend to blend and disguise details.
Surely there could be a fourth case: (d) there is a bias and an excess. There are two sub-cases: bias >> excess and excess >> bias.
With excess >> bias, one ratio would be less than 0.5 and the other above, but not symmetrically – this is obviously not the case.
With bias >> excess, the two results would be symmetrical about a non-0.5 bias point. This could be the case; only by knowing the statistics can this be clearly separated from case (c).
Waveney: bias + excess is a possibility too. Although your two subcases are essentially cases (b) and (c) – if one effect is much greater than another (>>) then with limited statistics that is observationally equivalent to one effect not being present. The possibility of bias and excess being of comparable magnitude is the interesting additional case. Then flipping would change the fractions of CW and ACW spirals, but not so much to make flipped CW equal unflipped ACW, and vice versa. There is a tiny hint of this in the plot – but it is not significant given the uncertainties. We can still put limits on how small the excess really is. However, it is clear that the bias is the dominant source of the original signal, i.e. we are very close to case (c).
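This exchange about bias versus excess can be illustrated with a toy Monte Carlo. This is only a sketch: the model, the 4% bias figure, and all parameters here are made up for illustration and are not the team's measured values.

```python
import random

random.seed(1)

def simulate(n_gal, true_acw_frac, acw_bias):
    """Toy model: each galaxy is truly anti-clockwise with probability
    true_acw_frac.  A classifier reports the true direction, except that
    with probability acw_bias they click anti-clockwise regardless.
    Returns the observed anti-clockwise fraction."""
    acw_votes = 0
    for _ in range(n_gal):
        truly_acw = random.random() < true_acw_frac
        reported_acw = True if random.random() < acw_bias else truly_acw
        acw_votes += reported_acw
    return acw_votes / n_gal

N = 100_000

# Case (b): a real 52:48 excess, no bias.  Flipping the images
# reverses the true handedness, so the observed ratio reverses too.
b_before = simulate(N, 0.52, 0.0)
b_after = simulate(N, 1 - 0.52, 0.0)

# Case (c): no excess, but a ~4% anti-clockwise click bias.
# Flipping a 50:50 sky changes nothing, so the ratio stays put.
c_before = simulate(N, 0.50, 0.04)
c_after = simulate(N, 0.50, 0.04)

print(f"case (b): {b_before:.3f} -> {b_after:.3f}   (reverses on flipping)")
print(f"case (c): {c_before:.3f} -> {c_after:.3f}   (unchanged by flipping)")
```

Case (d), bias and excess together, would sit between these: flipping would shift the fractions, but not all the way to a mirror image, which is the "tiny hint" mentioned above that the data cannot yet confirm.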
Okay, so you say that there is an apparent bias in the smaller sampling (10% of the overall sample), is that enough to extrapolate the other 90% of the sample as being equally biased?
If so, you guys also state that we’ve “helped categorize a million galaxies” or some such. But, if the results are biased, how much confidence can you have in the categorization in which we helped, if we’re biased.
For my part, I tried to make a good-faith guesstimate of clock vs. anti-clock based upon how I perceived the arm directions compared to the arm directions on the buttons. And I clicked “star / don’t know” a lot ’cause I really didn’t know. Especially on a lot of the slightly nebulous looking small round ones that weren’t especially “elliptical,” but rather looked like circular or spherical blobs. I don’t think that behavior changed all that much after I knew bias testing was going on. I still tried to identify clock and anti-clock from the GZ images first and foremost. Though, I suppose I may have been more reticent to identify images I wasn’t 100% sure were spirals or ellipticals. So, maybe I did click s/DK slightly more on the non-100% sure ones.
Do people get served the same image of the same object (or different images of the same object) on some regular schedule, such that results from the same person on the same object can be compared for consistency? I’d think that might root out a little bit of error / bias as well? IE, where someone identifies the exact same image in the exact same way on multiple subsequent passes could give a higher confidence in the veracity of the identification. Whereas, if someone identifies the same image differently on multiple passes, then perhaps it can be deduced that either the image is unclear, or that the person is unreliable as a morphology identifier.
I’m trying to recall where there’s a statistic on the site for how good of agreement your result have at any given time with the results of the whole? IE, if you generally click the same thing as everyone else on given images, perhaps you would get a higher “like minds” score on that image, or something.
It would be interesting to see it on maybe particular images and/or groups of images (do I generally agree with the majority who have rated images agreed to be clockwise spirals; do I generally agree with people who have rated specific images as “merger”; etc.)…
Hope that made sense.
When I look at a normal right-handed screw thread on a bolt, its pitch never looks as great as that of a left-handed thread with the same number of threads per inch/cm.
Try flipping an image of a screw thread and comparing it with the original image, side by side. Does the left-handed version not seem much more distinct?
I’m assuming this is a normal effect that most people perceive. I thought it may be because we are normally exposed to right-handed threads in this world and are very used to “seeing” them, and that left-handed threads are not so common (rare even!) and hence “stick out”, becoming more visually obvious. A perception bias.
Could this effect also be seen in spirals, to a lesser extent, and explain the bias in the results that you are seeing?
I often work from a laptop, and if I rest my hand just right on the keyboard near the touch pad, it makes clicking decisions for me. I found a few times the laptop was choosing whether something was elliptical or whether something was clockwise vs. anticlockwise instead of me. So I wonder how technical challenges like this may have influenced your results.
Uhhh.. just so you know… since it is harder to tell which way a galaxy is rotating with the black and white images… I’ve been going to the SDSS site to get the colour ones and clicking whatever way it is rotating from there.
And as for more ellipticals… sometimes it is obvious it’s a galaxy but edge-on it is too hard to tell whether it is elliptical… In that case I usually classify reddish/yellowish ones as elliptical and bluish ones as spiral. Also if it is an elliptical shape but, say, one side is a bit redder I will say elliptical, even though maybe that means it is a spiral?
Also I think I’m more likely to assume an indistinct blob is a galaxy and not a star… so I click elliptical instead of “Star/Don’t know.” I don’t click spiral because I can’t see any arms.
The giant “elliptical” button is pretty much my default.