Galaxy Zoo CANDELS
We submitted the Galaxy Zoo CANDELS paper in May. Now, after some discussion with a very helpful referee, the paper is accepted! I hope our volunteers are as thrilled as I was to get the news. It happened within days of the Galaxy Zoo: Hubble paper acceptance. Hurray!

Spot the typo! (No, just kidding.) (Well, sort of. There is one, but it’s not easy to find and it’s pretty inconsequential.) This is not quite the longest paper I’ve ever written, but it is the longest author list I’ve ever been at the top of. It includes both Galaxy Zoo and CANDELS scientists. And the volunteers are acknowledged too, in that first footnote. A lot of people did a lot of work to bring this together.
If you’d like to read the paper, it’s publicly available as a pre-print now and will be published at some point soon in the Monthly Notices of the Royal Astronomical Society. The pre-print version is the accepted version, so it should only differ from the eventual published paper by a tiny bit (I’m sure the proof editor will catch some typos and so on).
The paper may be a little long for a casual read, so here’s an overview:
- We collected 2,149,206 classifications of 52,073 subjects, from 41,552 registered volunteers and 53,714 web browser sessions where the classifier didn’t log in. In the analysis we assumed each of those unique browser sessions was a separate volunteer.

- Most subjects have 40 classifications apiece, although some were retired early from active classification and others were classified further, until about 80 volunteers per galaxy had told us what they thought.
- The raw consensus classifications are definitely useful, but we also weighted the classifications using a combination of “gold standard” data and consensus-based weighting. That is, classifiers were up- or down-weighted according to whether they could tell a galaxy apart from a star most of the time, and then the rest of the weighting proceeded in the same way it has for every other GZ dataset. No surprise: the majority of volunteers are excellent classifiers.
- 6% of the raw classifications were from 86 classifiers who both classified a lot and gave the same answer (usually “star or artifact”) at least 98% of the time, no matter what images they saw. We have some bots, but they’re quite easy to spot.
- Even with a pretty generous definition of what counts as “featured”, less than 15% of galaxies in the relatively young Universe that this data examines have clear signs of features. Most galaxies in the data set are relatively smooth and featureless.
- Galaxy Zoo compares well with visual classifications of the same galaxies done by members of the CANDELS team, even though the comparison is sometimes hard because the questions they asked weren’t the same as the ones we asked. This is, of course, a classic problem when comparing data sets of any kind: to some extent it’s always apples-vs-oranges, and the devil is in the details.

We devote an entire section of the paper to comparing with the CANDELS-team classifications (from Kartaltepe et al. 2015, which we abbreviate to K15 in the paper). The bottom line: the classifications generally agree, and where they don’t we understand why. Sometimes it’s because there’s interesting science there, like mergers versus overlaps. In the figure, the greyscale shading is a 2-D histogram; the blue and red points differ only in which axis was used to separate the galaxies into bins so that the average trends could be computed.
- By combining Galaxy Zoo classifications with multi-wavelength light profile fitting — where we fit a 2D equation to the distribution of light in a galaxy, the properties of which correlate pretty well with whether a galaxy has a strong disk component — we’ve identified a population of likely disk-dominated galaxies that also completely lack the features that are common in disk galaxies in the nearby, more evolved Universe. These disks don’t have spiral arms, they don’t have bars, they don’t have clumps. They’re smooth, but they are disks, not ellipticals. They tend to be a bit more compact than disk galaxies that do have features, even though they’re at the same luminosities. They’re also hard to identify using color alone (which echoes what we’ve seen in past Galaxy Zoo studies of various different kinds of galaxies). You really need both kinds of morphological information to reliably find these.
- The data is available for download for those who would like to study it: data.galaxyzoo.org.
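The "bots are easy to spot" point above can be sketched in a few lines. This is only an illustration of the idea, not the code used for the paper: the function name, the `(classifier, answer)` input layout and the `min_count=200` cutoff for "classified a lot" are all assumptions made up for this example. Only the 98% same-answer threshold comes from the text.

```python
from collections import Counter, defaultdict


def flag_repetitive_classifiers(classifications, min_count=200,
                                min_same_fraction=0.98):
    """Return {classifier: dominant_answer} for suspiciously repetitive classifiers.

    `classifications` is an iterable of (classifier_id, answer) pairs.
    `min_count` is a hypothetical stand-in for "classified a lot";
    `min_same_fraction` is the 98% same-answer threshold from the text.
    """
    # Tally how often each classifier gave each answer.
    answers_by_classifier = defaultdict(Counter)
    for classifier, answer in classifications:
        answers_by_classifier[classifier][answer] += 1

    flagged = {}
    for classifier, counts in answers_by_classifier.items():
        total = sum(counts.values())
        top_answer, top_count = counts.most_common(1)[0]
        # Flag only classifiers who are both prolific and nearly constant.
        if total >= min_count and top_count / total >= min_same_fraction:
            flagged[classifier] = top_answer
    return flagged


# Toy data: one constant clicker and one ordinary volunteer.
toy = [("bot_1", "star or artifact")] * 300
toy += [("alice", a)
        for a in ["smooth", "features", "smooth", "star or artifact"] * 75]
print(flag_repetitive_classifiers(toy))
```

Requiring both conditions matters: a volunteer who has only classified a handful of galaxies can easily give the same answer every time by chance, so the same-answer test only means something once the classification count is large.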
With the data releases of Galaxy Zoo: Hubble and Galaxy Zoo CANDELS added to the existing Galaxy Zoo releases, your combined classifications of over a million galaxies near and far are now public. We’ve already done some science together with these classifications, but there’s so much more to do. Thanks again for enabling us to learn about the Universe. This wouldn’t have been possible without you.
Clicking 10 Billion Years Into The Past
Astronomers use funny units. We have the light-year, which sounds like a time but is actually a distance. There’s the parsec, a historical (but still used) unit of distance that was famously misused as a time in Star Wars. And then there’s redshift, which is strictly a dimensionless measure of how much a galaxy’s light has been stretched, and is often quoted as a recession velocity (distance divided by time), but which, because of the expansion of the universe, astronomers get to use as a proxy for distance.
While it may be convenient for us to use distance units where we set a mind-blowingly large number equal to 1, it doesn’t really help us communicate our work to the public. If I note that the galaxy images from CANDELS look a little different from the galaxies in the SDSS because the CANDELS galaxies are typically at a redshift of 2, that’s pretty meaningless. But it’s a little different to think of the fact that, when you classify a galaxy from CANDELS, you may be looking three-quarters of the way to the edge of the visible universe, and seeing the galaxy as it was 10 billion years ago.
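For the curious, the link between "a redshift of 2" and "10 billion years ago" can be sketched numerically. The snippet below is a rough illustration rather than anything taken from the paper: it assumes a flat universe with a Hubble constant of 70 km/s/Mpc, a matter density of 0.3 and a dark energy density of 0.7 (round numbers picked for this example), and the function names are invented here.

```python
import math

# Assumed cosmology (illustrative round numbers, not from the paper).
H0_KM_S_MPC = 70.0   # Hubble constant
OMEGA_M = 0.3        # matter density parameter
OMEGA_L = 0.7        # dark energy density parameter (flat: OMEGA_M + OMEGA_L = 1)

KM_PER_MPC = 3.0857e19
SECONDS_PER_GYR = 3.15576e16


def hubble_time_gyr():
    """1 / H0 expressed in billions of years."""
    return KM_PER_MPC / H0_KM_S_MPC / SECONDS_PER_GYR


def lookback_time_gyr(z, steps=10000):
    """Light travel time from redshift z, in Gyr.

    Integrates dz' / ((1 + z') * E(z')) from 0 to z with the midpoint rule,
    where E(z) = sqrt(OMEGA_M * (1 + z)**3 + OMEGA_L).
    """
    dz = z / steps
    total = 0.0
    for i in range(steps):
        zp = (i + 0.5) * dz
        e_z = math.sqrt(OMEGA_M * (1.0 + zp) ** 3 + OMEGA_L)
        total += dz / ((1.0 + zp) * e_z)
    return hubble_time_gyr() * total


def age_of_universe_gyr():
    """Present-day age in Gyr (closed form, valid for a flat universe)."""
    factor = 2.0 / (3.0 * math.sqrt(OMEGA_L))
    return hubble_time_gyr() * factor * math.asinh(math.sqrt(OMEGA_L / OMEGA_M))


t_lookback = lookback_time_gyr(2.0)
t_now = age_of_universe_gyr()
print(f"Lookback time to z = 2: {t_lookback:.1f} Gyr")
print(f"Fraction of the universe's age: {t_lookback / t_now:.2f}")
```

With these assumed parameters the lookback time comes out at a little over 10 billion years, which is roughly three-quarters of the age of the universe.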
During this hangout, we announced that your clicks and classifications of the CANDELS galaxies have been moving at such an impressive rate that the first round is finished. Every galaxy has enough classifications for us to get a very good sense of what its morphology is. It may be that, for some of the galaxies where there are clearly more details to flesh out, we will ask for a few more classifications per galaxy. And there will probably be future CANDELS images from survey fields that are still being completed. So, don’t worry, there will still be plenty of opportunities to classify galaxies as they were 10 billion years ago!
In the meantime, though, we’re getting ready not just to do the scientific analysis, but to share Galaxy Zoo results with our colleagues around the world. The summer conference season is upon us, and many of us have given and are giving talks and posters at various meetings in various cities. This includes not just the recent meeting highlighting the importance of galaxy morphology in the era of large surveys at the Royal Astronomical Society and the upcoming ZooCon in Oxford and Galaxy Zoo meeting in Sydney, but also several more general conferences, including the 222nd American Astronomical Society meeting and the upcoming UK National Astronomy Meeting. Spreading the word about the scientific results we’re finding with Galaxy Zoo is one of the most important parts of our job, and it doesn’t hurt that in order to do that we have to visit some very interesting places. During the hangout we chatted a bit about that and also took some of your questions.
Note: although it was a beautiful sunny day in Oxford, the variable audio quality is not because I was occasionally distracted looking out the window. I don’t think it was the new microphone, either. We’ll look into it, but in the meantime I’ve tried to equalize the podcast version with some after-editing, so hopefully that is slightly better.




