New paper: Galaxy Zoo and machine learning
I’m really happy to announce a new paper based on Galaxy Zoo data has just been accepted for publication. This one is different from many of our previous works; it focuses on the science of machine learning, and how we’re improving the ability of computers to identify galaxy morphologies after being trained on the classifications you’ve provided in Galaxy Zoo. This paper was led by Sander Dieleman, a PhD student at Ghent University in Belgium.
This work was begun in early 2014, when we ran an online competition through the Kaggle data platform called “The Galaxy Challenge”. The premise was fairly simple – we used the classifications provided by citizen scientists for the Galaxy Zoo 2 project and challenged computer scientists to write an algorithm to match those classifications as closely as possible. We provided about 75,000 anonymized images + classifications as a training set for participants, and kept the same amount of data secret; solutions submitted by competitors were tested on this set. More than 300 teams participated, and we awarded prizes to the top three scores. You can see more details on the competition site.
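The post does not say how "as closely as possible" was measured. As a minimal sketch, assuming a root-mean-square error between an algorithm's predicted vote fractions and the crowd's vote fractions (the function name and the example numbers are hypothetical), a score could be computed like this:

```python
import numpy as np

def rmse(predicted, crowd):
    """Root-mean-square error between predicted and crowd vote fractions.

    Both inputs have shape (n_galaxies, n_answers); lower is better.
    Using RMSE here is an assumption, not something stated in the post.
    """
    predicted = np.asarray(predicted, dtype=float)
    crowd = np.asarray(crowd, dtype=float)
    if predicted.shape != crowd.shape:
        raise ValueError("predicted and crowd must have the same shape")
    return float(np.sqrt(np.mean((predicted - crowd) ** 2)))

# Hypothetical vote fractions for two galaxies and two answers.
crowd_votes = [[0.8, 0.2], [0.1, 0.9]]
model_votes = [[0.7, 0.3], [0.2, 0.8]]
score = rmse(model_votes, crowd_votes)  # every entry is off by 0.1, so the score is about 0.1
```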
Since completing the competition, Sander has been working on writing up his solution as an academic paper, which has just been accepted to Monthly Notices of the Royal Astronomical Society (MNRAS). The method he’s developed relies on a technique known as a neural network; these are sets of algorithms (or statistical models) in which the parameters being fit can change as they learn, and can model “non-linear” relationships between the inputs. The name and design of many neural networks are inspired by similarities to the way that neurons function in the brain.
One of the innovative techniques in Sander’s work has been to use a model that makes use of the symmetry in the galaxy images. Consider the pictures of the same galaxy below:
From the classifications in GZ, we’d expect the answers for these two images to be identical; it’s the same galaxy, after all, no matter which way we look at it. For a computer program, however, these images would need to be separately analyzed and classified. Sander’s work exploits this in two ways:
- The size of the training data can be dramatically increased by including multiple, rotated versions of the different images. More training data typically results in a better-performing algorithm.
- Since the morphological classification for the two galaxies should be the same, we can apply the same feature detectors to the rotated images and thus share parameters in the model. This makes the model more general and improves the overall performance.
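Both ideas can be illustrated with a toy sketch. This is not Sander's actual network: it assumes square image arrays, uses only 90-degree rotations, and stands in a single hand-written filter for the learned feature detectors. All function names are hypothetical.

```python
import numpy as np

def rotated_copies(image):
    """Return the four 90-degree rotations of a square 2-D image."""
    return [np.rot90(image, k) for k in range(4)]

def augment(images, labels):
    """Grow a training set: every rotated copy keeps its original label."""
    aug_images, aug_labels = [], []
    for img, lab in zip(images, labels):
        for rot in rotated_copies(img):
            aug_images.append(rot)
            aug_labels.append(lab)
    return np.array(aug_images), np.array(aug_labels)

def feature_response(image, kernel):
    """Peak response of one filter slid over the image (valid cross-correlation)."""
    kh, kw = kernel.shape
    h, w = image.shape
    best = -np.inf
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            best = max(best, float(np.sum(patch * kernel)))
    return best

def rotation_pooled_response(image, kernel):
    """Apply the SAME kernel (shared parameters) to each rotation, then average."""
    return float(np.mean([feature_response(r, kernel)
                          for r in rotated_copies(image)]))

# Toy data: one 4x4 "galaxy" with hypothetical vote fractions as its label.
galaxy = np.arange(16.0).reshape(4, 4)
X, y = augment([galaxy], [np.array([0.7, 0.3])])  # 1 image becomes 4
edge_filter = np.array([[1.0, 0.0], [0.0, -1.0]])
a = rotation_pooled_response(galaxy, edge_filter)
b = rotation_pooled_response(np.rot90(galaxy), edge_filter)
# a and b agree: the pooled feature does not depend on the viewing angle.
```

Averaging over rotations is just one simple way to combine the shared responses; it makes the output identical for an image and its rotated copy without adding any new parameters.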
Once all of the training data is in, Sander’s model takes images and can provide very precise classifications of morphology. I think one of the neatest visualizations is this one: galaxies along the top vs bottom rows are considered “most dis-similar” by the maps in the model. You can see that it’s doing well by, for example, grouping all the loose spiral galaxies together and predicting that these are a distinct class from edge-on spirals.

From Figure 13 in Dieleman et al. (2015). Example sets of images that are maximally distinct in the prediction model. The top row consists of loose winding spirals, while the bottom row are edge-on disks.
For more details on Sander’s work, he has an excellent blog post on his own site that goes into many of the details, a lot of which is accessible even to a non-expert.
While there are a lot of applications for these sorts of algorithms, we’re particularly interested in how this will help us select future datasets for Galaxy Zoo and similar projects. For future surveys like LSST, which will contain many millions of images, we want to efficiently select the images where citizen scientists can contribute the most – either for their unusualness or for the possibility of more serendipitous discoveries. Your data are what make innovations like this possible, and we’re looking forward to seeing how these can be applied to new scientific problems.
New Images on Galaxy Zoo, Part 1
We’re delighted to announce that we have some new images on Galaxy Zoo for you to classify! There are two sets of new images:
1. Galaxies from the CANDELS survey
2. Galaxies from the GOODS survey
The general look of these images should be quite familiar to our regular classifiers, and we’ve already described them in many previous posts (examples: here, here, and here), so they may not need too much explanation. The only difference for these new images is their sensitivity: the GOODS images are made from more HST orbits and are deeper, so you should be able to better see details in a larger number of galaxies compared to the shallower HST images classified previously.

Comparison of the different sets of images from the GOODS survey taken with the Hubble Space Telescope. The left shows shallower images from GZH with only 2 sets of exposures; the right shows the new, deeper images with 5 sets of exposures now being classified.
The new CANDELS images, however, are slightly shallower than before. The main reason that these are being included is to help us get data measuring the effect of brightness and imaging depth for your crowdsourced classifications. While they aren’t always as visually stunning as nearby SDSS or HST images, getting accurate data is really crucial for the science we want to do on high-redshift objects, and so we hope you’ll give the new images your best efforts.

Images from the CANDELS survey with the Hubble Space Telescope. Left: deeper 5-epoch images already classified in GZ. Right: the shallower 2-epoch images now being classified.
Both of these datasets are relatively small compared to the full Sloan Digital Sky Survey (SDSS) and Hubble Space Telescope (HST) sets that users have helped us with over the last several years. With about 13,000 total images, we hope that they can be finished by the Galaxy Zoo community within a couple of months. We already have more sets of data prepared for as soon as these finish – stay tuned for Part 2 coming up shortly!
As always, thanks to everyone for their help – please ask the scientists or moderators here or on Talk if you have any questions!
Hubble science results on Voorwerpjes – episode 1
After two rounds of comments and questions from the journal referee, the first paper discussing the detailed results of the Hubble observations of the giant ionized clouds we’ve come to call Voorwerpjes has been accepted for publication in the Astronomical Journal. (In the meantime, the final accepted version is freely accessible at http://arxiv.org/abs/1408.5159.) We pretty much always complain about the refereeing process, but this time the referee did prod us into putting a couple of broad statements on much more quantitatively supported bases. Trying to be complete on the properties of the host galaxies of these nuclei and on the origin of the ionized gas, the paper runs to about 35 pages, so I’ll just hit some main points here.
These are all in interacting galaxies, including merger remnants. This holds as well for possibly all the “parent” sample including AGN which are clearly powerful enough to light up the surrounding gas. Signs include tidal tails of stars as well as gas, and dust lanes which are chaotic and twisted. These twists can be modeled on the assumption that they started in the orbital plane of a former (now assimilated) companion galaxy, which gives merger ages around 1.5 billion years for the two galaxies where there are large enough dust lanes to use this approach. In 6 of 8 galaxies we studied, the central bulge is dominant – one is an S0 with large bulge, and only one is a mostly normal barred spiral (with a tidal tail).
Incorporating spectroscopic information on both internal Doppler shifts and chemical makeup of the gas, we can start to distinguish smaller areas affected by outflow from the active nuclei and the larger surrounding regions where the gas is in orderly orbits around the galaxies (as in tidal tails). We have especially powerful synergy by adding complete velocity maps made by Alexei Moiseev using the 6-meter Russian telescope (BTA). In undisturbed tidal tails, the abundances of heavy elements are typically half or less of what we see in the Sun, while in material transported outward from the nuclei, these fractions may be above the solar reference level. There is a broad match between disturbed motions indicating outward flows and heavy-element fractions. (By “transported” above, I meant “blasted outwards at hundreds of kilometers per second”.) Seeing only a minor role for these outflows puts our sample in contrast to the extended gas around some quasars with strong radio sources, which is dominated by gas blasted out at thousands of kilometers per second. We’re seeing either a different process or a different stage in its development (one which we pretty much didn’t know about before following up this set of Galaxy Zoo finds).
We looked for evidence of recent star formation in these galaxies, using both the emission-line data to look for H-alpha emission from such regions and seeking bright star clusters. Unlike Hanny’s Voorwerp, we see only the most marginal evidence that these galaxies in general trigger starbirth with their outflows.
Sometimes the Universe plays tricks. One detail we learned from our new spectra and the mid-infrared data from NASA’s WISE survey satellite is that giant Voorwerpje UGC 7342 has been photobombed. A galaxy that originally looked as if it might be an interacting companion is in fact a background starburst galaxy, whose infrared emission was blended with that from the AGN in longer-wavelength IR data.
So that means the “real” second galaxy has already merged, and the AGN luminosity has dropped more than we first thought. (The background galaxy has in the meantime also been observed by SDSS, and can be found in DR12).
Now we’re on to polishing the next paper analyzing this rich data set, moving on to what some colleagues find more interesting – what the gas properties are telling us about the last 100,000 years of history of these nuclei, and how their radiation correlates (or indeed anti-correlates) with material being blasted outward into the galaxy from the nucleus. Once again, stay tuned!
Radio Galaxy Zoo searches for Hybrid Morphology Radio galaxies (HyMoRS): #hybrid
The first science paper on hybrid morphology radio galaxies found through the Radio Galaxy Zoo project has now been submitted!
In the paper we have revised the definition of the hybrid morphology radio galaxy (HyMoRS or hybrids) class. In general, HyMoRS show different Fanaroff-Riley radio morphology on either side of the active nucleus, that is FRI type on one side and FRII on the other side of their infrared host galaxy. But we found that this wasn’t very precise, and set up a clear definition of these sources, which is:
“To minimise the misclassification of HyMoRS, we attempt to tighten the original morphological classification of radio galaxies in the scope of detailed observational and analytical/numerical studies undertaken in the past 30 years. We consider a radio source to be a HyMoRS only if
(i) it has a well-defined hotspot on one side and a clear FR I type jet on the other, though we note the hotspots may ‘flicker’, that is their brightness may be rapidly variable (Saxton et al. 2002), and, in the case the radio source has a very prominent core or is highly asymmetric,
(ii) its core prominence does not suggest strong relativistic beaming nor its asymmetric radio structure can be explained by differential light travel time effects.”
Based on this, we re-examined the hybrids reported in the scientific literature and found 18 objects that satisfy our criteria. During the first year of Radio Galaxy Zoo’s operation, through our fantastic RadioTalk, you have nearly doubled this number by finding another 14 hybrids, which we now confirm! Two examples from the paper are below:

We also looked at the mid-infrared colours of hybrids’ hosts. As explained by Ivy in our last RGZ blog post (https://blog.galaxyzoo.org/2015/03/02/first-radio-galaxy-zoo-paper-has-been-submitted/), the mid-infrared colour space is defined by the WISE filter bands: W1, W2 and W3, corresponding to 3.4, 4.6 and 12 microns, respectively.
The results are below:
For those of you interested in seeing the full paper, we will post a link to a freely accessible copy once the paper is accepted by the journal and is in press! 🙂
Fantastic job everyone!
Anna & the RGZ science team
First Radio Galaxy Zoo paper has been submitted!
The project description and early science paper (results from Year 1) for the Radio Galaxy Zoo project has been submitted!
We find that the RGZ citizen scientists are as effective as the science experts at identifying the radio sources and their host galaxies.
Based upon our results from 1 year of operation, we find the RGZ host galaxies reside in 3 primary loci of mid-infrared colour space. The mid-infrared colour space is defined by the WISE filter bands: W1, W2 and W3, corresponding to 3.4, 4.6 and 12 microns, respectively.
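For readers who want to see what a position in this colour space means in practice, here is a minimal sketch. It assumes each source has a magnitude in each of the three bands; the function name and the example magnitudes are hypothetical, and no boundaries for the loci are encoded because the post does not give them.

```python
# Band centres in microns, as quoted in the post.
WISE_BANDS_MICRONS = {"W1": 3.4, "W2": 4.6, "W3": 12.0}

def wise_colours(w1, w2, w3):
    """Return the (W1-W2, W2-W3) colours from WISE magnitudes.

    A colour is the difference of two magnitudes; a smaller W2-W3
    value is a "bluer" colour. Works on plain floats or NumPy arrays.
    """
    return w1 - w2, w2 - w3

# Hypothetical magnitudes for a single source.
colours = wise_colours(15.0, 14.5, 11.0)
print(colours)  # (0.5, 3.5)
```

Each source then becomes one point on a colour-colour diagram, with W2-W3 on one axis and W1-W2 on the other.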
Approximately 10% of the RGZ sample reside in the mid-IR colour space dominated by elliptical galaxies, which have older stellar populations and are less dusty, hence resulting in bluer (W2-W3) colours. The 2nd locus (where ~15% of RGZ sources are found) lies in the colour space known as the ‘AGN wedge’, typically associated with X-ray-bright QSOs and Seyferts. And lastly, the largest concentration of RGZ host galaxies (~30%) can be found in the 3rd locus usually associated with luminous infrared galaxies (LIRGs). It should be noted that only a small fraction of LIRGs are associated with late-stage mergers. The remainder of the RGZ host population are distributed along the loci of both star-forming and active galaxies, indicative of radio emission from star-forming galaxies and/or dusty elliptical (non-star-forming) galaxies. See the figure below for a plot of these results.
Caption to figure: WISE colour-colour diagram, showing sources from the WISE all-sky catalog (colourmap), 33,127 sources from the 75% RGZ catalog (black contours), and powerful radio galaxies (green points) from Gürkan et al. (2014). The wedge used to identify IR colours of X-ray-bright AGN from Lacy et al. (2004) & Mateos et al. (2012) is overplotted (red dashes). Only 10% of the WISE all-sky sources have colours in the X-ray-bright AGN wedge; this is contrasted with 40% of RGZ and 49% of the Gürkan et al. (2014) radio galaxies. The remaining RGZ sources have WISE colours consistent with distinct populations of elliptical galaxies and LIRGs, with smaller numbers of spiral galaxies and starbursts.
In addition, we will also be submitting our paper on Hybrid Morphology Radio Sources (HyMoRS) in the next few days so stay tuned!
As always, thank you all very much for all your help and support and keep up the awesome work!
Cheers,
Julie, Ivy & the RGZ science team









