I’d love to be able to take every galaxy and say something about its morphology. The more galaxies we label, the more specific questions we can answer. When you want to know what fraction of low-mass barred spiral galaxies host AGN, suddenly it really matters that you have a lot of labelled galaxies to divide up.
But there’s a problem: humans don’t scale. Surveys keep getting bigger, but we will always have the same number of volunteers (applying order-of-magnitude astronomer math).
We’re struggling to keep pace now. When EUCLID (2022), LSST (2023) and WFIRST (2025ish) come online, we’ll start to look silly.
To keep up, Galaxy Zoo needs an automatic classifier. Other researchers have used responses that we’ve already collected from volunteers to train classifiers. The best performing of these are convolutional neural networks (CNNs) – a type of deep learning model tailored for image recognition. But CNNs have a drawback. They don’t easily handle uncertainty.
When learning, they implicitly assume that all labels are equally confident – which is definitely not the case for Galaxy Zoo (more in the section below). And when making (regression) predictions, they only give a ‘best guess’ answer with no error bars.
In our paper, we use Bayesian CNNs for morphology classification. Our Bayesian CNNs provide two key improvements:
- They account for varying uncertainty when learning from volunteer responses
- They predict full posteriors over the morphology of each galaxy
Using our Bayesian CNN, we can learn from noisy labels and make reliable predictions (with error bars) for hundreds of millions of galaxies.
How Bayesian Convolutional Neural Networks Work
There are two key steps to creating Bayesian CNNs.
1. Predict the parameters of a probability distribution, not the label itself
Training neural networks is much like any other fitting problem: you tweak the model to match the observations. If all the labels are equally uncertain, you can just minimise the difference between your predictions and the observed values. But for Galaxy Zoo, some labels are much more confident than others. If I observe that, for some galaxy, 30% of volunteers say “barred”, my confidence in that 30% massively depends on how many people replied – was it 4 or 40?
Instead, we predict the probability that a typical volunteer will say “Bar”, and minimise how surprised we should be given the total number of volunteers who replied. This way, our model understands that errors on galaxies where many volunteers replied are worse than errors on galaxies where few volunteers replied – letting it learn from every galaxy.
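As a rough illustration of that idea, here is a minimal sketch of a binomial negative log-likelihood, which is one natural way to measure "surprise" given k “Bar” votes out of N. The function name `binomial_nll` and the example vote counts are assumptions made for illustration, not code from the paper.

```python
import math


def binomial_nll(p, k, n, eps=1e-7):
    """Negative log-likelihood of seeing k 'Bar' votes from n volunteers,
    given a predicted probability p that a typical volunteer says 'Bar'.
    (Hypothetical helper; illustrative only.)"""
    # Clip p away from 0 and 1 so the logarithms stay finite.
    p = min(max(p, eps), 1.0 - eps)
    # log of the binomial coefficient "n choose k"
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return -(log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p))


# Same observed vote fraction (30%) and the same poor prediction (0.7),
# but very different numbers of volunteers.
loss_few = binomial_nll(0.7, k=3, n=10)
loss_many = binomial_nll(0.7, k=12, n=40)

# A prediction that matches the observed fraction is less surprising.
loss_good = binomial_nll(0.3, k=12, n=40)
```

With this kind of loss, the same mistake costs more on the galaxy with 40 responses than on the galaxy with 10, so well-measured galaxies pull harder on the model during training.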
2. Use Dropout to Pretend to Train Many Networks
Our model now makes probabilistic predictions. But what if we had trained a different model? It would make slightly different probabilistic predictions. We need to marginalise over the possible models we might have trained. To do this, we use dropout. Dropout turns off many random neurons in our model, permuting our network into a new one each time we make predictions.
Below, you can see our Bayesian CNN in action. Each row is a galaxy (shown to the left). In the central column, our CNN makes a single probabilistic prediction (the probability that a typical volunteer would say “Bar”). We can interpret that as a posterior for the probability that k of N volunteers would say “Bar” – shown in black. On the right, we marginalise over many CNNs using dropout. Each CNN posterior (grey) is different, but we can marginalise over them to get the posterior over many CNNs (green) – our Bayesian prediction.
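To make the marginalisation step concrete, here is a small sketch. It assumes we have already run the network several times with dropout switched on and collected one predicted probability per pass; the numbers in `p_samples` are made up, and the helper names are hypothetical rather than taken from the paper's code.

```python
import math


def binomial_pmf(k, n, p):
    """Probability that exactly k of n volunteers say 'Bar'."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_over_votes(p, n):
    """Posterior over k = 0..n from a single dropout forward pass."""
    return [binomial_pmf(k, n, p) for k in range(n + 1)]


def marginalised_posterior(p_samples, n):
    """Average the per-pass posteriors: a simple Monte Carlo
    approximation to marginalising over the models we might have trained."""
    per_pass = [posterior_over_votes(p, n) for p in p_samples]
    return [sum(column) / len(per_pass) for column in zip(*per_pass)]


# Made-up predictions from five dropout passes on one galaxy.
p_samples = [0.22, 0.31, 0.27, 0.40, 0.35]
posterior = marginalised_posterior(p_samples, n=10)
```

Each entry of `posterior` is the averaged probability of seeing k “Bar” votes out of 10, so the list plays the role of the averaged curve rather than any single per-network curve.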
Read more about it in the paper.
Modern surveys will image hundreds of millions of galaxies – more than we can show to volunteers. Given that, which galaxies should we classify with volunteers, and which by our Bayesian CNN?
Ideally we would only show volunteers the images that the model would find most informative. The model should be able to ask – hey, these galaxies would be really helpful to learn from – can you label them for me please? Then the humans would label them and the model would retrain. This is active learning.
In our experiments, applying active learning reduces the number of galaxies needed to reach a given performance level by up to 35-60% (see the paper).
We can use our posteriors to work out which galaxies are most informative. Remember that we use dropout to approximate training many models (see above). We show in the paper that informative galaxies are galaxies where those models confidently disagree.
This is only possible because we think about labels probabilistically and approximate training many models.
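One common way to score "confident disagreement" is the mutual information between the predicted votes and the choice of model: the entropy of the averaged posterior minus the average entropy of the individual posteriors. The sketch below is a simplified version under that assumption, with made-up dropout predictions and hypothetical helper names; see the paper for the acquisition function actually used.

```python
import math


def binomial_pmf(k, n, p):
    """Probability that exactly k of n volunteers say 'Bar'."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in dist if q > 0)


def mutual_information(p_samples, n):
    """Entropy of the averaged posterior minus the mean entropy of the
    per-pass posteriors. High when dropout passes confidently disagree."""
    per_pass = [[binomial_pmf(k, n, p) for k in range(n + 1)] for p in p_samples]
    mean_dist = [sum(column) / len(per_pass) for column in zip(*per_pass)]
    expected_entropy = sum(entropy(d) for d in per_pass) / len(per_pass)
    return entropy(mean_dist) - expected_entropy


# Passes that agree with each other: nothing new to learn from a label.
score_agree = mutual_information([0.5, 0.5, 0.5, 0.5], n=10)

# Passes that are each confident but contradict each other.
score_disagree = mutual_information([0.05, 0.95, 0.05, 0.95], n=10)
```

Galaxies would then be ranked by this score, with the highest-scoring ones sent to volunteers first.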
What galaxies are informative? Exactly the galaxies you would intuitively expect.
- The model strongly prefers diverse featured galaxies over ellipticals
- For identifying bars, the model prefers galaxies which are better resolved (lower redshift)
This selection is completely automatic. Indeed, I didn’t realise the lower redshift preference until I looked at the images!
I’m excited to see what science can be done as we move from morphology catalogs of hundreds of thousands of galaxies to hundreds of millions. If you’d like to know more or you have any questions, get in touch in the comments or on Twitter (@mike_w_ai, @chrislintott, @yaringal).
Excited to join in? Click here to go to Galaxy Zoo and start classifying! What could you discover?
Congratulations Radio Galaxy Zoo citizen scientists on a job well done! The Radio Galaxy Zoo 1 project has now finished with ~2.29 million classifications! Well done on helping us push towards the finish line.
We have at least two second-generation Radio Galaxy Zoo projects in the pipeline which we hope to launch next. Please stay tuned for the announcement of the Radio Galaxy Zoo 2 projects, where we will be presenting you with new data from the next-generation radio telescopes.
Thank you very much again for all your support and we will continue to keep you updated on our progress in the interim.
Ivy & Stas
Here is a bittersweet announcement that the current first-generation Radio Galaxy Zoo project will be retiring on the 1st May 2019. We are so grateful to have worked with such a productive team of citizen and professional scientists for the past 5.5 years.
To date, we have made over 2.27 million classifications and published 10 refereed journal articles. We have one more submitted and another to be submitted in the next few weeks.
Looking towards the future, we are of course in the process of developing the next generation of Radio Galaxy Zoo projects. For that, we ask that you stay tuned for our future announcements of the suite of Radio Galaxy Zoo 2 projects that we are planning to launch. Of course, we will be keeping you all informed about our latest RGZ-based follow-up observations (e.g. the Zoo Gems programme with the Hubble Space Telescope). Therefore, this is not the last message from us.
To cap off this impending retirement, I propose that we make a final RGZ sprint to the finish in the remaining days of April 2019 – that is, let’s all try to classify as many sources as we can in the next few weeks!
Thank you very much again and let’s all make a concerted push to the finish line!
Ivy & Stas
The following blogpost is from Avery Garon who led the publication of Radio Galaxy Zoo’s latest science result. Congratulations to Avery and team!
Radio Galaxy Zoo is starting the new year strong, with another paper just accepted for publication. “Radio Galaxy Zoo: The Distortion of Radio Galaxies by Galaxy Clusters” will appear soon in The Astronomical Journal and is available now as a pre-print on the arXiv: https://arxiv.org/abs/1901.05480. This paper was led by University of Minnesota graduate student Avery Garon and investigates several ways in which the shape of a galaxy’s radio emission is affected by and informs us about the environment in which we find the galaxy.
Like the previous RGZ paper, we are looking for how the radio tails extend into the hot plasma that fills galaxy clusters (the intracluster medium, or ICM). This time, we measure how much the two tails deviate from a straight line, marked in the example below by the value θ. The standard model is that the ICM exerts ram pressure on the galaxy as it travels through the cluster and causes its tails to bend away from the direction of motion. However, while individual clusters have been studied in great detail, no one has had a large enough sample of radio galaxies to statistically validate this model. Thanks to RGZ, we were able to observe the effect of ram pressure as a trend for the bending angle θ to increase for galaxies closer to the center of clusters (where the ICM density is higher) and in higher mass clusters (where the galaxies orbit with higher speeds).
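For readers who like to see the geometry spelled out, here is a minimal sketch of one way to compute a bending angle like θ. It assumes flat image coordinates, treats the host galaxy as the vertex, and uses a single end point for each tail; these choices and the function name are illustrative assumptions, not the measurement pipeline from the paper.

```python
import math


def bending_angle(host, tail_a, tail_b):
    """Deviation (in degrees) of two radio tails from a straight line.
    0 means the tails point in exactly opposite directions."""
    ax, ay = tail_a[0] - host[0], tail_a[1] - host[1]
    bx, by = tail_b[0] - host[0], tail_b[1] - host[1]
    cos_opening = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    # Guard against rounding pushing the cosine just outside [-1, 1].
    cos_opening = max(-1.0, min(1.0, cos_opening))
    opening = math.degrees(math.acos(cos_opening))
    return 180.0 - opening


# Tails pointing in opposite directions: no bending.
theta_straight = bending_angle((0, 0), (1, 0), (-1, 0))

# Tails at right angles to each other: strongly bent.
theta_bent = bending_angle((0, 0), (1, 0), (0, 1))
```

Under this definition a perfectly straight source has θ = 0°, and θ grows as the tails are swept back.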
Because ram pressure causes the tails to bend away from the direction in which the galaxy is travelling, we can use this knowledge to map out the kinds of orbits that these galaxies are on. Unlike planetary orbits, which are nearly circular and all in the same plane, the orbits of galaxies in clusters tend to be randomly distributed in orientation and eccentricity. Our sample of bent radio galaxies shows an even more striking result: they are preferentially found in highly radial orbits that plunge through the center of their clusters, which suggests that they are being bent as their orbits take them through the dense central regions.
Finally, we looked at radio galaxies that were far from clusters. Even though the median bending angle is 0° away from clusters, there is still a small fraction of highly bent galaxies out there. By counting the number of optical galaxies that are near the radio galaxies, we observed a sharp increase in the number of companions within a few hundred kiloparsecs of our bent radio galaxies. This suggests that even outside of true cluster environments, we are still observing bending induced by local overdensities in the intergalactic medium.
Happy 5th birthday to Radio Galaxy Zoo!
We have now completed 84% of the project and reached 2.24 million classifications (the equivalent of ~90.2 years of work) thanks to all the hard work from our Radio Galaxy Zooites. So much has happened in the world of Radio Galaxy Zoo this year and many of the new scientific results we reported could not have happened without your help.
In 2018, we had 4 papers accepted for publication in the Monthly Notices of the Royal Astronomical Society, doubling the number of papers that Radio Galaxy Zoo previously published. In addition, we have three more Radio Galaxy Zoo papers that have been submitted this year and are currently undergoing the refereeing process.
As always, our science papers can be freely accessed, so I encourage you all to check out the following papers if you are interested. Here is the list of papers published this year:
1) Radio Galaxy Zoo: compact and extended radio source classification with deep learning by Vesna Lukic et al
2) Radio Galaxy Zoo: machine learning for radio source host galaxy cross-identification by Matthew Alger et al
3) Radio Galaxy Zoo: CLARAN – a deep learning classifier for radio morphologies by Chen Wu et al
4) Radio Galaxy Zoo: observational evidence for environment as the cause of radio source asymmetry by Payton Rodman et al
As we summarise the main events this year, it would be remiss of me to not mention the retirement of our previous co-Primary Investigator (co-PI) as well as original driver of this project, Dr Julie Banfield, without whom Radio Galaxy Zoo wouldn’t be where it is today. We continue to be very grateful for her hard work and support. Finally, I would like to thank Dr Stas Shabala for agreeing to be a co-PI on this project after Julie’s departure for greener pastures.
Thank you all very much again for all your help and we shall continue to report on the science that is made possible thanks to you all. Keep up the awesome work! We hope that you all have a happy end-of-2018 and an excellent 2019.
Ivy & Stas
One of the most enduring serendipitous finds of the original Galaxy Zoo was a category of giant gas clouds shining from the energy input of active galactic nuclei (AGN) which have since faded (being a little cavalier here with time and verb tenses, since we can’t get news faster than light travels). The most famous of these is of course Hanny’s Voorwerp, whose discovery led to subprojects which turned up many more (“Voorwerpjes”). We have new results now on a related project going back to the Galaxy Zoo Forum, where we searched for gas in companions to active galaxies which is ionized by the AGN, and therefore gives us one more way to learn about how bright the AGN was tens of thousands of years before our direct view.
Radio Galaxy Zoo: what radio lobe shapes tell us about the mutual impact of jets and intergalactic gas
The following blogpost is from Stas Shabala about the Radio Galaxy Zoo paper led by his student, Payton Rodman, exploring the origin of asymmetries observed in a sample of Radio Galaxy Zoo radio galaxies.
Another Radio Galaxy Zoo paper has just been accepted for publication. “Radio Galaxy Zoo: Observational evidence for environment as the cause of radio source asymmetry” will shortly appear in Monthly Notices of the Royal Astronomical Society, and is already available on the preprint server (https://arxiv.org/abs/1811.03726). This paper, led by University of Tasmania undergraduate student Payton Rodman, looks at the properties of lobes in powerful radio galaxies. These lobes are inflated by a pair of jets, emerging in opposite directions from the accretion disk of the black hole at the centre of their host galaxy. Astronomers have known for a while that how big, bright or wide the radio lobes are depends on the properties of the intergalactic gas into which these lobes expand. Small, slow-growing lobes are usually found in galaxy clusters, while their large, rapidly expanding cousins tend to stay away from such dense environments. Radio lobes move about and heat intergalactic gas, and in this way they are thought to be responsible for regulating the formation of stars (by staving off the gravitational collapse of cold gas) in massive galaxies over the last eight billion years. Because of this, understanding how jets and lobes interact with their surroundings is important for understanding the history of the Universe. What complicates matters is that the mechanisms responsible for feeding the black hole and generating jets are also different in these two environments. So does nature or nurture determine what the lobes look like?
We decided to use the fact that all radio galaxies start out with two intrinsically identical jets propagating in opposite directions. If the two resultant lobes look different, this could only be due to the interaction with the surrounding gas – in other words, nurture. To test the nurture hypothesis, we used the first tranche of Radio Galaxy Zoo classifications. We selected all sources classified by citizen scientists to contain two clear radio lobes, and subjected this sample to a number of rigorous cuts on brightness, shape, redshift, and availability of environment information. Hot intergalactic gas is usually traced by X-ray observations, but these are unavailable for the majority of the sample. Instead, we used the clustering of optical galaxies from the Sloan Digital Sky Survey, which should be a good proxy for the underlying gas distribution. Then, for each radio galaxy, we compared the properties of the two radio lobes to how many galaxies were found near each of the lobes. We found a clear anti-correlation between the length of the radio lobe, and the number of nearby galaxies – in other words, shorter lobes have more galaxies surrounding them. These results were in excellent agreement with quantitative predictions from models (such as this hydrodynamic simulation made on the University of Tasmania’s supercomputer by PhD student Patrick Yates), which show that it is more difficult for lobes to expand into dense environments. The relationship between the luminosity of the lobes and galaxy clustering was much less clear, again consistent with models which predict a highly non-linear luminosity evolution as the lobes grow.
The excellent agreement between models and observations suggests that it is nurture, not nature, which determines lobe properties. It also opens up a new way of studying radio galaxy environments: through sensitive observations of optical galaxy clustering. With help from Zooites, we hope to expand this work to a much larger Radio Galaxy Zoo sample, which would allow us to probe the finer aspects of jet – environment interaction. Further afield, the ongoing GAMA Legacy ATCA Southern Survey (GLASS) project on the Australia Telescope Compact Array, as well as the Australian Square Kilometre Array Pathfinder EMU survey, will use this method to study the physics of black hole jets and the impact they have on their surroundings in a younger Universe.
On 31 October 2018, Radio Galaxy Zoo published its first end-to-end machine learning system for “Classifying Radio sources Automatically using Neural networks” (ClaRAN). This paper is led by ClaRAN’s developer, Chen Wu, a data scientist at the International Centre for Radio Astronomy Research at the University of Western Australia (ICRAR/UWA), who repurposed the FAST-rCNN algorithm (used by Microsoft and Facebook) to classify radio galaxies. ClaRAN was trained on radio galaxies classified by Radio Galaxy Zoo and so recognises some of the most common radio morphologies that have been classified.
The purpose of ClaRAN is to reduce the number of radio sources that require human visual classification so that future Radio Galaxy Zoo projects will have fewer “boring” sources, thereby increasing the chances of real discoveries by citizen scientists. ClaRAN (and its future cousins) are crucial for future surveys such as the EMU survey which is expected to detect ~70 million radio sources (using the Australian Square Kilometre Array Pathfinder telescope). While Radio Galaxy Zoo has made visual source classifications much more efficient, we will still need to reduce the total survey sample size to a sample for visual inspection that is less than 1% of the 70 million sources.
How does ClaRAN work? ClaRAN inspects both the radio and coordinate-matched infrared overlay in the same fashion as RGZ Zooites, and then determines the radio source component associations in a similar fashion to the RGZ Data Release 1 (DR1) catalogue. As ClaRAN is still in its prototype stage (analogous to the capabilities of a toddler), it only understands 3 main classes of radio morphologies: sources which have 1, 2 or 3 separate radio components. ClaRAN was trained to understand these three different radio morphologies through seeing examples of all three classes from the RGZ DR1 catalogue. The animated gif (from the ICRAR press release) describes how ClaRAN “sees” the example radio galaxy. Please do not click on the link to the animated gif if you suffer from epilepsy or have any issues with flashing images.
As we look towards the future, we look forward to teaching ClaRAN some of the more complex and exotic radio galaxy structures. For that to happen, we need to assemble much larger samples of more complex radio morphology classifications. With your support of Radio Galaxy Zoo, I am sure that we will get there.
Fun fact: did you know that some of the more obscure bugs in the RGZ DR1 catalogue processing were actually found through training ClaRAN? This is because ClaRAN is a good learner and will learn all the small details that we didn’t initially notice. We only discovered these bugs through some of the funny answers that we got out of some of the early testing of ClaRAN.
Thank you very much again to all our Radio Galaxy Zooites for your support. More information on the ICRAR press release for ClaRAN can be found via this link: https://www.icrar.org/claran/
Just a quick post to say thank you for your contributions to Galaxy Zoo: 3D in the last couple of weeks. I’m delighted to say that the bar drawing task is now completed. We still have a lot of spirals to draw though, so if you are ready for a challenge come join us in drawing these beautiful structures. Remember we collect 15 answers per galaxy, and use clever algorithms to combine them into a really reliable answer – so do your best, but don’t get too worried if your hand slips slightly! 🙂