Galaxy Zoo Upgrade: Better Galaxies, Better Science

Since I joined the team in 2018, citizen scientists like you have given us over 2 million classifications for 50,000 galaxies. We rely on these classifications for our research: from spiral arm winding, to merging galaxies, to star formation – and that’s just in the last month!

We want to get as much science as possible out of every single click. Your time is valuable and we have an almost unlimited pile of galaxies to classify. To do this, we’ve spent the past year designing a system to prioritise which galaxies you see on the site – which you can choose to access via the ‘Enhanced’ workflow.

This workflow depends on a new automated galaxy classifier using machine learning – an AI, if you like. Our AI is good at classifying boring, easy galaxies very fast. You are a much better classifier, able to make sense of the most difficult galaxies and even make new discoveries like Voorwerpen, but unfortunately you need to eat and sleep and so on. Our idea is to have you and the AI work together.

The AI can guess which challenging galaxies, if classified by you, would best help it to learn. Each morning, we upload around 100 of these extra-helpful galaxies. The next day, we collect the classifications and use them to teach our AI. Thanks to your classifications, our AI should improve over time. We also upload thousands of random galaxies and show each to 3 humans, to check our AI is working and to keep an eye out for anything exciting.

With this approach, we combine human skill with AI speed to classify far more galaxies and do better science. For each new survey:

  • 40 humans classify the most challenging and helpful galaxies
  • Each galaxy is seen by 3 humans
  • The AI learns to predict well on all the simple galaxies not yet classified

What does this mean in practice? Those choosing the ‘Enhanced’ workflow will see somewhat fewer simple galaxies (like the ones on the right), and somewhat more galaxies which are diverse, interesting and unusual (like the ones on the left). You will still see both interesting and simple galaxies, and still see every galaxy if you make enough classifications.

With our new system, you’ll see somewhat more galaxies like the ones on the left, and somewhat fewer like the ones on the right.

We would love for you to join in with our upgrade, because it helps us do more science. But if you like Galaxy Zoo just the way it is, no problem – we’ve made a copy (the ‘Classic’ workflow) that still shows random galaxies, just as we always have. If you’d like to know more, check out this post for more detail or read our paper. Separately, we’re also experimenting with sending short messages – check out this post to learn more.

The Galaxy Zoo team and I are really excited to see what you’ll discover. Let’s get started.

Scaling Galaxy Zoo with Bayesian Neural Networks

This is a technical overview of our recent paper (Walmsley 2019) aimed at astronomers. If you’d like an introduction to how machine learning improves Galaxy Zoo, check out this blog.

I’d love to be able to take every galaxy and say something about its morphology. The more galaxies we label, the more specific questions we can answer. When you want to know what fraction of low-mass barred spiral galaxies host AGN, suddenly it really matters that you have a lot of labelled galaxies to divide up.

But there’s a problem: humans don’t scale. Surveys keep getting bigger, but we will always have the same number of volunteers (applying order-of-magnitude astronomer math).

We’re struggling to keep pace now. When EUCLID (2022), LSST (2023) and WFIRST (2025ish) come online, we’ll start to look silly.

Galaxies/day required to keep pace with upcoming surveys now, by 2019 year-end, and by 2022 year-end. Estimates from internal science plan.

To keep up, Galaxy Zoo needs an automatic classifier. Other researchers have used responses that we’ve already collected from volunteers to train classifiers. The best performing of these are convolutional neural networks (CNNs) – a type of deep learning model tailored for image recognition. But CNNs have a drawback. They don’t easily handle uncertainty.

When learning, they implicitly assume that all labels are equally confident – which is definitely not the case for Galaxy Zoo (more in the section below). And when making (regression) predictions, they only give a ‘best guess’ answer with no error bars.

In our paper, we use Bayesian CNNs for morphology classification. Our Bayesian CNNs provide two key improvements:

  1. They account for varying uncertainty when learning from volunteer responses
  2. They predict full posteriors over the morphology of each galaxy

Using our Bayesian CNN, we can learn from noisy labels and make reliable predictions (with error bars) for hundreds of millions of galaxies.

How Bayesian Convolutional Neural Networks Work

There are two key steps to creating Bayesian CNNs.

1. Predict the parameters of a probability distribution, not the label itself

Training neural networks is much like any other fitting problem: you tweak the model to match the observations. If all the labels are equally uncertain, you can just minimise the difference between your predictions and the observed values. But for Galaxy Zoo, some labels are much more confident than others. If I observe that, for some galaxy, 30% of volunteers say “barred”, my confidence in that 30% massively depends on how many people replied – was it 4 or 40?

Instead, we predict the probability that a typical volunteer will say “Bar”, and minimise how surprised we should be given the total number of volunteers who replied. This way, our model understands that errors on galaxies where many volunteers replied are worse than errors on galaxies where few volunteers replied – letting it learn from every galaxy.
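As a rough sketch of this idea (not the code from the paper), the “surprise” can be written as a binomial negative log-likelihood: given a predicted probability p that a typical volunteer says “Bar”, how unlikely is it that k of N volunteers actually did? The function name `binomial_nll` and the example numbers are assumptions for illustration, and a 25% vote fraction is used so that the vote counts are whole numbers.

```python
import math

def binomial_nll(p, k, n):
    """Negative log-likelihood of k 'Bar' votes from n volunteers,
    given a predicted probability p that a typical volunteer says 'Bar'."""
    eps = 1e-9
    p = min(max(p, eps), 1 - eps)  # keep log() finite at p = 0 or 1
    # log of the binomial coefficient; constant with respect to p
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return -(log_comb + k * math.log(p) + (n - k) * math.log(1 - p))

# Same observed vote fraction (25%), same wrong prediction (p = 0.6),
# but very different numbers of volunteers.
few = binomial_nll(0.6, k=1, n=4)
many = binomial_nll(0.6, k=10, n=40)
print(few, many)
```

The same wrong prediction is penalised far more heavily for the galaxy with 40 responses than for the galaxy with 4, which is the behaviour described above.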

2. Use Dropout to Pretend to Train Many Networks

Our model now makes probabilistic predictions. But what if we had trained a different model? It would make slightly different probabilistic predictions. We need to marginalise over the possible models we might have trained. To do this, we use dropout. Dropout turns off many random neurons in our model, permuting our network into a new one each time we make predictions.

Below, you can see our Bayesian CNN in action. Each row is a galaxy (shown to the left). In the central column, our CNN makes a single probabilistic prediction (the probability that a typical volunteer would say “Bar”). We can interpret that as a posterior for the probability that k of N volunteers would say “Bar” – shown in black. On the right, we marginalise over many CNNs using dropout. Each CNN posterior (grey) is different, but we can marginalise over them to get the posterior over many CNNs (green) – our Bayesian prediction.
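To make the marginalisation step concrete, here is a minimal sketch in plain Python. It assumes we already have the predicted probability from each dropout forward pass (the list `dropout_predictions` is made up for illustration, not real model output) and simply averages the resulting binomial posteriors over k of N volunteers.

```python
from math import comb

def binomial_pmf(p, n):
    """P(k of n volunteers say 'Bar') for k = 0..n, given probability p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def marginalise(dropout_predictions, n):
    """Average the per-pass posteriors to approximate the marginal over models."""
    per_pass = [binomial_pmf(p, n) for p in dropout_predictions]
    marginal = [sum(column) / len(per_pass) for column in zip(*per_pass)]
    return per_pass, marginal

# Hypothetical outputs of five forward passes, each with a different dropout mask
dropout_predictions = [0.20, 0.35, 0.30, 0.25, 0.40]
per_pass, marginal = marginalise(dropout_predictions, n=40)
```

Each entry of `per_pass` plays the role of one grey curve, and `marginal` plays the role of the green curve: it is broader than any single pass because it also reflects disagreement between passes.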

Read more about it in the paper.

Active Learning

Modern surveys will image hundreds of millions of galaxies – more than we can show to volunteers. Given that, which galaxies should we classify with volunteers, and which by our Bayesian CNN?

Ideally we would only show volunteers the images that the model would find most informative. The model should be able to ask – hey, these galaxies would be really helpful to learn from – can you label them for me please? Then the humans would label them and the model would retrain. This is active learning.

In our experiments, applying active learning reduces the number of galaxies needed to reach a given performance level by up to 35-60% (see the paper).

We can use our posteriors to work out which galaxies are most informative. Remember that we use dropout to approximate training many models (see above). We show in the paper that informative galaxies are galaxies where those models confidently disagree.

Informative galaxies are galaxies where each model is confident (entropy H in the posterior from each model is low) but the average prediction over all the models is uncertain (entropy across all averaged posteriors is high). See the paper for more.
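One way to turn “confidently disagree” into a number, assumed here for illustration rather than taken from the paper's code, is to subtract the average entropy of the individual posteriors from the entropy of the averaged posterior. The dropout predictions below are invented examples.

```python
from math import comb, log

def binomial_pmf(p, n):
    """P(k of n volunteers say 'Bar') for k = 0..n, given probability p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * log(q) for q in dist if q > 0)

def disagreement_score(dropout_predictions, n):
    """Entropy of the averaged posterior minus the average per-model entropy."""
    per_pass = [binomial_pmf(p, n) for p in dropout_predictions]
    mean_posterior = [sum(column) / len(per_pass) for column in zip(*per_pass)]
    expected_entropy = sum(entropy(d) for d in per_pass) / len(per_pass)
    return entropy(mean_posterior) - expected_entropy

# Models that agree (even if each is individually uncertain) score near zero;
# models that are each confident but contradict one another score highly.
agree = disagreement_score([0.50, 0.50, 0.50, 0.50], n=40)
disagree = disagreement_score([0.05, 0.95, 0.05, 0.95], n=40)
```

Galaxies would then be ranked by this score, with the highest-scoring ones sent to volunteers first.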

This is only possible because we think about labels probabilistically and approximate training many models.

What galaxies are informative? Exactly the galaxies you would intuitively expect.

  • The model strongly prefers diverse featured galaxies over ellipticals
  • For identifying bars, the model prefers galaxies which are better resolved (lower redshift)

This selection is completely automatic. Indeed, I didn’t realise the lower redshift preference until I looked at the images!

I’m excited to see what science can be done as we move from morphology catalogs of hundreds of thousands of galaxies to hundreds of millions. If you’d like to know more or you have any questions, get in touch in the comments or on Twitter (@mike_w_ai, @chrislintott, @yaringal).

Cheers,
Mike

Classify Now

Excited to join in? Click here to go to Galaxy Zoo and start classifying! What could you discover?

Thanks for the millions!

Congratulations Radio Galaxy Zoo citizen scientists on a job well done! The Radio Galaxy Zoo 1 project has now finished with ~2.29 million classifications! Well done on helping us push towards the finish line.

We have at least two second-generation Radio Galaxy Zoo projects in the pipeline, which we hope to launch next. Please stay tuned for the announcement of the Radio Galaxy Zoo 2 projects, where we will be presenting you with new data from the next-generation radio telescopes.

Thank you very much again for all your support and we will continue to keep you updated on our progress in the interim.

Cheers,
Ivy & Stas

Winding Problems

I’m delighted to announce the acceptance of another paper based on your classifications at Galaxy Zoo, “Galaxy Zoo: Unwinding the Winding Problem – Observations of Spiral Bulge Prominence and Arm Pitch Angles Suggest Local Spiral Galaxies are Winding”, which has just been released on the arXiv pre-print server, and will appear in the Monthly Notices of the Royal Astronomical Society (MNRAS) soon.

Here’s the title and author page.

This paper has been a long time coming, and is based significantly on the excellent thesis work of Ross Hart (PhD from Nottingham University). Ross previously wrote about some of his work for the blog in “How Do Spiral Arms Affect Star Formation”. One of the things Ross’s PhD work showed was just how good your identification of spiral arm winding is, and that gave us the confidence to use it in this paper.

You might notice the appearance of some of your fellow citizen scientists in this author list. Dennis, Jean and Satoshi provided help via the “Galaxy Zoo Literature Search” call which ended up contributing significantly to the paper.

Our main result is that we do not find any significant correlation between how large the bulges are and how tightly wound the spirals are in Galaxy Zoo spiral galaxies. This non-detection was a big surprise, because this correlation is discussed in basically all astronomy textbooks – it forms the basis of the spiral sequence described by Hubble.

The Hubble Tuning Fork illustrated with SDSS images of nearby galaxies.

Way back in 1927 Hubble wrote (about the spiral nebulae he had observed) that: “three [properties] determine positions in the sequence: (1) the relative size of the unresolved nuclear region, (2) the extent to which the arms are unwound (the openness or angle of the spiral), (3) the degree of condensation in the arms.” He goes on to explain that “These three criteria are quite independent, but as an empirical fact of observation they develop in the same direction, and can be treated as various aspects of the same process.” (i.e. Hubble observed them to be correlated).

It’s been known for a long time that there are examples where bulge (or “unresolved nuclear region”) size and arm winding do not agree, but these are usually treated as exceptions. What we’ve shown in this paper is that, for a sample selection which goes beyond just the brightest nearby galaxies Hubble could see, the correlation is not strong at all. Below is an annotated version of our main result figure – each point is a spiral with Galaxy Zoo classifications, and the contours show where there are lots of points. We find spirals all over this plot (except not many with big bulges and loosely wound arms), and the red and blue lines show the lack of any strong trend in either direction.

Figure 5 from the Masters et al. (2019) paper.

This has significant implications for how we interpret spiral winding angles, and could be explained by many/most spiral arms winding up over time (at rates which depend on the bulge size) rather than being density waves. We need to do more work to really understand what this observation tells us (which is a great place to be in science!).

We have also known for a while that bulge size correlates best with modern expert galaxy classification on the Hubble sequence (e.g. when we compared your classifications to the largest samples done in that way). So another point we make in this paper is how different these modern classifications are to the traditional classifications done by Hubble and others. That’s OK – classifications should (and do) shift in science (part of the scientific method is to change on the basis of evidence), but it does mean care needs to be taken to be precise about what is meant by “morphology of galaxies”.

I ended the abstract of the paper with: “It is remarkable that after over 170 years of observations of spiral arms in galaxies our understanding of them remains incomplete.” and I really think that’s a good place to end. Galaxy morphology provides a rich source of data for understanding the physics of galaxies, and thanks to you we have access to the largest and most reliable set of galaxy morphologies ever.

If you’re inspired to keep classifying, head over to the main Galaxy Zoo project, or why not draw a few spiral arms over at Galaxy Zoo: 3D where we’re trying to understand spiral arms in more detail.

Radio Galaxy Zoo final sprint!

Radio Galaxy Zoo logo

Here is a bittersweet announcement: the current first-generation Radio Galaxy Zoo project will be retiring on the 1st May 2019. We are so grateful to have worked with such a productive team of citizen and professional scientists for the past 5.5 years.
To date, we have made over 2.27 million classifications and published 10 refereed journal articles. We have one more paper submitted and another to be submitted in the next few weeks.

Looking towards the future, we are of course in the process of developing the next generation of Radio Galaxy Zoo projects. For that, we ask that you stay tuned for our future announcements of the suite of Radio Galaxy Zoo 2 projects that we are planning to launch. Of course, we will be keeping you all informed about our latest RGZ-based follow-up observations (e.g. the Zoo Gems programme with the Hubble Space Telescope). Therefore, this is not the last message from us.

To cap off this impending retirement, I propose that we make a final RGZ sprint to the finish in the remaining days of April 2019 – that is, let’s all try to classify as many sources as we can in the next few weeks!

Thank you very much again and let’s all make a concerted push to the finish line!

Cheers,
Ivy & Stas

Radio Galaxy Zoo studies cluster environment impact on radio galaxy morphologies

The following blogpost is from Avery Garon who led the publication of Radio Galaxy Zoo’s latest science result. Congratulations to Avery and team!

***************

Radio Galaxy Zoo is starting the new year strong, with another paper just accepted for publication. “Radio Galaxy Zoo: The Distortion of Radio Galaxies by Galaxy Clusters” will appear soon in The Astronomical Journal and is available now as a pre-print on the arXiv: https://arxiv.org/abs/1901.05480. This paper was led by University of Minnesota graduate student Avery Garon and investigates several ways in which the shape of a galaxy’s radio emission is affected by and informs us about the environment in which we find the galaxy.

Like the previous RGZ paper, we are looking for how the radio tails extend into the hot plasma that fills galaxy clusters (the intracluster medium, or ICM). This time, we measure how much the two tails deviate from a straight line, marked in the example below by the value θ. The standard model is that the ICM exerts ram pressure on the galaxy as it travels through the cluster and causes its tails to bend away from the direction of motion. However, while individual clusters have been studied in great detail, no one has had a large enough sample of radio galaxies to statistically validate this model. Thanks to RGZ, we were able to observe the effect of ram pressure as a trend for the bending angle θ to increase for galaxies closer to the center of clusters (where the ICM density is higher) and in higher mass clusters (where the galaxies orbit with higher speeds).

Example source RGZ J080641.4+494629. The magenta arrows extend from the host galaxy identified by RGZ users and terminate at the peaks of the radio emission, defining the bending angle θ. The cyan arrow is used to define an orientation for the galaxy with respect to the cluster.
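For readers who want to see the geometry, here is a small sketch of how a bending angle of this kind could be computed from three image positions. It assumes flat pixel coordinates (no sky projection) and takes θ to be 180° minus the angle between the two arrows, so that perfectly straight tails give θ = 0°. The function name and the example coordinates are hypothetical and are not taken from the paper's pipeline.

```python
import math

def bending_angle(host, peak_a, peak_b):
    """Deviation from a straight line (degrees) of two tails that run from
    the host position to the two radio emission peaks."""
    ax, ay = peak_a[0] - host[0], peak_a[1] - host[1]
    bx, by = peak_b[0] - host[0], peak_b[1] - host[1]
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    # Angle between the two arrows, in the range 0 to 180 degrees
    opening = math.degrees(math.atan2(abs(cross), dot))
    return 180.0 - opening

straight = bending_angle((0, 0), (1, 0), (-1, 0))  # tails point in opposite directions
right_angle = bending_angle((0, 0), (1, 0), (0, 1))  # tails at 90 degrees to each other
```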

Because ram pressure causes the tails to bend away from the direction in which the galaxy is travelling, we can use this knowledge to map out the kinds of orbits that these galaxies are on. Unlike planetary orbits, which are nearly circular and all in the same plane, the orbits of galaxies in clusters tend to be randomly distributed in orientation and eccentricity. Our sample of bent radio galaxies shows an even more striking result: they are preferentially found in highly radial orbits that plunge through the center of their clusters, which suggests that they are being bent as their orbits take them through the dense central regions.

Finally, we looked at radio galaxies that were far from clusters. Even though the median bending angle is 0° away from clusters, there is still a small fraction of highly bent galaxies out there. By counting the number of optical galaxies that are near the radio galaxies, we observed a sharp increase in the number of companions within a few hundred kiloparsecs of our bent radio galaxies. This suggests that even outside of true cluster environments, we are still observing bending induced by local overdensities in the intergalactic medium.

Galaxy Zoo wins the RAS Group Achievement Award

We’ve won a prize! The Royal Astronomical Society has given the Galaxy Zoo team – including the volunteers who have made the project the success it is – their Group Achievement Award for 2019. I will post the citation below, but mostly I’m delighted that this award recognises all those who have worked to make Galaxy Zoo a success.

Looking at the list of previous winners – the last two are the team behind ESA’s Planck satellite and the team who made the Nobel-winning discovery of gravitational waves – is humbling, so this is really something to be proud of.

We’ll make plans to make sure everyone can celebrate the award when it’s presented at the National Astronomy Meeting later in the year.

Chris

Citation for the 2019 RAS Group Achievement Award (A)

The 2019 Group Achievement Award is awarded to the Galaxy Zoo team. With over ten years of engagement under their belt, the Galaxy Zoo team have contributed significantly to our knowledge of the formation and evolution of galaxies, through strong commitment to collaboration with members of the public. They have established citizen science as a standard mode of data analysis across astrophysics, and initiated new areas of research sparked by Galaxy Zoo discoveries. Their roughly 55 papers, ranging from studies disentangling morphology, environment and colour, through to studies of individual morphological characteristics, have been enabled by the team’s careful work to create catalogues and measure systematic effects inherent in the classification, before releasing the data to the community.

The Galaxy Zoo project has also inspired many similar projects across astrophysics and beyond, through the Zooniverse platform. Perhaps Galaxy Zoo’s most notable achievement is immensely effective outreach: the more than 500,000 people who have contributed to date come from a wide range of backgrounds, making participation in scientific research possible for all. Galaxy Zoo inspires and informs, and does so on an unprecedented scale. For these reasons, the Galaxy Zoo team is awarded the Group Achievement Award.

Happy 5th birthday Radio Galaxy Zoo!

Happy 5th birthday to Radio Galaxy Zoo!

We have now completed 84% of the project and reached 2.24 million classifications (the equivalent of ~90.2 years of work) thanks to all the hard work from our Radio Galaxy Zooites. So much has happened in the world of Radio Galaxy Zoo this year and many of the new scientific results we reported could not have happened without your help.

In 2018, we had 4 papers accepted for publication in the Monthly Notices of the Royal Astronomical Society, doubling the number of papers that Radio Galaxy Zoo previously published. In addition, we have three more Radio Galaxy Zoo papers that have been submitted this year and are currently undergoing the refereeing process.

As always, our science papers can be freely accessed, so I encourage you all to check out the following papers if you are interested. Here is the list of papers published this year:
1) Radio Galaxy Zoo: compact and extended radio source classification with deep learning by Vesna Lukic et al
2) Radio Galaxy Zoo: machine learning for radio source host galaxy cross-identification by Matthew Alger et al
3) Radio Galaxy Zoo: CLARAN – a deep learning classifier for radio morphologies by Chen Wu et al
4) Radio Galaxy Zoo: observational evidence for environment as the cause of radio source asymmetry by Payton Rodman et al

As we summarise the main events this year, it would be remiss of me to not mention the retirement of our previous co-Primary Investigator (co-PI) as well as original driver of this project, Dr Julie Banfield, without whom Radio Galaxy Zoo wouldn’t be where it is today. We continue to be very grateful for her hard work and support. Finally, I would like to thank Dr Stas Shabala for agreeing to be a co-PI on this project after Julie’s departure for greener pastures.

Thank you all very much again for all your help and we shall continue to report on the science that is made possible thanks to you all.  Keep up the awesome work! We hope that you all have a happy end-of-2018 and an excellent 2019.

Cheers,
Ivy & Stas

Active galaxies illuminating their companions – Galaxy Zoo identifies cross-ionization

One of the most enduring serendipitous finds of the original Galaxy Zoo was a category of giant gas clouds shining from the energy input of active galactic nuclei (AGN) which have since faded (being a little cavalier here with time and verb tenses, since we can’t get news faster than light travels). The most famous of these is of course Hanny’s Voorwerp, whose discovery led to subprojects which turned up many more (“Voorwerpjes”). We have new results now on a related project going back to the Galaxy Zoo Forum, where we searched for gas in companions to active galaxies which is ionized by the AGN, and therefore gives us one more way to learn about how bright the AGN was tens of thousands of years before our direct view.