Galaxy Zoo + Galaxy Zoo: 3D
Hi! I’m Tom, and I’m a PhD student at the University of Nottingham, doing some research to try to understand how spiral galaxies have grown and changed over their lifetimes. I’m especially interested in looking at how the spiral arms have been affecting the galaxy as a whole. I’ve recently finished up a paper in MNRAS in which I’ve been demonstrating a couple of new methods using some Galaxy Zoo data.
Amelia has already written [ https://blog.galaxyzoo.org/2018/07/17/finding-bars-in-galaxy-zoo-3d/ ] about how she is using the MaNGA survey [ https://www.sdss.org/surveys/manga/ ] to try to understand what’s happening in bars, so I won’t go into too much detail about this fantastic survey. I’ll just say that it’s part of the Sloan Digital Sky Survey, and for each of its sample of 10,000 galaxies, we have measurements of the spectrum at every position across the face of the galaxy.
MaNGA is really useful for trying to understand how galaxies have grown to their current size, because it is possible to get some sort of estimation of what kinds of stars are present in different locations of the galaxy. It’s a difficult thing to measure, so we can’t say exactly how many of every different type of star is present, but we can at least get a broad picture of the kinds of stellar ages and chemical enrichment (“metallicity”) in the stars. Astronomers have used these kinds of tools to measure the average age or metallicity of stars in different parts of galaxies, and found that in most spirals, the further out you go in the galaxy, the younger the stars are on average. The usual interpretation of this is that bulges tend to have formed first, and the disks have grown in size over time afterwards.

A MaNGA spiral galaxy. We can obtain information about the kinds of stars residing across the hexagonal area, which helps us understand how they’ve grown and evolved.
I’m really interested in trying to push this picture in two ways. Firstly, I’ve been trying to see what we can learn from looking at the general distribution of stars of different ages and metallicities – not just the average properties – at each location in the galaxy. Secondly, I think there is a lot of information that we risk ignoring by only looking at how things change with galactic radius. Spiral arms and the bar aren’t evenly distributed around the galaxy, so if we can see how the stellar properties change as we move around the galaxy, we should be able to measure what effect the spiral arms and bars have on the stars. The goal would be to try to confirm whether the most popular models of the nature of spiral arms and bars are correct or not.
To properly do this, we need to know exactly where the spiral arms and bars are in the MaNGA galaxies, so that we can see how the stars vary in these different regions. Enter Galaxy Zoo: 3D, where volunteers are asked to tell us where the different components are.

An example galaxy in MaNGA, where we’ve managed to split the galaxy into different stellar populations of different ages. Each frame shows where we find stars of a given age in this galaxy, starting from the oldest stars and finishing with the most recently formed stars. The colour denotes the mean metallicity of the stars, shown by the scale at the bottom.
All of this is what my most recent publication is about (read it in full at https://doi.org/10.1093/mnras/stz2204); we’ve shown that by combining the full spatial information available from MaNGA (augmented by Galaxy Zoo:3D) with the full distributions of the ages and metallicities of stars in each location, we can start to see some interesting things in the bar and spiral arms. It’s definitely best illustrated by an animation.
By splitting the age distributions up into different “time-slices”, we can create images of where stars of different ages are located in each of our MaNGA galaxies. Immediately from this one example, it’s obvious that there are a lot of things going on here.
There are a few features in the animation that we’re not entirely convinced are real, but the main exciting things are that the spiral arms only show up in the youngest stars, and the bar grows and rotates as we move from older to younger stars. The growth of the bar is intriguing; this might be showing us how it formed. The bar changing with angle is even more exciting, and we think it shows us how quickly new-born stars become mixed and “locked” into the bar. The arms show what we should expect; spiral arms are areas of intense star formation, but over time the stars formed there will become mixed around the disk. We measured this effect by looking at what fraction of stars of each age are located in the volunteer-drawn spiral arms from Galaxy Zoo:3D.
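As a rough illustration of that last measurement, here is a minimal sketch of how the fraction of stars of each age lying inside the volunteer-drawn arms could be computed. It assumes the time-slices are stored as an array of shape (n_ages, ny, nx) and the Galaxy Zoo: 3D arm region as a boolean mask of shape (ny, nx); the function name, array layout and toy numbers are all hypothetical, not the code used in the paper.

```python
import numpy as np

def arm_fraction_by_age(weight_cube, arm_mask):
    """Fraction of the stellar weight in each age slice that lies inside the arm mask.

    weight_cube: shape (n_ages, ny, nx), stellar mass (or light) per age slice per spaxel
    arm_mask:    shape (ny, nx), True where volunteers marked a spiral arm
    """
    weight_cube = np.asarray(weight_cube, dtype=float)
    arm_mask = np.asarray(arm_mask, dtype=bool)
    # Sum the weight inside the mask, and over the whole map, for every age slice
    in_arms = weight_cube[:, arm_mask].sum(axis=1)
    total = weight_cube.reshape(weight_cube.shape[0], -1).sum(axis=1)
    # Leave NaN where an age slice contains no stars at all
    return np.divide(in_arms, total, out=np.full_like(total, np.nan), where=total > 0)

# Toy 2x2 "galaxy" with two age slices (made-up numbers)
toy_cube = np.array([[[1, 1], [1, 1]],    # older stars: spread evenly
                     [[3, 1], [0, 0]]],   # younger stars: concentrated in the arm spaxel
                    dtype=float)
toy_mask = np.array([[True, False], [False, False]])
fractions = arm_fraction_by_age(toy_cube, toy_mask)
```

In this toy case the younger slice has a much larger arm fraction than the older one, which is the kind of trend described above.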
This is really interesting, and highlights the power of combining large surveys like MaNGA with crowd-sourced information from the Zooniverse.
The next step is to do these kinds of things with more than just this one galaxy though. I’ve started looking at how these techniques can measure how fast the disks of spiral galaxies grew, using a large sample of spiral galaxies identified by Galaxy Zoo 2 volunteers. I’m also trying to measure how quickly stars get mixed away from spiral arms in different types of spiral galaxies. I have started to find some hints of some exciting results on both of these topics, which I would love to share in a future blog post if you’re interested.

We need volunteers to tell us where the spiral arms and bars are in galaxies, so that we can start to see what makes these regions special.
However, I’m currently limited in the number of galaxies with spiral arm regions identified by Galaxy Zoo:3D volunteers, so it would be really helpful if we could get some more! Understanding what makes spiral structure appear in disky galaxies is one of the unsolved problems in galaxy evolution and formation, and the clues to finding out might well lie in measuring how spiral arms affect the galaxy’s stars. Galaxy Zoo:3D will definitely be able to play a role in this! Help us out at https://www.zooniverse.org/projects/klmasters/galaxy-zoo-3d.
Galaxy Zoo Mobile
Hi, I’m Lauren, a summer work experience student working with the Galaxy Zoo team at the University of Oxford for a couple of weeks, and it’s my pleasure to be able to bring you some fantastic news. Today, we’re launching the mobile version of Galaxy Zoo! Unlike the website version, this brand-new native mobile version has questions with only two possible answers – just swipe left or right depending on your answer! This can create a more captivating and faster-paced experience when you are classifying galaxies.
Not only does this introduce a new and engaging platform for the project, but it also means that you can classify galaxies anywhere – on the bus, at the beach, at a concert, in the waiting room at the dentist etc. Hopefully, this will mean many more galaxy classifications whilst also providing easier access for our wide range of volunteers across the world. By introducing this app, we hope to inspire others to join our Galaxy Zoo team, no matter their qualifications or skill set.
Get involved by downloading the Zooniverse app (if you don’t have it already), heading over to the ‘Space’ section, and selecting the ‘Galaxy Zoo Mobile’ project. From there, you will be greeted with three different workflows – ‘Smooth or Featured’, ‘Spiral Arms’ or ‘Merging/Disturbed’. Pick whichever you like! The simple, swiping interface allows you to classify galaxies much faster than ever before, meaning the Galaxy Zoo science team can produce results even quicker. So, download the Zooniverse app today and start classifying!
Apple App Store: https://apps.apple.com/us/app/zooniverse/id1194130243
Google Play Store: https://play.google.com/store/apps/details?id=com.zooniversemobile&hl=en
Happy classifying,
Lauren & the Galaxy Zoo Team
Machine Learning Messaging Experiment
Alongside the new workflow that Galaxy Zoo has just launched (read more in this blog post: https://wp.me/p2mbJY-2tJ), we’re taking the opportunity to work once again with researchers from Ben Gurion University and Microsoft Research to run an experiment which looks at how we can communicate with volunteers. As part of this experiment, volunteers classifying galaxies on the new workflow may see short messages about the new machine learning elements. Anyone seeing these messages will be given the option to withdraw from the experiment; just select the ‘opt out’ button to avoid seeing any further messages.
After the experiment is finished we will publish a debrief blog here describing more of the details and presenting our results.
This messaging experiment has ethics approval from Ben Gurion University (reference: SISE-2019-01) and the University of Oxford (reference: R63818/RE001).
Galaxy Zoo Upgrade: Better Galaxies, Better Science
Since I joined the team in 2018, citizen scientists like you have given us over 2 million classifications for 50,000 galaxies. We rely on these classifications for our research: from spiral arm winding, to merging galaxies, to star formation – and that’s just in the last month!
We want to get as much science as possible out of every single click. Your time is valuable and we have an almost unlimited pile of galaxies to classify. To do this, we’ve spent the past year designing a system to prioritise which galaxies you see on the site – which you can choose to access via the ‘Enhanced’ workflow.
This workflow depends on a new automated galaxy classifier using machine learning – an AI, if you like. Our AI is good at classifying boring, easy galaxies very fast. You are a much better classifier, able to make sense of the most difficult galaxies and even make new discoveries like Voorwerpen, but unfortunately need to eat and sleep and so on. Our idea is to have you and the AI work together.
The AI can guess which challenging galaxies, if classified by you, would best help it to learn. Each morning, we upload around 100 of these extra-helpful galaxies. The next day, we collect the classifications and use them to teach our AI. Thanks to your classifications, our AI should improve over time. We also upload thousands of random galaxies and show each to 3 humans, to check our AI is working and to keep an eye out for anything exciting.
With this approach, we combine human skill with AI speed to classify far more galaxies and do better science. For each new survey:
- 40 humans classify the most challenging and helpful galaxies
- Each galaxy is seen by 3 humans
- The AI learns to predict well on all the simple galaxies not yet classified
What does this mean in practice? Those choosing the ‘Enhanced’ workflow will see somewhat fewer simple galaxies (like the ones on the right), and somewhat more galaxies which are diverse, interesting and unusual (like the ones on the left). You will still see both interesting and simple galaxies, and still see every galaxy if you make enough classifications.
With our new system, you’ll see somewhat more galaxies like the ones on the left, and somewhat fewer like the ones on the right.
We would love for you to join in with our upgrade, because it helps us do more science. But if you like Galaxy Zoo just the way it is, no problem – we’ve made a copy (the ‘Classic’ workflow) that still shows random galaxies, just as we always have. If you’d like to know more, check out this post for more detail or read our paper. Separately, we’re also experimenting with sending short messages – check out this post to learn more.
The Galaxy Zoo team and I are really excited to see what you’ll discover. Let’s get started.
Scaling Galaxy Zoo with Bayesian Neural Networks
This is a technical overview of our recent paper (Walmsley 2019) aimed at astronomers. If you’d like an introduction to how machine learning improves Galaxy Zoo, check out this blog.
I’d love to be able to take every galaxy and say something about its morphology. The more galaxies we label, the more specific questions we can answer. When you want to know what fraction of low-mass barred spiral galaxies host AGN, suddenly it really matters that you have a lot of labelled galaxies to divide up.
But there’s a problem: humans don’t scale. Surveys keep getting bigger, but we will always have the same number of volunteers (applying order-of-magnitude astronomer math).
We’re struggling to keep pace now. When EUCLID (2022), LSST (2023) and WFIRST (2025ish) come online, we’ll start to look silly.

To keep up, Galaxy Zoo needs an automatic classifier. Other researchers have used responses that we’ve already collected from volunteers to train classifiers. The best performing of these are convolutional neural networks (CNNs) – a type of deep learning model tailored for image recognition. But CNNs have a drawback. They don’t easily handle uncertainty.
When learning, they implicitly assume that all labels are equally confident – which is definitely not the case for Galaxy Zoo (more in the section below). And when making (regression) predictions, they only give a ‘best guess’ answer with no error bars.
In our paper, we use Bayesian CNNs for morphology classification. Our Bayesian CNNs provide two key improvements:
- They account for varying uncertainty when learning from volunteer responses
- They predict full posteriors over the morphology of each galaxy
Using our Bayesian CNN, we can learn from noisy labels and make reliable predictions (with error bars) for hundreds of millions of galaxies.
How Bayesian Convolutional Neural Networks Work
There are two key steps to creating Bayesian CNNs.
1. Predict the parameters of a probability distribution, not the label itself
Training neural networks is much like any other fitting problem: you tweak the model to match the observations. If all the labels are equally uncertain, you can just minimise the difference between your predictions and the observed values. But for Galaxy Zoo, many labels are more confident than others. If I observe that, for some galaxy, 30% of volunteers say “barred”, my confidence in that 30% massively depends on how many people replied – was it 4 or 40?
Instead, we predict the probability that a typical volunteer will say “Bar”, and minimise how surprised we should be given the total number of volunteers who replied. This way, our model understands that errors on galaxies where many volunteers replied are worse than errors on galaxies where few volunteers replied – letting it learn from every galaxy.
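To make the “how surprised should we be” idea concrete, here is a minimal sketch that scores a predicted probability against k “Bar” votes out of N using a binomial negative log-likelihood. This is an assumed form of the loss written for illustration (the function name and numbers are hypothetical), not code taken from the paper.

```python
import math

def binomial_nll(p, k, n):
    """Surprise (negative log-likelihood) at seeing k 'Bar' votes from n volunteers,
    if a typical volunteer says 'Bar' with probability p."""
    eps = 1e-9
    p = min(max(p, eps), 1 - eps)  # keep the logarithms finite
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return -(log_comb + k * math.log(p) + (n - k) * math.log(1 - p))

# Two galaxies where half of the volunteers said "Bar": one with 4 replies, one with 40.
# Compare a poor prediction (p = 0.8) with a good one (p = 0.5) for each.
penalty_4 = binomial_nll(0.8, 2, 4) - binomial_nll(0.5, 2, 4)
penalty_40 = binomial_nll(0.8, 20, 40) - binomial_nll(0.5, 20, 40)
```

The same mistake costs about ten times more on the galaxy with 40 replies, so well-measured galaxies pull harder on the fit while sparsely classified ones still contribute.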
2. Use Dropout to Pretend to Train Many Networks
Our model now makes probabilistic predictions. But what if we had trained a different model? It would make slightly different probabilistic predictions. We need to marginalise over the possible models we might have trained. To do this, we use dropout. Dropout turns off many random neurons in our model, permuting our network into a new one each time we make predictions.
Below, you can see our Bayesian CNN in action. Each row is a galaxy (shown to the left). In the central column, our CNN makes a single probabilistic prediction (the probability that a typical volunteer would say “Bar”). We can interpret that as a posterior for the probability that k of N volunteers would say “Bar” – shown in black. On the right, we marginalise over many CNNs using dropout. Each CNN posterior (grey) is different, but we can marginalise over them to get the posterior over many CNNs (green) – our Bayesian prediction.
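The marginalisation step can be sketched in a few lines. The example below assumes we already have the predicted “Bar” probability from several dropout forward passes (the three values here are made up), turns each into a binomial posterior over k of N votes, and averages them. It is a simplified illustration, not the network code itself.

```python
import numpy as np
from math import comb

def binomial_pmf(p, n):
    """Probability of k = 0..n 'Bar' votes from n volunteers, given probability p."""
    k = np.arange(n + 1)
    coeffs = np.array([comb(n, int(i)) for i in k], dtype=float)
    return coeffs * p**k * (1 - p) ** (n - k)

def marginal_posterior(dropout_probs, n):
    """Average the per-network posteriors (grey) into one marginal posterior (green)."""
    per_network = np.stack([binomial_pmf(p, n) for p in dropout_probs])
    return per_network.mean(axis=0)

dropout_probs = [0.2, 0.3, 0.5]  # hypothetical outputs of three dropout passes
posterior = marginal_posterior(dropout_probs, n=10)
```

The averaged posterior is broader than any single network's posterior whenever the dropout passes disagree, which is where the extra model uncertainty shows up.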

Read more about it in the paper.
Active Learning
Modern surveys will image hundreds of millions of galaxies – more than we can show to volunteers. Given that, which galaxies should we classify with volunteers, and which by our Bayesian CNN?
Ideally we would only show volunteers the images that the model would find most informative. The model should be able to ask – hey, these galaxies would be really helpful to learn from – can you label them for me please? Then the humans would label them and the model would retrain. This is active learning.
In our experiments, applying active learning reduces the number of galaxies needed to reach a given performance level by up to 35-60% (see the paper).
We can use our posteriors to work out which galaxies are most informative. Remember that we use dropout to approximate training many models (see above). We show in the paper that informative galaxies are galaxies where those models confidently disagree.

This is only possible because we think about labels probabilistically and approximate training many models.
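One common way to score “confident disagreement” is the mutual information used in Bayesian active learning: the entropy of the averaged posterior minus the average entropy of each individual posterior. The sketch below assumes that form and uses made-up dropout outputs; treat it as an illustration of the idea rather than the exact acquisition function in the paper.

```python
import numpy as np
from math import comb

def binomial_pmf(p, n):
    """Probability of k = 0..n 'Bar' votes from n volunteers, given probability p."""
    k = np.arange(n + 1)
    coeffs = np.array([comb(n, int(i)) for i in k], dtype=float)
    return coeffs * p**k * (1 - p) ** (n - k)

def entropy(pmf):
    """Shannon entropy in nats, ignoring zero-probability outcomes."""
    pmf = pmf[pmf > 0]
    return float(-np.sum(pmf * np.log(pmf)))

def mutual_information(dropout_probs, n):
    """High when the dropout networks each make confident but different predictions."""
    per_network = [binomial_pmf(p, n) for p in dropout_probs]
    mean_pmf = np.mean(per_network, axis=0)
    return entropy(mean_pmf) - float(np.mean([entropy(pmf) for pmf in per_network]))

# Hypothetical dropout outputs for three galaxies, each shown to 40 volunteers
agree_confident = mutual_information([0.05, 0.05], n=40)     # networks agree
agree_uncertain = mutual_information([0.5, 0.5], n=40)       # networks agree, but unsure
disagree_confident = mutual_information([0.05, 0.95], n=40)  # networks confidently disagree
```

Only the third galaxy scores highly. A galaxy where every network is equally unsure scores zero, because asking volunteers would not help to choose between the networks.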
What galaxies are informative? Exactly the galaxies you would intuitively expect.
- The model strongly prefers diverse featured galaxies over ellipticals
- For identifying bars, the model prefers galaxies which are better resolved (lower redshift)
This selection is completely automatic. Indeed, I didn’t realise the lower redshift preference until I looked at the images!
I’m excited to see what science can be done as we move from morphology catalogs of hundreds of thousands of galaxies to hundreds of millions. If you’d like to know more or you have any questions, get in touch in the comments or on Twitter (@mike_w_ai, @chrislintott, @yaringal).
Cheers,
Mike
Bayesian View of Galaxy Evolution

The Universe is pretty huge, and to understand it we need to collect vast amounts of data. The Hubble Telescope is just one of many telescopes collecting data from the Universe. Hubble alone produces 17.5 GB of raw science data each week. That means since its launch to low Earth orbit in April 1990, it’s collected roughly a block of data equivalent in size to 6 million mp3 songs! With the launch of NASA’s James Webb Telescope (a tennis-court-sized space telescope!) just around the corner, the amount of raw data we can collect from the Universe is going to escalate dramatically. In order to decipher what this data is telling us about the Universe we need to use sophisticated statistical techniques. In this post I want to talk a bit about a particular technique I’ve been using to learn about galaxy evolution, called a Markov-Chain-Monte-Carlo (MCMC) simulation.
Before we dive into the statistics, let me try and explain what I’m trying to figure out. We can model galaxy evolution by looking at a galaxy’s star formation rate (SFR) over time. Basically we want to know how fast a particular galaxy is making stars at any given time. Typically, a galaxy has an initial constant high SFR, then at a time called t quench (tq) its SFR decreases exponentially, at a rate characterised by a number called tau. Small tau means the galaxy stops forming stars, or is quenched, more rapidly. So overall for each galaxy we need to determine two numbers, tq and tau, to figure out how it evolved. Figure 1 shows what this model looks like.
Figure 1: Model of a single galaxy’s SFR over time, showing an initial high constant SFR, followed by an exponential quench at tq.
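The model in Figure 1 can be written down in a couple of lines. This is a minimal sketch that assumes times are measured in Gyr and that the initial SFR is normalised to 1; the function name and the sfr0 parameter are hypothetical.

```python
import numpy as np

def sfr(t, tq, tau, sfr0=1.0):
    """Constant star formation rate until tq, then an exponential decline with timescale tau."""
    t = np.asarray(t, dtype=float)
    # Time elapsed since quenching began (zero before tq, so the SFR stays at sfr0)
    time_since_quench = np.maximum(t - tq, 0.0)
    return sfr0 * np.exp(-time_since_quench / tau)

times = np.linspace(0.0, 13.8, 200)           # assumed age range in Gyr
fast_quench = sfr(times, tq=8.0, tau=0.5)     # small tau: star formation shuts off quickly
slow_quench = sfr(times, tq=8.0, tau=2.0)     # large tau: a gentle decline
```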
To calculate these two numbers, tq and tau, we look at the colour of the galaxy, specifically the UVJ colour I mentioned in my last post. We then compare this to a predicted colour of a galaxy for a specific value of tq and tau. The problem is that there are many different combinations of tq and tau, so how do we find the best match for a galaxy? We use an MCMC simulation to do this.
The first MC – Markov-Chain – just means an efficient random walk. We send “walkers” to have a look around for a good tq and tau, but the direction we send them to walk at each step depends on how good their current tq and tau are. The upshot of this is that we quickly home in on a good value of tq and tau. The second MC – Monte Carlo – just picks out random values of tq and tau and tests how good they are by comparing the UVJ colours and our SFR model. Figure 2 shows a gif of an MCMC simulation of a single galaxy. The histograms show the positions of the walkers searching the tq and tau space, and the blue crosshair shows the best fit value of tq and tau at every step. You can see the walkers homing in and settling down on the best value of tq and tau. I ran this simulation by running a modified version of the starpy code.
Figure 2: MCMC simulation for a single galaxy, pictured in the top right corner. Main plot shows density of walkers. Marginal histograms show 1D projections of walker densities. Blue crosshair shows best fit values of tq and tau at each step.
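For a feel of what a single walker is doing, here is a minimal Metropolis sampler. It is not the starpy code: the real fit compares predicted and observed colours, whereas this sketch swaps in a toy Gaussian “goodness of fit” centred on made-up values (tq = 8, tau = 1.5), and the prior ranges, step size and starting point are all assumptions chosen for illustration.

```python
import numpy as np

def log_posterior(theta):
    """Toy stand-in for 'how well do the predicted colours match the observed ones?'."""
    tq, tau = theta
    # Flat prior: reject anything outside an assumed allowed range
    if not (0.0 < tq < 13.8 and 0.0 < tau < 4.0):
        return -np.inf
    return -0.5 * (((tq - 8.0) / 0.5) ** 2 + ((tau - 1.5) / 0.3) ** 2)

def metropolis(log_post, start, n_steps=20000, step_size=0.3, seed=0):
    """One walker: propose a random step, and accept it more readily if the fit improves."""
    rng = np.random.default_rng(seed)
    current = np.asarray(start, dtype=float)
    current_lp = log_post(current)
    chain = np.empty((n_steps, current.size))
    for i in range(n_steps):
        proposal = current + rng.normal(scale=step_size, size=current.shape)
        proposal_lp = log_post(proposal)
        # Always accept better fits; accept worse fits with probability exp(difference)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

chain = metropolis(log_posterior, start=[4.0, 3.0])
# Discard the early steps, taken before the walker has settled down
tq_est, tau_est = chain[5000:].mean(axis=0)
```

A histogram of the settled part of the chain plays the same role as the marginal histograms in Figure 2: its spread shows how well tq and tau are pinned down, not just their best values.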
The maths that underpins this simulation is called Bayesian Statistics, and it’s quite a novel way of thinking about parameters and data. The main difference is that instead of treating unknown parameters as fixed quantities with associated error, they are treated as random variables described by probability distributions. It’s quite a powerful way of looking at the Universe! I’ve left all of the gory maths detail about MCMC out, but if you’re interested, an article by a DPhil student here at Oxford does a really good job of explaining it here.
So how does this all relate to galaxy morphology, and Galaxy Zoo classifications? I’m currently running the MCMC simulation shown in Figure 2 over all the galaxies in the COSMOS survey. This is really cool because apart from getting to play with the University of Oxford’s supercomputer (544 cores!), I can use Galaxy Zoo morphology to see if the SFR of a galaxy over time is dependent on the galaxy’s shape, and overall learn what the vast amount of data I have says about galaxy evolution.
New paper on active black holes affecting star formation rates!
Good news everyone, another Galaxy Zoo paper was published today! This work was led by yours truly (Hi!) and looks at the impact that the central active black holes (active galactic nuclei; AGN) can have on the shape and star formation of their galaxy. It’s available here on astro-ph: http://arxiv.org/abs/1609.00023 and will soon be published in MNRAS.
Turns out, despite the fact that these supermassive black holes are TINY in comparison to their galaxy (300 light years across as opposed to 100,000 light years!) we see that within a population of these AGN galaxies the star formation rates have recently and rapidly decreased. In a control sample of galaxies that don’t currently have an AGN in their centre, we don’t see the same thing happening. This phenomenon has been seen before in individual galaxies and predicted by simulations, but this is the first time it’s been statistically shown to be happening within a large population. It’s tempting to say then that it’s the AGN that is directly causing this drop in the star formation rate (maybe because the energy thrown out by the active black hole blasts out or heats the gas needed to fuel star formation) but with the data we have we can’t say for definite if the AGN are the cause. It could be that this drop in star formation is being caused by another means entirely, which also coincidentally turns on an AGN in a galaxy.

A random sample of galaxies which host a central active black hole used in this work. The disc vote fraction classification from Galaxy Zoo 2 is shown for each image. Images from SDSS.
These galaxies were also all classified by our wonderful volunteers in Galaxy Zoo 2, which meant that we could also look at whether this drop in the star formation rate was dependent on the morphology of the galaxy; turns out not so much! If the drop in the star formation rate is being caused directly by the AGN (and remember we still can’t say for sure!) then the central black hole of a galaxy doesn’t care what shape galaxy it’s in. An AGN will affect all galaxies, regardless of morphology, just the same.