Hi! I’m Tom, and I’m a PhD student at the University of Nottingham, doing some research to try to understand how spiral galaxies have grown and changed over their lifetimes. I’m especially interested in looking at how the spiral arms have been affecting the galaxy as a whole. I’ve recently finished up a paper in MNRAS in which I’ve been demonstrating a couple of new methods using some Galaxy Zoo data.
Amelia has already written [ https://blog.galaxyzoo.org/2018/07/17/finding-bars-in-galaxy-zoo-3d/ ] about how she is using the MaNGA survey [ https://www.sdss.org/surveys/manga/ ] to try to understand what’s happening in bars, so I won’t go into too much detail about this fantastic survey. I’ll just say that it’s part of the Sloan Digital Sky Survey, and for each of its sample of 10,000 galaxies, we have measurements of the spectrum at every position across the face of the galaxy.
MaNGA is really useful for trying to understand how galaxies have grown to their current size, because it is possible to get some sort of estimation of what kinds of stars are present in different locations of the galaxy. It’s a difficult thing to measure, so we can’t say exactly how many of every different type of star is present, but we can at least get a broad picture of the kinds of stellar ages and chemical enrichment (“metallicity”) in the stars. Astronomers have used these kinds of tools to measure the average age or metallicity of stars in different parts of galaxies, and found that in most spirals, the further out you go in the galaxy, the younger the stars are on average. The usual interpretation of this is that bulges tend to have formed first, and the disks have grown in size over time afterwards.
I’m really interested in trying to push this picture in two ways. Firstly, I’ve been trying to see what we can learn from looking at the general distribution of stars of different ages and metallicities – not just the average properties – at each location in the galaxy. Secondly, I think there is a lot of information that we risk ignoring by only looking at how things change with galactic radius. Spiral arms and the bar aren’t evenly distributed around the galaxy, so if we can see how the stellar properties change as we move around the galaxy, we should be able to measure what effect the spiral arms and bars have on the stars. The goal would be to try to confirm whether the most popular models of the nature of spiral arms and bars are correct or not.
To properly do this, we need to know exactly where the spiral arms and bars are in the MaNGA galaxies, so that we can see how the stars vary in these different regions. Enter Galaxy Zoo: 3D, where volunteers are asked to tell us where the different components are.
All of this is what my most recent publication is about (read it in full at https://doi.org/10.1093/mnras/stz2204); we’ve shown that by combining the full spatial information available from MaNGA (augmented by Galaxy Zoo:3D) with the full distributions of the ages and metallicities of stars in each location, we can start to see some interesting things in the bar and spiral arms. It’s definitely best illustrated by an animation.
By splitting the age distributions up into different “time-slices”, we can create images of where stars of different ages are located in each of our MaNGA galaxies. Immediately from this one example, it’s obvious that there are a lot of things going on here.
There are a few features in the animation that we’re not entirely convinced are real, but the main exciting things are that the spiral arms only show up in the youngest stars, and the bar grows and rotates as we move from older to younger stars. The growth of the bar is intriguing; this might be showing us how it formed. The bar changing with angle is even more exciting, and we think it shows us how quickly new-born stars become mixed and “locked” into the bar. The arms show what we should expect; spiral arms are areas of intense star formation, but over time the stars formed there will become mixed around the disk. We measured this effect by looking at what fraction of stars of each age are located in the volunteer-drawn spiral arms from Galaxy Zoo:3D.
This is really interesting, and highlights the power of combining large surveys like MaNGA with crowd-sourced information from the Zooniverse.
The next step is to do these kinds of things with more than just this one galaxy though. I’ve started looking at how these techniques can measure how fast the disks of spiral galaxies grew, using a large sample of spiral galaxies identified by Galaxy Zoo 2 volunteers. I’m also trying to measure how quickly stars get mixed away from spiral arms in different types of spiral galaxies. I have started to find some hints of some exciting results on both of these topics, which I would love to share in a future blog post if you’re interested.
However, I’m currently limited in the number of galaxies with spiral arm regions identified by Galaxy Zoo:3D volunteers, so it would be really helpful if we could get some more! Understanding what makes spiral structure appear in disky galaxies is one of the unsolved problems in galaxy evolution and formation, and the clues to finding out might well lie in measuring how spiral arms affect the galaxy’s stars. Galaxy Zoo:3D will definitely be able to play a role in this! Help us out at https://www.zooniverse.org/projects/klmasters/galaxy-zoo-3d.
Hi, I’m Lauren, a summer work experience student working with the Galaxy Zoo team at the University of Oxford for a couple of weeks, and it’s my pleasure to be able to bring you some fantastic news. Today, we’re launching the mobile version of Galaxy Zoo! Unlike the website version, this brand-new native mobile version has questions with only two possible answers – just swipe left or right depending on your answer! This can create a more captivating and faster-paced experience when you are classifying galaxies.
Not only does this introduce a new and engaging platform for the project, but it also means that you can classify galaxies anywhere – on the bus, at the beach, at a concert, in the waiting room at the dentist etc. Hopefully, this will mean many more galaxy classifications whilst also providing easier access for our wide range of volunteers across the world. By introducing this app, we hope to inspire others to join our Galaxy Zoo team, no matter their qualifications or skill set.
Get involved by downloading the Zooniverse app (if you don’t have it already), heading over to the ‘Space’ section, and selecting the ‘Galaxy Zoo Mobile’ project. From there, you will be greeted with three different workflows – ‘Smooth or Featured’, ‘Spiral Arms’ or ‘Merging/Disturbed’. Pick whichever you like! The simple, swiping interface allows you to classify galaxies much faster than ever before, meaning the Galaxy Zoo science team can produce results even quicker. So, download the Zooniverse app today and start classifying!
Apple App Store: https://apps.apple.com/us/app/zooniverse/id1194130243
Google Play Store: https://play.google.com/store/apps/details?id=com.zooniversemobile&hl=en
Lauren & the Galaxy Zoo Team
Alongside the new workflow that Galaxy Zoo has just launched (read more in this blog post: https://wp.me/p2mbJY-2tJ), we’re taking the opportunity to work once again with researchers from Ben Gurion University and Microsoft Research to run an experiment which looks at how we can communicate with volunteers. As part of this experiment, volunteers classifying galaxies on the new workflow may see short messages about the new machine learning elements. Anyone seeing these messages will be given the option to withdraw from the experiment; just select the ‘opt out’ button to avoid seeing any further messages.
After the experiment is finished we will publish a debrief blog here describing more of the details and presenting our results.
This messaging experiment has ethics approval from Ben Gurion University (reference: SISE-2019-01) and the University of Oxford (reference: R63818/RE001).
Since I joined the team in 2018, citizen scientists like you have given us over 2 million classifications for 50,000 galaxies. We rely on these classifications for our research: from spiral arm winding, to merging galaxies, to star formation – and that’s just in the last month!
We want to get as much science as possible out of every single click. Your time is valuable and we have an almost unlimited pile of galaxies to classify. To do this, we’ve spent the past year designing a system to prioritise which galaxies you see on the site – which you can choose to access via the ‘Enhanced’ workflow.
This workflow depends on a new automated galaxy classifier using machine learning – an AI, if you like. Our AI is good at classifying boring, easy galaxies very fast. You are a much better classifier, able to make sense of the most difficult galaxies and even make new discoveries like Voorwerpen, but unfortunately need to eat and sleep and so on. Our idea is to have you and the AI work together.
The AI can guess which challenging galaxies, if classified by you, would best help it to learn. Each morning, we upload around 100 of these extra-helpful galaxies. The next day, we collect the classifications and use them to teach our AI. Thanks to your classifications, our AI should improve over time. We also upload thousands of random galaxies and show each to 3 humans, to check our AI is working and to keep an eye out for anything exciting.
With this approach, we combine human skill with AI speed to classify far more galaxies and do better science. For each new survey:
- 40 humans classify the most challenging and helpful galaxies
- Each galaxy is seen by 3 humans
- The AI learns to predict well on all the simple galaxies not yet classified
What does this mean in practice? Those choosing the ‘Enhanced’ workflow will see somewhat fewer simple galaxies (like the ones on the right), and somewhat more galaxies which are diverse, interesting and unusual (like the ones on the left). You will still see both interesting and simple galaxies, and still see every galaxy if you make enough classifications.
With our new system, you’ll see somewhat more galaxies like the ones on the left, and somewhat fewer like the ones on the right.
We would love for you to join in with our upgrade, because it helps us do more science. But if you like Galaxy Zoo just the way it is, no problem – we’ve made a copy (the ‘Classic’ workflow) that still shows random galaxies, just as we always have. If you’d like to know more, check out this post for more detail or read our paper. Separately, we’re also experimenting with sending short messages – check out this post to learn more.
The Galaxy Zoo team and I are really excited to see what you’ll discover. Let’s get started.
I’d love to be able to take every galaxy and say something about its morphology. The more galaxies we label, the more specific questions we can answer. When you want to know what fraction of low-mass barred spiral galaxies host AGN, suddenly it really matters that you have a lot of labelled galaxies to divide up.
But there’s a problem: humans don’t scale. Surveys keep getting bigger, but we will always have the same number of volunteers (applying order-of-magnitude astronomer math).
We’re struggling to keep pace now. When EUCLID (2022), LSST (2023) and WFIRST (2025ish) come online, we’ll start to look silly.
To keep up, Galaxy Zoo needs an automatic classifier. Other researchers have used responses that we’ve already collected from volunteers to train classifiers. The best performing of these are convolutional neural networks (CNNs) – a type of deep learning model tailored for image recognition. But CNNs have a drawback. They don’t easily handle uncertainty.
When learning, they implicitly assume that all labels are equally confident – which is definitely not the case for Galaxy Zoo (more in the section below). And when making (regression) predictions, they only give a ‘best guess’ answer with no error bars.
In our paper, we use Bayesian CNNs for morphology classification. Our Bayesian CNNs provide two key improvements:
- They account for varying uncertainty when learning from volunteer responses
- They predict full posteriors over the morphology of each galaxy
Using our Bayesian CNN, we can learn from noisy labels and make reliable predictions (with error bars) for hundreds of millions of galaxies.
How Bayesian Convolutional Neural Networks Work
There are two key steps to creating Bayesian CNNs.
1. Predict the parameters of a probability distribution, not the label itself
Training neural networks is much like any other fitting problem: you tweak the model to match the observations. If all the labels are equally uncertain, you can just minimise the difference between your predictions and the observed values. But for Galaxy Zoo, many labels are more confident than others. If I observe that, for some galaxy, 30% of volunteers say “barred”, my confidence in that 30% massively depends on how many people replied – was it 4 or 40?
Instead, we predict the probability that a typical volunteer will say “Bar”, and minimise how surprised we should be given the total number of volunteers who replied. This way, our model understands that errors on galaxies where many volunteers replied are worse than errors on galaxies where few volunteers replied – letting it learn from every galaxy.
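To make the “how surprised should we be” idea concrete, here is a minimal sketch of a binomial loss. It is an illustration rather than the code from the paper: the function name `binomial_nll` and the example numbers are my own assumptions. It scores a predicted probability against k “Bar” votes out of n replies.

```python
import math

def binomial_nll(p_bar, k, n):
    """Negative log-likelihood of seeing k 'Bar' votes from n volunteers,
    if a typical volunteer says 'Bar' with probability p_bar.
    (Hypothetical helper for illustration only.)"""
    eps = 1e-12
    p = min(max(p_bar, eps), 1.0 - eps)  # keep log() finite at p = 0 or 1
    # log of the binomial coefficient "n choose k"
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return -(log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p))

# Two galaxies with the same vote fraction (25% 'Bar') but different numbers of replies.
# In both cases the model wrongly predicts p = 0.6 instead of the best-fit p = 0.25.
penalty_few = binomial_nll(0.6, 1, 4) - binomial_nll(0.25, 1, 4)
penalty_many = binomial_nll(0.6, 10, 40) - binomial_nll(0.25, 10, 40)
print(penalty_few, penalty_many)
```

The extra loss for the wrong prediction grows in proportion to the number of volunteers who replied, so the same mistake costs ten times more on the galaxy with 40 replies than on the one with 4.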
2. Use Dropout to Pretend to Train Many Networks
Our model now makes probabilistic predictions. But what if we had trained a different model? It would make slightly different probabilistic predictions. We need to marginalise over the possible models we might have trained. To do this, we use dropout. Dropout turns off many random neurons in our model, permuting our network into a new one each time we make predictions.
Below, you can see our Bayesian CNN in action. Each row is a galaxy (shown to the left). In the central column, our CNN makes a single probabilistic prediction (the probability that a typical volunteer would say “Bar”). We can interpret that as a posterior for the probability that k of N volunteers would say “Bar” – shown in black. On the right, we marginalise over many CNNs using dropout. Each CNN posterior (grey) is different, but we can marginalise over them to get the posterior over many CNNs (green) – our Bayesian prediction.
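The marginalisation step can be sketched in a few lines. This is a toy stand-in, not our real network: `toy_forward` is a made-up single-layer “model”, and the features, weights and dropout rate are all assumptions chosen for illustration. Each forward pass uses a fresh dropout mask, gives one probability, and therefore one binomial posterior over k; averaging those posteriors gives the marginalised prediction.

```python
import math
import random

def binomial_pmf(k, n, p):
    """Probability that k of n volunteers say 'Bar' when each does so with probability p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def toy_forward(features, weights, keep_prob, rng):
    """One forward pass of a hypothetical one-layer model with a fresh dropout mask."""
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep_prob:      # this "neuron" survives dropout
            total += x * w / keep_prob    # rescale so the expected sum is unchanged
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid -> probability of 'Bar'

def mc_dropout_posterior(features, weights, n_volunteers,
                         n_passes=50, keep_prob=0.5, seed=0):
    """Return the per-pass posteriors (grey) and their average (green)."""
    rng = random.Random(seed)
    per_pass = []
    for _ in range(n_passes):
        p = toy_forward(features, weights, keep_prob, rng)
        per_pass.append([binomial_pmf(k, n_volunteers, p)
                         for k in range(n_volunteers + 1)])
    # Marginalise: average the posteriors over all dropout passes
    marginal = [sum(column) / n_passes for column in zip(*per_pass)]
    return per_pass, marginal

per_pass, marginal = mc_dropout_posterior(
    features=[0.8, -0.3, 1.2, 0.5], weights=[1.0, 0.7, -0.4, 0.9], n_volunteers=10)
print([round(q, 3) for q in marginal])
```

The averaged posterior is typically broader than any single pass, which is the point: it includes our uncertainty about which model we might have trained.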
Read more about it in the paper.
Modern surveys will image hundreds of millions of galaxies – more than we can show to volunteers. Given that, which galaxies should we classify with volunteers, and which by our Bayesian CNN?
Ideally we would only show volunteers the images that the model would find most informative. The model should be able to ask – hey, these galaxies would be really helpful to learn from – can you label them for me please? Then the humans would label them and the model would retrain. This is active learning.
In our experiments, applying active learning reduces the number of galaxies needed to reach a given performance level by up to 35-60% (see the paper).
We can use our posteriors to work out which galaxies are most informative. Remember that we use dropout to approximate training many models (see above). We show in the paper that informative galaxies are galaxies where those models confidently disagree.
This is only possible because we think about labels probabilistically and approximate training many models.
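One common way to put a number on “the models confidently disagree” is the mutual information between the prediction and the choice of model: the entropy of the averaged posterior minus the average entropy of the individual posteriors. The sketch below assumes that scoring rule for illustration (see the paper for the exact acquisition function we use); `disagreement_score` and the example probabilities are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """Probability that k of n volunteers say 'Bar' when each does so with probability p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in dist if q > 0)

def disagreement_score(p_samples, n_volunteers):
    """Mutual-information style score from the probabilities predicted
    by several dropout passes for one galaxy (illustrative helper)."""
    posteriors = [[binomial_pmf(k, n_volunteers, p)
                   for k in range(n_volunteers + 1)]
                  for p in p_samples]
    marginal = [sum(column) / len(posteriors) for column in zip(*posteriors)]
    mean_entropy = sum(entropy(q) for q in posteriors) / len(posteriors)
    return entropy(marginal) - mean_entropy

# Every pass agrees (even though each is unsure): nothing new to learn
print(disagreement_score([0.5, 0.5, 0.5, 0.5], n_volunteers=10))
# Passes are each confident but contradict one another: very informative
print(disagreement_score([0.05, 0.95, 0.05, 0.95], n_volunteers=10))
```

Ranking unlabelled galaxies by a score like this and sending the top few to volunteers is the basic active learning loop described above.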
What galaxies are informative? Exactly the galaxies you would intuitively expect.
- The model strongly prefers diverse featured galaxies over ellipticals
- For identifying bars, the model prefers galaxies which are better resolved (lower redshift)
This selection is completely automatic. Indeed, I didn’t realise the lower redshift preference until I looked at the images!
I’m excited to see what science can be done as we move from morphology catalogs of hundreds of thousands of galaxies to hundreds of millions. If you’d like to know more or you have any questions, get in touch in the comments or on Twitter (@mike_w_ai, @chrislintott, @yaringal).
Galaxy Zoo is celebrating ten years since launch next month, and as part of the festivities the science team are having a meeting in Oxford from 10th-12th July. Unfortunately we didn’t think it was feasible to invite the hundreds of thousands of you from all over the world who have contributed to the project over the last ten years, but the good news is that all of the talks from the meeting will be interactively live-streamed so that anyone can join in the discussion! See the schedule above for details on who is speaking at the meeting. Details of how to join the live stream will be released closer to the event.
There will also be an Oxford SciBar public event on the Monday night. All who are able to make it are welcome to join but don’t worry if you can’t, there will be a full podcast of the evening released shortly after the event!
The Universe is pretty huge, and to understand it we need to collect vast amounts of data. The Hubble Telescope is just one of many telescopes collecting data from the Universe. Hubble alone produces 17.5 GB of raw science data each week. That means since its launch to low earth orbit in April 1990, it’s collected roughly a block of data equivalent in size to 6 million mp3 songs! With the launch of NASA’s James Webb Telescope (a tennis-court-sized space telescope!) just around the corner, the amount of raw data we can collect from the Universe is going to escalate dramatically. In order to decipher what this data is telling us about the Universe we need to use sophisticated statistical techniques. In this post I want to talk a bit about a particular technique I’ve been using called a Markov-Chain-Monte-Carlo (MCMC) simulation to learn about galaxy evolution.
Before we dive into the statistics let me try and explain what I’m trying to figure out. We can model galaxy evolution by looking at a galaxy’s star formation rate (SFR) over time. Basically we want to know how fast a particular galaxy is making stars at any given time. Typically, a galaxy has an initial constant high SFR, then at a time called t quench (tq) its SFR decreases exponentially, at a rate characterised by a number called tau. Small tau means the galaxy stops forming stars, or is quenched, more rapidly. So overall for each galaxy we need to determine two numbers, tq and tau, to figure out how it evolved. Figure 1 shows what this model looks like.
Figure 1: Model of a single galaxy’s SFR over time, showing an initial high constant SFR, followed by an exponential quench at tq.
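The model in Figure 1 is simple enough to write down directly. Here is a minimal sketch; the function name, the default starting SFR of 1 (arbitrary units) and the use of Gyr for the times are my own assumptions for illustration.

```python
import math

def star_formation_rate(t, tq, tau, initial_sfr=1.0):
    """SFR at time t: constant until the quench time tq, then an
    exponential decline with timescale tau (t, tq, tau in the same units)."""
    if t < tq:
        return initial_sfr
    return initial_sfr * math.exp(-(t - tq) / tau)

# Same quench time, but a fast (small tau) versus a slow (large tau) quench
for t in [2.0, 8.0, 9.0, 10.0]:
    fast = star_formation_rate(t, tq=8.0, tau=0.5)
    slow = star_formation_rate(t, tq=8.0, tau=2.0)
    print(t, round(fast, 3), round(slow, 3))
```

After tq the small-tau galaxy always has the lower SFR, which is what “quenched more rapidly” means here.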
To calculate these two numbers, tq and tau, we look at the colour of the galaxy, specifically the UVJ colour I mentioned in my last post. We then compare this to a predicted colour of a galaxy for a specific value of tq and tau. The problem is that there are many different combinations of tq and tau, so how do we find the best match for a galaxy? We use an MCMC simulation to do this.
The first MC – Markov-Chain – just means an efficient random walk. We send “walkers” to have a look around for a good tq and tau, but the direction we send them to walk at each step depends on how good the tq and tau they are currently at is. The upshot of this is we quickly home in on a good value of tq and tau. The second MC – Monte Carlo – just picks out random values of tq and tau and tests how good they are by comparing the UVJ colours and our SFR model. Figure 2 shows a gif of an MCMC simulation of a single galaxy. The histograms show the positions of the walkers searching the tq and tau space, and the blue crosshair shows the best fit value of tq and tau at every step. You can see the walkers homing in and settling down on the best value of tq and tau. I ran this simulation by running a modified version of the starpy code.
Figure 2: MCMC simulation for a single galaxy, pictured in the top right corner. Main plot shows density of walkers. Marginal histograms show 1D projections of walker densities. Blue crosshair shows best fit values of tq and tau at each step.
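If you would like to see the idea in code, here is a stripped-down sketch. It is not starpy: it uses a single walker with the basic Metropolis rule rather than many walkers, and `toy_colours` is an invented stand-in for the real predicted colours, which come from stellar population models. All the numbers (prior ranges, step size, colour errors) are assumptions for illustration.

```python
import math
import random

def toy_colours(tq, tau):
    """Made-up mapping from (tq, tau) to two colours. NOT a real galaxy model."""
    return (2.0 - 0.1 * tq, 1.0 - 0.3 * tau + 0.05 * tq)

def log_posterior(tq, tau, observed, sigma=0.05):
    """Flat prior inside the allowed ranges, Gaussian errors on the colours."""
    if not (0.0 < tq < 13.8 and 0.0 < tau < 4.0):
        return -math.inf
    predicted = toy_colours(tq, tau)
    return -0.5 * sum(((o - p) / sigma) ** 2 for o, p in zip(observed, predicted))

def run_walker(observed, n_steps=10000, step=0.2, seed=1):
    """Random walk that prefers steps towards better-fitting tq and tau."""
    rng = random.Random(seed)
    tq, tau = rng.uniform(0.0, 13.8), rng.uniform(0.0, 4.0)  # random start
    logp = log_posterior(tq, tau, observed)
    chain = []
    for _ in range(n_steps):
        new_tq = tq + rng.gauss(0.0, step)
        new_tau = tau + rng.gauss(0.0, step)
        new_logp = log_posterior(new_tq, new_tau, observed)
        # Always accept a better fit; accept a worse one only some of the time
        if rng.random() < math.exp(min(0.0, new_logp - logp)):
            tq, tau, logp = new_tq, new_tau, new_logp
        chain.append((tq, tau))
    return chain

observed = toy_colours(8.0, 1.5)      # pretend galaxy with known tq and tau
chain = run_walker(observed)[2000:]   # drop the early "wandering" steps
print(sum(s[0] for s in chain) / len(chain), sum(s[1] for s in chain) / len(chain))
```

The spread of the remaining steps in the chain plays the same role as the histograms in Figure 2: it tells you not just the best tq and tau, but how well they are pinned down.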
The maths that underpins this simulation is called Bayesian Statistics, and it’s quite a novel way of thinking about parameters and data. The main difference is that instead of treating unknown parameters as fixed quantities with associated error, they are treated as random variables described by probability distributions. It’s quite a powerful way of looking at the Universe! I’ve left all of the gory maths detail about MCMC out but if you’re interested, an article by a DPhil student here at Oxford does a really good job of explaining it here.
So how does this all relate to galaxy morphology, and Galaxy Zoo classifications? I’m currently running the MCMC simulation shown in Figure 2 over all the galaxies in the COSMOS survey. This is really cool because apart from getting to play with the University of Oxford’s supercomputer (544 cores!), I can use Galaxy Zoo morphology to see if the SFR of a galaxy over time is dependent on the galaxy’s shape, and overall learn what the vast amount of data I have says about galaxy evolution.
Good news everyone, another Galaxy Zoo paper was published today! This work was led by yours truly (Hi!) and looks at the impact that the central active black holes (active galactic nuclei; AGN) can have on the shape and star formation of their galaxy. It’s available here on astro-ph: http://arxiv.org/abs/1609.00023 and will soon be published in MNRAS.
Turns out, despite the fact that these supermassive black holes are TINY in comparison to their galaxy (300 light years across as opposed to 100,000 light years!) we see that within a population of these AGN galaxies the star formation rates have been recently and rapidly decreased. In a control sample of galaxies that don’t currently have an AGN in their centre, we don’t see the same thing happening. This phenomenon has been seen before in individual galaxies and predicted by simulations but this is the first time it’s been statistically shown to be happening within a large population. It’s tempting to say then that it’s the AGN that is directly causing this drop in the star formation rate (maybe because the energy thrown out by the active black hole blasts out or heats the gas needed to fuel star formation) but with the data we have we can’t say for definite if the AGN are the cause. It could be that this drop in star formation is being caused by another means entirely, which also coincidentally turns on an AGN in a galaxy.
These galaxies were also all classified by our wonderful volunteers in Galaxy Zoo 2, which meant that we could also look at whether this drop in the star formation rate was dependent on the morphology of the galaxy; turns out not so much! If the drop in the star formation rate is being caused directly by the AGN (and remember we still can’t say for sure!) then the central black hole of a galaxy doesn’t care what shape galaxy it’s in. An AGN will affect all galaxies, regardless of morphology, just the same.
Since our discovery in 2010 that the red spirals identified by your classifications in the first phase of Galaxy Zoo were twice as likely to host galactic scale bars as normal blue spirals, a lot of our research time has focused on understanding which types of galaxies host bars, and why that might be.
Our research with the bars identified by you in the second phase of Galaxy Zoo continues to give us hints that these structures in galaxies might be involved in the process which quenches star formation in spiral galaxies and through that could be part of the process involved in the reduction of star formation in the universe as a whole.
We’ve also used your classifications as part of Galaxy Zoo Hubble and Galaxy Zoo CANDELS to identify the epoch in the universe when disc galaxies were first stable enough to host a significant number of bars, finding them possibly even earlier in the Universe than was previously thought.
Last Friday I spoke at the monthly “Ordinary Meeting” of the Royal Astronomical Society, giving a summary of the evidence we’re collecting on the impact bars have on galaxies thanks to your classifications (a video of my talk will be available at some point). This was the second time I’ve spoken at this meeting about results from Galaxy Zoo, and it’s a delightful mix of professional colleagues and enthusiastic amateurs – including some Galaxy Zoo volunteers.
Prompted by that I thought it was timely to write on this blog about what these bars really are, what they do to galaxies, and why I think they’re so interesting. I wrote the below some time ago when I had a spare few minutes, and was just looking for the right time to post it.
The thing about galaxies, which is sometimes hard to remember, is that they are simply vast collections of stars, and that those stars are all constantly in motion, orbiting their common centre of mass. The structures that we see in galaxies are just a snapshot of the locations of those stars right now (on a cosmic timescale), and the patterns we see in the positions of the stars reveal patterns in their orbital motions. A stellar bar for example reveals a set of very elongated orbits of stars in the disc of a galaxy.
Another extraordinary thing about a disc galaxy is how thin it is. To put this in perspective I’ll give you a real world example. In the Haus der Astronomie in Heidelberg you can walk around inside a scale model of the Whirlpool galaxy. The whole building was laid out in a design which reflects the spiral arms of this galaxy. However it’s not an exact scale model – to properly represent the thickness of the disc of the Whirlpool galaxy the building (which in actual fact has 3 stories and hosts a fairly large planetarium in its centre) would have to be only 90cm tall…..
Such an incredibly thin disc of stars floating independently in space would be quite unstable dynamically (meaning its own gravity should cause it to buckle and collapse on itself). This instability would immediately manifest in elongated orbits of stars, which would make a stellar bar (as part of this process of collapse). Simple computer models of disks of stars immediately form bars. Of course we now know that galaxy discs are submerged in massive halos of dark matter. So my first favourite little fact about bars is
(1) the fact that not all disc galaxies have bars was put forward as evidence that the discs must be embedded in massive halos before the existence of dark matter was widely accepted.
Now we can model dark matter halos better we discover that even with a dark matter halo, as long as that halo can absorb angular momentum (ie. rotate a bit) all discs will eventually make a bar. So my second favourite little fact is that
(2) we still don’t understand why not all disc galaxies have bars.
What this second fact means is that perhaps what I should really be doing is studying the galaxies you have identified as not having bars to figure out why it is they haven’t been able to form a bar yet. It should really be the properties of these which are unexpected….. We find that this is more likely to happen in blue, intermediate mass spirals with a significant reservoir of atomic hydrogen (the raw material for future star formation). In fact this last thing may be the most significant. Including realistic interstellar gas in computer simulations of galaxies is very difficult, but people do run what are called “smoothed particle hydrodynamics” simulations (basically making “particles” of gas and inserting the appropriate properties). If they add too much gas into these simulations they find that bar formation is either very delayed, or doesn’t happen in the time of the simulation…..
Anyway I hope this has given you a flavour of what I find interesting about bars in galaxies. I think it’s fascinating that they give us a morphological way to identify a process which is so dynamical in nature. And it’s a very complex process, even though the basic physics (just orbits of stars) is very simple and well understood. Finally, I have become convinced through tests of the bars identified by you in Galaxy Zoo compared to bars identified by other methods, that if you want a clean sample of very large bars in galaxies, multiple independent human eyes will give you the best result. You are much less easy to trick than automated methods for finding galactic bars.
So thanks again for the classifications, and keep clicking. 🙂