Strong and weak bars in Galaxy Zoo

Good morning everyone,

My name is Tobias and I’m a new PhD student here at Oxford. I use the classifications everyone made in Galaxy Zoo to try to understand how galaxies evolve. Right now, I’m especially interested in how bars affect galaxy evolution.

As some of you know, Galaxy Zoo currently asks you to differentiate between so-called ‘strong’ and ‘weak’ bars. Below you can find some neat examples of both classes of galaxies that were identified using your classifications. It seems that the difference between strong and weak bars is some combination of the length, width and brightness of the bar.

Examples of strongly barred (top row) and weakly barred (bottom row) galaxies.

The relationship between bars and galaxy evolution has been studied before by members of the Galaxy Zoo team, but the previous incarnation of Galaxy Zoo only allowed binary answers to the bar question: either there was a bar or not. The interesting bit, however, is to see whether strong and weak bars have different effects.

In fact, we have exciting preliminary data suggesting that the two types do behave differently in the context of galaxy evolution! When a galaxy evolves and moves from the ‘blue cloud’ to the ‘red sequence’ in the colour-magnitude diagram, its morphology and properties change (e.g. its star formation rate decreases). This process is called ‘galaxy quenching’. With the new Galaxy Zoo data and the classifications that everyone involved made, we saw that galaxies with weak bars are found in both the blue cloud and the red sequence, whereas the strongly barred galaxies are very much clustered in the red sequence, as you can see below. In more detail, strongly barred galaxies make up only ~5% of the blue cloud but ~16% of the red sequence. In contrast, weakly barred galaxies show a much more modest increase, making up ~17% of the blue cloud and ~21% of the red sequence.

Contour plot of the colour-magnitude diagram for all the galaxies in Galaxy Zoo. Overlaid on top are the strongly barred galaxies (in green) and the weakly barred galaxies (in orange). The dotted line (taken from Masters et al. (2010)) defines ‘the blue edge of the red sequence’ and effectively divides the sample in two populations: the blue cloud and red sequence. One can clearly see that the strong bars are mainly above the dotted line.
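The percentages quoted above come down to counting galaxies on each side of the dotted line. Here is a minimal Python sketch of that bookkeeping. The function names, the tuple layout and the line coefficients are all assumptions made for illustration (check Masters et al. (2010) for the actual line), and this is not the code used to make the plot.

```python
from collections import Counter, defaultdict


def is_red_sequence(colour, abs_mag, intercept=0.63, slope=-0.02, pivot=-20.0):
    """Return True if a galaxy lies above a straight dividing line.

    The default coefficients are placeholders for illustration only.
    """
    return colour > intercept + slope * (abs_mag - pivot)


def bar_fractions(galaxies):
    """Fraction of each bar type within the blue cloud and the red sequence.

    `galaxies` is an iterable of (colour, abs_mag, bar_type) tuples, where
    bar_type is "strong", "weak" or "none" (a hypothetical layout).
    """
    counts = defaultdict(Counter)
    for colour, abs_mag, bar_type in galaxies:
        population = "red sequence" if is_red_sequence(colour, abs_mag) else "blue cloud"
        counts[population][bar_type] += 1
    return {
        population: {bar: n / sum(tally.values()) for bar, n in tally.items()}
        for population, tally in counts.items()
    }
```

Calling `bar_fractions` on a catalogue returns one dictionary per population, so the strong-bar share of the blue cloud and of the red sequence can be read off side by side.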

This finding hints at a fundamental difference between the two types of bars, but in order to do real science we need to interpret the clustering of the strong bars correctly. Do strong bars cause the galaxy to quench and move up the red sequence, or can a strong bar only form if the galaxy is already sufficiently quenched? It is a chicken-or-egg question on the scale of galaxies.

Before I end this post, I want to emphasise that this research is only possible because of the many volunteers, like you, who help classify galaxies, and we are very grateful for your time and effort. However, this is only the start and a lot of work still needs to be done, so keep on classifying!

I hope to report on interesting new developments soon.


Galaxy Zoo Human + AI Paper Published

Hi all, Mike here.

A few months back, I introduced our new AI that can work together with volunteers to classify galaxies. It’s able to understand which galaxies, if classified by you, would best help it to learn. You and the AI have together classified tens of thousands of galaxies since we launched the new system in May.

I’m really happy to say that our paper was recently accepted for publication in the Monthly Notices of the Royal Astronomical Society!

We’ve made a few changes since the early version I shared before. I think the most interesting change is a new section applying AI fairness tools. These tools are usually used to check if AI models make biased decisions – for example, offering fewer jobs to women. We used these tools to check if our model is biased against galaxies with certain physical properties (it isn’t).

You can read the latest pre-print of the paper for free here. The (essentially identical) final publication will also be available for free from Monthly Notices once published – we’ll update this post when that happens.

Happy classifying,


The clumpiness of EAGLE galaxies

We have added new galaxies from the EAGLE simulations for you to classify. To find out more about what to do if some of them appear clumpy, read this blog post.

It’s important to note that while EAGLE produces some impressive galaxy images, there are still some ways in which they don’t quite resemble real galaxies. A prominent example of this is in how many star-formation “clumps” there are in galaxies. Stars form in clumps or clusters of varying size, and some observed galaxies are clumpy in appearance, so the models are reproducing a real phenomenon. It also seems that these galaxies are more common in the early Universe, and are an important part of galaxy evolution. However, the clumpy galaxies may be too common within EAGLE.


Some EAGLE galaxies that appear clumpy. Clumps appear bright blue, because they have formed recently and contain the hottest and brightest (but shortest-lived) stars. From left to right, you can see clumpy galaxies that may appear disk-like, rounded or more chaotic in shape.

We have an understanding of why this happens: clumps can result from the limited detail with which galaxies can be modelled (even in the most powerful supercomputers), and the simplifications that need to be made to how gas interacts. This doesn’t affect other things we can learn from classifying these images. If you come across a galaxy that looks super-clumpy like the above images, the best thing to do is just ignore the clumpiness and classify the rest of the galaxy. (If you would like to learn more about clumps, read about our sister project Galaxy Zoo: Clump Scout.)

An EAGLE eye on galaxy formation

We have added new galaxies from the EAGLE simulations for you to classify. To find out more about why we need your help with this task, please read this blog post.

Modern telescopes allow us to marvel at the diverse galaxies scattered through the vast expanse of space. Each galaxy appears unique, but many share common features with others billions of light years away. These stunning images pose some fundamental questions. How did these galaxies come to be? What will happen to them? What does their appearance tell us about their past?  We know it takes a very long time to build a galaxy; most of the nearby galaxies have been evolving for over 10 billion years. While galaxy evolution is exciting, we can hardly sit and wait for a galaxy to evolve in front of our eyes! Instead, the remarkably realistic simulated universes that are now being generated with modern supercomputers could hold the key to answering some of these questions. We are excited to announce a new image set of simulated galaxies from the EAGLE project. With your help, this will let us track how individual galaxies take their shape in a sophisticated simulated universe. 


Figure 1: The Hubble telescope peers back in time to reveal a diversity of galaxies, each one at an instant in their evolution. The Hubble ultra-deep field, reproduced courtesy of NASA, ESA, S. Beckwith (STScI) and the HUDF Team.

Computer models are increasingly powerful tools in astronomy, providing a tantalising glimpse into how galaxies evolve. We can follow the formation of galaxies in a simulated universe, once we include relevant processes such as the formation of stars, the growth of black holes and supernova explosions. The EAGLE project is a modern example of this, produced by a large international collaboration. EAGLE was run on a supercomputer using 4000 computer processors simultaneously over 4 months to generate a model universe. EAGLE is one of the most detailed model universes to date, and, along with the Illustris project, represents a historical advance in understanding various aspects of galaxy formation theory. This allows us to go through the 14 billion year history of the Universe in record time; from minuscule variations in the temperature of the first light of the Universe, to the emergence of the galaxies we see today. These detailed simulated galaxies have complex structure, particularly for galaxies as massive as our Milky Way. In EAGLE we can follow each galaxy’s complex family tree, providing a model for the direct evolution of individual galaxies. The EAGLE researchers ‘light up’ these simulated galaxies by modelling how stars shine, and how their light is obscured by dust. 


Figure 2: Simulated images of  EAGLE galaxies, showing the ancestors of three recognisable galaxy types we see in the local universe; a spiral, a lenticular, and an elliptical. The gradual evolution of their forms over time encodes information about the elusive physical processes that shape galaxies. Images are made to emulate data from the Sloan Digital Sky Survey, allowing direct comparison with real galaxies.

To enable these simulated galaxy images to tell us more about the galaxies in our real Universe, we can harness the power of Galaxy Zoo. Collecting Galaxy Zoo classifications of the EAGLE galaxies will help our understanding of how the physical properties of galaxies translate to what we see through our telescopes. What’s more, by classifying simulated galaxies at different stages of their lives, we get an idea of how each galaxy took its shape, and insights into what physical processes are working behind the scenes. Could an unassuming elliptical galaxy be the faded remnant of a once grand spiral? Or even a relic from a catastrophic collision between galaxies? By following the evolution of galaxies in EAGLE, we may find this out. 

For this experiment, we make images of all the simulated galaxies that have as many stars as the Milky Way or more at EAGLE’s ‘present day’ (14 billion years after the Big Bang) and use their galaxy family trees to take snapshots of each galaxy throughout its life, back to when the Universe was less than half its age. We make these images appear in the same way as those from the Sloan Digital Sky Survey (SDSS), which will let us compare directly to real data. Example images for three present-day galaxies can be seen in the figure above. An important aspect of the experiment is that some galaxies taken from the early EAGLE universe, which we would struggle to detect even with our most powerful telescopes, are shown as if they were local galaxies. These can take more unusual or chaotic forms. Classifying these galaxies under the same conditions as their descendants will give exciting new insight into why galaxies appear the way they do, and how they took their shape.

This is not the first time Galaxy Zoo has classified galaxies from a simulated universe: you may remember classifying images from the Illustris project, which produced valuable insight into both the models and our real Universe at the present day (see Hugh’s previous blog). We are optimistic that the different imaging techniques and inclusion of dust effects in these new images will improve the resemblance between real and simulated galaxies, and the new approach of looking at galaxies through cosmic time will lead to new discoveries.

Most of the galaxies you see on Galaxy Zoo will continue to come from our survey of the Southern sky; EAGLE galaxies will appear no more than 20% of the time. Your classifications of these images will help scientists tremendously in understanding the evolution of galaxies. Computer experiments are the closest thing we have to a laboratory where we can test our theories of how galaxies form, and, thanks to Galaxy Zoo, everyone can play a part!

Introducing Galaxy Zoo: Clump Scout, a new citizen science project

Hi, I’m Nico. I’m a 2nd year PhD student at the University of Minnesota studying galaxies. In particular, I use statistics and machine learning to extract useful information from the ever-growing galaxy catalogs that astronomers have assembled over the last few decades.

Today, I get to announce a completely new project by the Galaxy Zoo team! 

Galaxy Zoo: Clump Scout is a citizen science project that will take a closer look at galaxies that were classified in the Galaxy Zoo 2 project. In that project, many of you answered questions for us about their shape, structure and properties. This time we’ll be examining them in an even more detailed way.

We are searching galaxies to find “giant star-forming clumps”, or just “clumps” for short. This is what astronomers call small regions within galaxies where stars are being born at a faster-than-usual rate. They are called “giant” in comparison to any individual star or group of stars — clumps can contain millions or even billions of stars — but they’re usually quite tiny compared to the galaxy containing them. The new stars formed in clumps are brighter and more densely packed than those in the rest of the galaxy, so when photographed, clumps tend to look like small glowing areas that stand out from the background. We call any galaxy with a region like this a “clumpy galaxy”. (And yes, we promise that the word “clump” will start to sound less silly with time.)


Figure 1: Some examples of clumpy galaxies that will appear in Galaxy Zoo: Clump Scout. In these images, clumps look like small, blue spots on the galaxies. Some of the clumps in these images are bright and obvious, while others take a bit more care to spot. All photos were taken by the Sloan Digital Sky Survey.

In the Clump Scout project, we are asking volunteers to look at galaxies and click on all the clumps they can see. This is a straightforward task, but many clumps require a keen eye to pick out. Once complete, your clicks will tell us where clumps are found in thousands of galaxies in the local universe. This will be one of the first large-scale studies of clumps in local galaxies, and I’m very excited to see what we find!

Figure 2: A classification from Galaxy Zoo: Clump Scout. Here, a red icon marks the central bulge of the galaxy, while six green icons mark clumps.

Why study clumps?

Clumpy galaxies have been a bit of a mystery for scientists for a while now. Astronomers have known of their existence for decades, but discussion about them really began in the late 1990s when the Hubble telescope began to capture images of very distant galaxies. Because light takes time to travel, we saw these distant galaxies as they existed billions of years ago, at a time when the universe was still young. As we studied Hubble’s images, we started to notice differences between the early galaxies and galaxies that exist today. One such difference: In the past, nearly ALL galaxies were clumpy! Discovering this was surprising, because most galaxies in the present-day universe do not have any clumps.

It’s not yet clear how clumps were formed, why they are vanishing over time, or exactly what fraction of galaxies contain clumps. What we do know is that clumps seem to change through time alongside the galaxies that contain them. As we come to better understand clumps, we hope to better understand the role they play in the growth and evolution of their host galaxies.

Why citizen science?

Part of the reason why Clump Scout is so exciting is that this is the first time human eyes will examine so many clumpy galaxies first-hand. Thanks to the help of citizen scientists, the Clump Scout project will be able to examine over fifty thousand galaxies. To speed things along, we have already filtered these galaxies with volunteer classifications from the Galaxy Zoo 2 project and picked out the subjects that volunteers marked as having “features”. By doing this, we eliminated nearly 200,000 galaxies that are very unlikely to contain clumps, leaving only more promising subjects.

We will also be testing to see which types of clumps volunteers are able to spot. There are certain clumps that are too faint to be seen no matter where they are, while others reside in bright regions of the galaxy which drown out their signal. To quantify these effects, we have taken some galaxy images and added a few of our own, simulated clumps on top. By marking these simulated clumps, you will provide us with a wealth of information about what types of clumps we can reasonably expect to find. For example, if volunteers mark a particular simulated clump 100% of the time, it is a good sign to us that a real clump like it would be found as well. On the other hand, if no volunteers see a simulated clump, we know that similar clumps are very unlikely to be found by this project.

Figure 3: An example galaxy before and after simulated clumps were added to it. On the right, a total of 5 extra clumps have been added, but several are too faint to be seen in this image.
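To make the idea concrete, here is a rough Python sketch of how the recovery rate of simulated clumps could be tallied. The function name and the `(clump_id, was_marked)` input format are hypothetical, chosen only for this example, and the project's real analysis may look quite different.

```python
from collections import defaultdict


def recovery_fractions(views):
    """Fraction of volunteers who marked each simulated clump.

    `views` is an iterable of (clump_id, was_marked) pairs, one per volunteer
    who was shown the image containing that simulated clump.
    """
    shown = defaultdict(int)
    marked = defaultdict(int)
    for clump_id, was_marked in views:
        shown[clump_id] += 1
        if was_marked:
            marked[clump_id] += 1
    return {clump_id: marked[clump_id] / shown[clump_id] for clump_id in shown}
```

A simulated clump with a fraction of 1.0 was marked by everyone who saw it, while a fraction of 0.0 means similar real clumps are unlikely to be found by the project.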

Why can’t computers do this?

As with many citizen science tasks, identifying clumps is fairly easy for humans to do, but difficult for computers. There have actually been a few algorithms so far that could identify clumps with some success, but it’s an exceptionally difficult task to get right. Computers must be trained to ignore all the extraneous details in an image — including background galaxies, stars in our own galaxy, and galactic features like the central bulge — to find clumps among the competing signals. Luckily, this sort of task is second nature for human beings.

Computers also tend to be very bad at finding objects they aren’t specifically instructed to find. We hope that as this project proceeds, you’ll be able to help point out some exceptionally strange clumps, or even some features we do not expect at all. It was the keen eyes of Galaxy Zoo volunteers that led to the discovery of Green Peas, a class of galaxy that is still being researched today.

This project has been in the works for the last few years, and we’re very excited to see it launch. If you’d like to try it out, you can take part here.

Galaxy Zoo + Galaxy Zoo: 3D

Hi! I’m Tom, and I’m a PhD student at the University of Nottingham, doing some research to try to understand how spiral galaxies have grown and changed over their lifetimes. I’m especially interested in looking at how the spiral arms have been affecting the galaxy as a whole. I’ve recently finished up a paper in MNRAS in which I’ve been demonstrating a couple of new methods using some Galaxy Zoo data.

Amelia has already written about how she is using the MaNGA survey to try to understand what’s happening in bars, so I won’t go into too much detail about this fantastic survey. I’ll just say that it’s part of the Sloan Digital Sky Survey, and for each of its sample of 10,000 galaxies, we have measurements of the spectrum at every position across the face of the galaxy.

MaNGA is really useful for trying to understand how galaxies have grown to their current size, because it is possible to get some sort of estimate of what kinds of stars are present in different locations of the galaxy. It’s a difficult thing to measure, so we can’t say exactly how many stars of each type are present, but we can at least get a broad picture of the kinds of stellar ages and chemical enrichment (“metallicity”) in the stars. Astronomers have used these kinds of tools to measure the average age or metallicity of stars in different parts of galaxies, and found that in most spirals, the further out you go in the galaxy, the younger the stars are on average. The usual interpretation of this is that bulges tend to have formed first, and the disks have grown in size over time afterwards.

A MaNGA spiral galaxy. We can obtain information about the kinds of stars residing across the hexagonal area, which helps us understand how they’ve grown and evolved.

I’m really interested in trying to push this picture in two ways. Firstly, I’ve been trying to see what we can learn from looking at the general distribution of stars of different ages and metallicities – not just the average properties – at each location in the galaxy. Secondly, I think there is a lot of information that we risk ignoring by only looking at how things change with galactic radius. Spiral arms and the bar aren’t evenly distributed around the galaxy, so if we can see how the stellar properties change as we move around the galaxy, we should be able to measure what effect the spiral arms and bars have on the stars. The goal would be to try to confirm whether the most popular models of the nature of spiral arms and bars are correct or not.

To properly do this, we need to know exactly where the spiral arms and bars are in the MaNGA galaxies, so that we can see how the stars vary in these different regions. Enter Galaxy Zoo: 3D, where volunteers are asked to tell us where the different components are.

An example galaxy in MaNGA, where we’ve managed to split the galaxy into different stellar populations of different ages. Each frame shows where we find stars of a given age in this galaxy, starting from the oldest stars and finishing with the most recently formed stars. The colour denotes the mean metallicity of the stars, shown by the scale at the bottom.

All of this is what my most recent publication is about. We’ve shown that by combining the full spatial information available from MaNGA (augmented by Galaxy Zoo:3D) with the full distributions of the ages and metallicities of stars in each location, we can start to see some interesting things in the bar and spiral arms. It’s definitely best illustrated by an animation.

By splitting the age distributions up into different “time-slices”, we can create images of where stars of different ages are located in each of our MaNGA galaxies. Immediately from this one example, it’s obvious that there are a lot of things going on here.

There are a few features in the animation that we’re not entirely convinced are real, but the main exciting things are that the spiral arms only show up in the youngest stars, and the bar grows and rotates as we move from older to younger stars. The growth of the bar is intriguing; this might be showing us how it formed. The bar changing with angle is even more exciting, and we think it shows us how quickly new-born stars become mixed and “locked” into the bar. The arms show what we should expect; spiral arms are areas of intense star formation, but over time the stars formed there will become mixed around the disk. We measured this effect by looking at what fraction of stars of each age are located in the volunteer-drawn spiral arms from Galaxy Zoo:3D.
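For readers who like to see the arithmetic, the arm-fraction measurement can be sketched in a few lines of Python. This is a simplified illustration and not the code from the paper: it assumes each time-slice is a small 2D grid of stellar mass per pixel, and that the volunteer-drawn arms have already been turned into a true/false mask of the same shape. Both the function name and these data layouts are assumptions.

```python
def arm_fraction_by_age(age_maps, arm_mask):
    """For each age slice, the fraction of stellar mass inside the arm mask.

    `age_maps` maps an age label to a 2D list of stellar mass per pixel;
    `arm_mask` is a 2D list of booleans with the same shape.
    """
    fractions = {}
    for age, mass_map in age_maps.items():
        total = 0.0
        in_arms = 0.0
        for mass_row, mask_row in zip(mass_map, arm_mask):
            for mass, in_arm in zip(mass_row, mask_row):
                total += mass
                if in_arm:
                    in_arms += mass
        # An empty slice has no meaningful fraction, so report NaN.
        fractions[age] = in_arms / total if total > 0 else float("nan")
    return fractions
```

If young stars are concentrated in the arms and older stars have mixed around the disk, the fraction should fall as the age label increases.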

This is really interesting, and highlights the power of combining large surveys like MaNGA with crowd-sourced information from the Zooniverse.

The next step is to do these kinds of things with more than just this one galaxy though. I’ve started looking at how these techniques can measure how fast the disks of spiral galaxies grew, using a large sample of spiral galaxies identified by Galaxy Zoo 2 volunteers. I’m also trying to measure how quickly stars get mixed away from spiral arms in different types of spiral galaxies. I have started to find some hints of some exciting results on both of these topics, which I would love to share in a future blog post if you’re interested.

We need volunteers to tell us where the spiral arms and bars are in galaxies, so that we can start to see what makes these regions special.

However, I’m currently limited in the number of galaxies with spiral arm regions identified by Galaxy Zoo:3D volunteers, so it would be really helpful if we could get some more! Understanding what makes spiral structure appear in disky galaxies is one of the unsolved problems in galaxy evolution and formation, and the clues to finding out might well lie in measuring how spiral arms affect the galaxy’s stars. Galaxy Zoo:3D will definitely be able to play a role in this! Help us out at

Galaxy Zoo Mobile

Hi, I’m Lauren, a summer work experience student working with the Galaxy Zoo team at the University of Oxford for a couple of weeks, and it’s my pleasure to be able to bring you some fantastic news. Today, we’re launching the mobile version of Galaxy Zoo! Unlike the website version, this brand-new native mobile version has questions with only two possible answers – just swipe left or right depending on your answer! This can create a more captivating and faster-paced experience when you are classifying galaxies.

Not only does this introduce a new and engaging platform for the project, but it also means that you can classify galaxies anywhere – on the bus, at the beach, at a concert, in the waiting room at the dentist etc. Hopefully, this will mean many more galaxy classifications whilst also providing easier access for our wide range of volunteers across the world. By introducing this app, we hope to inspire others to join our Galaxy Zoo team, no matter their qualifications or skill set.

Get involved by downloading the Zooniverse app (if you don’t have it already), heading over to the ‘Space’ section, and selecting the ‘Galaxy Zoo Mobile’ project. From there, you will be greeted with three different workflows – ‘Smooth or Featured’, ‘Spiral Arms’ or ‘Merging/Disturbed’. Pick whichever you like! The simple, swiping interface allows you to classify galaxies much faster than ever before, meaning the Galaxy Zoo science team can produce results even quicker. So, download the Zooniverse app today and start classifying!

Apple App Store:

Google Play Store:

Happy classifying,

Lauren & the Galaxy Zoo Team


Supermassive Black Holes in Merging Galaxies

The following is a blog by Yjan Gordon (@YjanGordon), a postdoc at the University of Manitoba, Canada (having recently completed a PhD at the University of Hull). Here, he describes his new paper making use of the latest Galaxy Zoo classifications.

One of the key questions I look to address in my research is that of why the black holes at the centres of some galaxies are actively feeding on matter (an active galactic nucleus, AGN for short) and why some aren’t. We know of multiple mechanisms that can trigger an AGN, from high-impact galaxy mergers to secular processes such as feeding on the matter ejected from stars over the course of their lives. However, not all AGN are created equal, and many of these objects, whilst active, are only barely so. While more powerful AGN are having a steak dinner, these weaker variants are merely snacking.

The processes that initiate these weak AGN may be different to those that fuel their more powerful cousins, or simply a scaled-down version of the same mechanisms. For example, we know that the collision of two similar-sized galaxies (known as a major merger) can trigger an AGN. Could a minor merger, where a small galaxy collides with a much more massive one, provide less fuel and so result in one of these weak AGN? This is exactly the question we investigate in our latest paper.

In order to test whether minor mergers are a factor in triggering weak AGN, high quality, deep observations are needed to look for very faint merger signatures in a sample of these galaxies. To conduct our analysis we made use of the Dark Energy Camera Legacy Survey (DECaLS). This survey not only provides the deep, high quality imaging necessary for looking for minor galactic mergers (and is far better in this regard than previous wide-field imaging surveys; see figure below), but is also the latest survey being put to the Galaxy Zoo volunteers to obtain reliable galaxy morphologies.

Comparison of imaging from the Sloan Digital Sky Survey (SDSS, top) with higher quality imaging from DECaLS (bottom). The DECaLS imaging is approximately two magnitudes deeper than the SDSS imaging and shows faint merger remnants not visible in the SDSS images.

A control sample of galaxies that don’t host an AGN is required, so that we can compare the fractions of weak AGN and non-AGN experiencing mergers, i.e. are mergers more frequently associated with these AGN or not? To control for other variables that could affect the results, reliable morphological information is a valuable asset. For instance, spiral galaxies have a delicate structure that can be disrupted by galaxy mergers, and the presence of this morphology in a merging system can provide information about the scale or timeline of the event. One can hence see the potential for elliptical galaxies to be more likely to exhibit the tidal disturbances than their more delicate spiral counterparts.

When we compared the merger rates and the merger scales in both the weak AGN and the non-AGN control sample, we found a couple of compelling results.

Firstly, we found that the fraction of both these samples experiencing minor mergers was about the same. This is interesting as it shows that minor mergers, which had long been thought to be a potential trigger for these weak AGN, are not involved in initiating weak activity of the central black hole in a galaxy.

Secondly, we found that for the least massive of these weak AGN, major mergers were significantly more common than in non-AGN. This is an unexpected result, as such major mergers might provide so much gas that any resulting AGN might be expected to be fairly powerful. Furthermore, previous research hadn’t shown any substantial evidence of this, so why are we seeing such an effect? Well, whilst major mergers are more common in these weak AGN, they still only represent a minority of the weak AGN population (~10%), and are thus not typical of the main population of weak AGN. One intriguing possibility is that these particular objects may actually be the early stages of more powerful AGN, and that as the merger progresses, and more gas falls into the galactic nucleus, the AGN will have more fuel to feed on and become a more powerful AGN. Further research is required to investigate such a hypothesis.
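Comparing merger fractions between an AGN sample and a control sample can be sketched as below. This is a generic illustration, not the statistical method used in the paper: it uses the simple binomial standard error, and the function names and the `(n_mergers, n_total)` input pairs are assumptions made for the example.

```python
import math


def merger_fraction(n_mergers, n_total):
    """Return the merger fraction and its simple binomial standard error."""
    fraction = n_mergers / n_total
    error = math.sqrt(fraction * (1.0 - fraction) / n_total)
    return fraction, error


def difference_in_sigma(agn, control):
    """Difference between two merger fractions in units of the combined error.

    `agn` and `control` are (n_mergers, n_total) pairs.
    """
    f_agn, e_agn = merger_fraction(*agn)
    f_ctl, e_ctl = merger_fraction(*control)
    combined = math.hypot(e_agn, e_ctl)
    if combined == 0.0:
        # Both fractions are exactly 0 or 1, so no error estimate is available.
        return 0.0 if f_agn == f_ctl else math.copysign(math.inf, f_agn - f_ctl)
    return (f_agn - f_ctl) / combined
```

A value near zero corresponds to fractions that are "about the same", as with the minor mergers, while a large positive value would flag an excess in the AGN sample. The binomial error is a crude approximation for small samples or fractions close to 0 or 1.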

This kind of project wouldn’t be possible without the contributions of the many Galaxy Zoo volunteers providing morphological classifications on hundreds of thousands of galaxies. In this case, as is so frequent in research, not only have we answered a question about the evolution of these galaxies, but we have been presented with another.

Please keep up the great work, it really makes a difference.

Spectracular Performance!

During the past 10 years Galaxy Zoo volunteers have done amazing work helping to classify the visual appearance (or “morphology”) of distant galaxies, which has enabled fantastic science that wouldn’t have been possible without your help. 

Morphology alone encodes a wealth of information about the physical processes that drive the formation and ongoing evolution of galaxies, but we can learn even more if we analyze the spectrum of light they emit.

For the 100th Zooniverse project we designed the Galaxy Nurseries project to get your help analyzing galaxy spectra obtained by the Hubble Space Telescope (you can find many more details about Galaxy Nurseries on the main project research pages and this previous blog post).

If you participated in Galaxy Nurseries, then the data you analyzed were generated using a technique called slitless spectroscopy. In slitless spectroscopy all the light entering the HST aperture is dispersed (or split) into its separate frequencies before being projected directly into the telescope’s camera. Figure 1 illustrates a typically confusing result!


Figure 1: Example of data obtained by the Hubble Space Telescope using slitless spectroscopy.

Each bright horizontal streak in the image shown in Figure 1 is actually the spectrum of a different galaxy or star. Analyzing these data can be very tricky, especially when nearby galaxy spectra overlap and cross-contaminate each other. Automatic algorithms really struggle to reliably distinguish between spectral contamination and scientifically interesting features that are present in the spectra. This means that scientists almost always need to visually inspect any features that are automatically detected in order to ensure that they are really there!

In Galaxy Nurseries, we asked volunteers to help with this verification process. We asked you to double-check over 27,000 automatically detected emission lines in galaxy spectra obtained by the WISP galaxy survey, labelling them as either real or fake. Even for professional astronomers and experienced Galaxy Zoo volunteers, verifying the presence of emission lines in slitless spectroscopic data can be very difficult. To help you discriminate between real and fake emission lines we showed you three different views of the data. Figure 2 shows an example of one of the Galaxy Nurseries subject images.


Figure 2: A Galaxy Nurseries subject showing a real emission line. The different panels show A) a 1-dimensional representation of the spectrum with the potential emission line marked; B) a 2-dimensional “cutout” from the full slitless spectroscopic image, with the potential emission line and the expected extent of the galaxy spectrum marked; C) a direct image of the galaxy for which the spectrum was generated.

As well as the 1-dimensional spectrum shown in Figure 2 (Panel A), we also showed a “cutout” from the full slitless spectroscopic image, which isolated the target spectrum (Panel B), and a direct image of the galaxy that produced the spectrum (Panel C). The cutout in Panel B can be really useful for identifying contamination from adjacent spectra. For example, something that looks like a feature in the target spectrum might actually originate in an adjacent spectrum and would therefore appear slightly vertically off-centre in the 2-dimensional image.

Why is the direct image useful for spectroscopic analysis? Well, emission lines often appear like very slightly blurred images of the target galaxy at a specific position in the slitless spectrum. Look again at the emission line and the direct image in Figure 2. Can you see the similarity? If the shape of the automatically detected line feature in the slitless spectroscopic image doesn’t match the shape of the galaxy in the direct image, then this can indicate that the feature is just contamination masquerading as an emission line.

The response to Galaxy Nurseries was fantastic! Following its launch the project was completed in only 40 days, gathering 414,360 classifications (that’s 15 classifications per emission line) from 3003 volunteers. Huge thanks for everyone’s help! The results of the project were published in a Research Note, and the rest of this post summarizes what we learned.

Using the labels assigned to each potential emission line by Galaxy Zoo volunteers, we computed the fraction of volunteers classifying each line who thought it was real (hereafter freal). We wanted to compare the responses of the Galaxy Zoo volunteers with those of professional astronomers from the WISP survey team (WST). To do this, we divided the potential emission lines into two sets. The verified set contained emission lines that the WST thought were real and the vetoed set contained emission lines that the WST thought were fake. We assumed that the WST assessments were correct in the vast majority of cases, but this might not be completely accurate. Even professional astronomers make mistakes!
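
As a rough illustration of how freal could be computed from individual votes, here is a minimal Python sketch. The input format (pairs of a line identifier and a real/fake vote) and the function name `real_fraction` are assumptions made up for this example, not the project's actual pipeline.

```python
from collections import defaultdict


def real_fraction(votes):
    """Return {line_id: fraction of votes that said 'real'}.

    `votes` is an iterable of (line_id, is_real) pairs, one pair per
    volunteer classification. This input format is hypothetical.
    """
    n_total = defaultdict(int)
    n_real = defaultdict(int)
    for line_id, is_real in votes:
        n_total[line_id] += 1
        if is_real:
            n_real[line_id] += 1
    return {line_id: n_real[line_id] / n_total[line_id] for line_id in n_total}


# Toy example: three made-up votes for each of two candidate lines.
toy_votes = [
    ("line_a", True), ("line_a", True), ("line_a", False),
    ("line_b", False), ("line_b", False), ("line_b", True),
]
f_real = real_fraction(toy_votes)
```

In this toy case two of the three volunteers thought `line_a` was real and only one of three thought `line_b` was.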

Figure 3 shows the distributions of freal for the two sets of emission lines. The great news is that for the vast majority of lines that the WST thought were fake, over half of the volunteers agreed with them (i.e. freal < 0.5). Similarly, for most of the lines in the WST-verified set, the majority of volunteers also labeled them as real. These results show us that Zooniverse and Galaxy Zoo volunteers are very capable when it comes to separating real emission lines from the fakes.


Figure 3: The distributions of freal for sets of emission lines that were verified (blue) or vetoed (orange) by the WISP survey team.

What can we say about the lines for which the volunteers and the WST disagreed? Is there something about them that makes them particularly hard to classify? Well, it turns out that the answer is “yes”!

We computed two statistical metrics to quantify the level of agreement between the Zooniverse volunteers and the WST for a particular sample of the emission lines that were classified.

  1. The sample purity is defined as the ratio between the number of true positives (for which both the volunteers and the WST believe the line is real) and the combined number of true positives and false positives (for which a feature labeled as fake by the WST was labeled as real by the volunteers). The purity tells us the fraction of lines in the sample that were labeled real by the volunteers that were also labeled as real by the WST. If volunteers don’t mislabel any fake lines as real then the purity is 1.
  2. The sample completeness is the ratio between the number of true positives and the combined number of true positives and false negatives (for which the WST labeled the line as real, but the volunteer consensus was that the line was fake). The completeness tells us the fraction of lines in the sample that were labeled as real by the WST that were also labeled as real by the volunteers. If volunteers spot all the real lines identified by the WST then the completeness is 1.
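
For readers who like to see definitions as code, the two metrics can be sketched as below. This is only an illustration: the function name, the toy numbers, and the rule that the volunteer consensus is "real" when freal reaches a chosen threshold are all assumptions of this example.

```python
def purity_and_completeness(f_real, wst_real, threshold=0.5):
    """Compare volunteer consensus with WST labels.

    f_real    : list of volunteer 'real' fractions, one per line
    wst_real  : list of bools, True where the WST called the line real
    threshold : assumed rule; volunteers 'accept' a line if f_real >= threshold
    """
    both_real = 0      # accepted by volunteers and real according to the WST
    wrongly_real = 0   # accepted by volunteers but fake according to the WST
    missed = 0         # rejected by volunteers but real according to the WST
    for f, is_real in zip(f_real, wst_real):
        accepted = f >= threshold
        if accepted and is_real:
            both_real += 1
        elif accepted and not is_real:
            wrongly_real += 1
        elif is_real:
            missed += 1
    n_accepted = both_real + wrongly_real
    n_wst_real = both_real + missed
    purity = both_real / n_accepted if n_accepted else float("nan")
    completeness = both_real / n_wst_real if n_wst_real else float("nan")
    return purity, completeness


# Made-up data: five candidate lines, three of them real according to the WST.
toy_f_real = [0.9, 0.7, 0.4, 0.6, 0.2]
toy_wst = [True, True, True, False, False]
purity, completeness = purity_and_completeness(toy_f_real, toy_wst, threshold=0.5)
```

With these toy values the volunteers accept three lines, two of which the WST also calls real, and they miss one of the three WST-verified lines.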

Figure 4 plots purity and completeness as a function of freal and the emission line signal-to-noise ratio (S/N). Lines with higher S/N stand out more relative to the noise in the spectrum and should be easier to analyze for volunteers and the WST alike. Examining Figure 4 reveals that for subsets of candidate lines having freal greater than a particular threshold value (shown on the horizontal axis), the completeness values are higher for higher S/N. This indicates that spotting real lines is much easier when the features being examined are bright, which makes intuitive sense. On the other hand, higher purities can be achieved for similar threshold values of freal as the S/N value decreases, which indicates that volunteers are reluctant to label faint lines as real. At low S/N, sample purities as high as 0.8 can be achieved when only 50% of volunteers agreed that the corresponding emission lines were real. At higher S/N, volunteers become more confident, but also seem slightly more likely to identify noise and contaminants as real lines. This is probably a reflection of just how difficult the line identification task really is. Nonetheless, samples that are 70% pure can be selected by requiring a marginal majority of votes for real (an freal value of at least 0.6), which is pretty impressive!


Figure 4: Sample purity (left) and completeness (right) plotted as a function of minimum freal value for any potential line in the sample, and that line’s signal-to-noise ratio.

We can use the plots in Figure 4 to select samples that have desirable properties for scientific analysis. For example, if we want to be sure that we include 75% of all the real lines but we don’t mind a few fakes sneaking in, then we could choose freal = 0.5 which would give a completeness larger than 0.75 for all S/N values. However, if we choose freal = 0.5, then the purity of our sample could be as low as 0.6 for high S/N, with about 40% of accepted lines being fake in reality.
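
The same kind of trade-off can be explored numerically. The sketch below scans a few candidate thresholds and picks the strictest one that still reaches a target completeness. The function names and the toy data are invented for illustration, and accepting a line when freal is at least the threshold is an assumed rule.

```python
def completeness_and_purity(f_real, wst_real, threshold):
    """Completeness and purity when lines with f_real >= threshold are accepted."""
    accepted_real = sum(1 for f, r in zip(f_real, wst_real) if f >= threshold and r)
    accepted_fake = sum(1 for f, r in zip(f_real, wst_real) if f >= threshold and not r)
    n_real = sum(1 for r in wst_real if r)
    n_accepted = accepted_real + accepted_fake
    completeness = accepted_real / n_real if n_real else float("nan")
    purity = accepted_real / n_accepted if n_accepted else float("nan")
    return completeness, purity


def strictest_threshold(f_real, wst_real, target_completeness, candidates):
    """Highest candidate threshold whose completeness meets the target, else None."""
    best = None
    for t in sorted(candidates):
        completeness, _ = completeness_and_purity(f_real, wst_real, t)
        if completeness >= target_completeness:
            best = t
    return best


# Made-up data: eight candidate lines, the first four real according to the WST.
toy_f_real = [0.9, 0.8, 0.7, 0.4, 0.65, 0.3, 0.2, 0.1]
toy_wst = [True, True, True, True, False, False, False, False]
chosen = strictest_threshold(toy_f_real, toy_wst, 0.75, [0.3, 0.5, 0.7, 0.9])
```

A real analysis would repeat this separately for each S/N bin, since Figure 4 shows that both metrics depend on how bright the line is.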

The ability to extract very complete but impure emission line samples can be very useful. By selecting a sample that removes a sizable fraction of fakes from the automatically detected candidates, the number of potential lines that the WST need to visually inspect is dramatically reduced. It took the WST almost 5 months before each line in Galaxy Nurseries could be inspected by just two independent astronomers. By providing 15 independent classifications for each line, Zooniverse volunteers did about 8 times as much work in just 40 days! In the future, large-scale slitless spectroscopic surveys will be performed by new space telescopes like Euclid and WFIRST. These surveys will measure millions of spectra containing many millions of potential emission lines and individual science teams will simply not be able to visually inspect all of these lines. Eventually, deep learning algorithms may be able to succeed where current automatic algorithms fail. In the meantime, it is only with the help of Zooniverse and Galaxy Zoo volunteers that scientists will be able to exploit more than the tiniest fraction of the fantastic data that will soon arrive.

Enhancing Galaxy Zoo

We’ve just switched on what may be the biggest change to Galaxy Zoo since the project started more than a decade ago. In order to prepare for future surveys like Euclid and LSST, which might overwhelm even the stalwart efforts of Galaxy Zoo volunteers, we’re now running an automatic classifier which works alongside the results from volunteers.

This machine – even when trained on the existing Galaxy Zoo results – is not perfect, and so we still need classifications from you all. Each night, the machine will learn from the day’s results, and then calculate which galaxies it thinks it most needs human help with – and if you select the ‘Enhanced’ workflow, then you’ll be much more likely to see these galaxies.

You can read more about the machine learning we’re using in a blogpost from Mike Walmsley here, and in more technical detail here. (There’s a paper available on the arXiv from this morning too.) We’re also running a messaging experiment you can read about here.

We do still need volunteers to look at each and every galaxy to make sure we’re not missing anything. If you prefer to classify the old-fashioned way, then the ‘Classic’ workflow is Galaxy Zoo just as it always was.

I and the rest of the team are looking forward to seeing what we can find with this new approach – and with your help.