The following is a blog post by Yjan Gordon (@YjanGordon), a postdoc at the University of Manitoba, Canada (having recently completed a PhD at the University of Hull). Here, he describes his new paper making use of the latest Galaxy Zoo classifications.
One of the key questions I look to address in my research is that of why the black holes at the centres of some galaxies are actively feeding on matter (an active galactic nucleus, AGN for short) and why some aren’t. We know of multiple mechanisms that can trigger an AGN, from high-impact galaxy mergers to secular processes such as feeding on the matter ejected from stars over the course of their lives. However, not all AGN are created equal, and many of these objects, whilst active, are only barely so. While more powerful AGN are having a steak dinner, these weaker variants are merely snacking.
The processes that initiate these weak AGN may be different from those that fuel their more powerful cousins, or simply a scaled-down version of the same mechanisms. For example, we know that the collision of two similarly sized galaxies (known as a major merger) can trigger an AGN. By extension, a minor merger, where a small galaxy collides with a much more massive one, may provide less fuel for an AGN, resulting in one of these weak AGN. This is exactly the question we investigate in our latest paper.
In order to test whether minor mergers are a factor in triggering weak AGN, high-quality, deep observations are needed to look for very faint merger signatures in a sample of these galaxies. To conduct our analysis we made use of the Dark Energy Camera Legacy Survey (DECaLS). This survey not only provides the deep, high-quality imaging necessary for spotting minor galactic mergers (it is far better in this regard than previous wide-field imaging surveys; see figure below), but is also the latest survey being put to the Galaxy Zoo volunteers to obtain reliable galaxy morphologies.
A control sample of galaxies that don’t host an AGN is required so that we can compare the fractions of weak AGN and non-AGN experiencing mergers: are mergers more frequently associated with these AGN or not? In order to control for other variables that could affect the results, reliable morphological information is a valuable asset. For instance, spiral galaxies have a delicate structure that can be disrupted by galaxy mergers, and the presence of this morphology in a merging system can provide information about the scale or timeline of the event. One can hence see the potential for elliptical galaxies to be more likely than their more delicate spiral counterparts to exhibit tidal disturbances.
This kind of project wouldn’t be possible without the contributions of the many Galaxy Zoo volunteers providing morphological classifications on hundreds of thousands of galaxies.
When we compared the merger rates and the merger scales in both the weak AGN and the non-AGN control sample, we found a couple of compelling results.
Firstly, we found that the fraction of galaxies experiencing minor mergers was about the same in both samples. This is interesting, as it shows that minor mergers, which had long been thought to be a potential trigger for these weak AGN, are not involved in initiating weak activity of the central black hole in a galaxy.
Secondly, we found that for the least massive of these weak AGN, major mergers were significantly more common than in non-AGN. This is an unexpected result, as such major mergers might provide so much gas that any resulting AGN might be expected to be fairly powerful. Furthermore, previous research hadn’t shown any substantial evidence of this, so why are we seeing such an effect? Well, whilst major mergers are more common in these weak AGN, they still only represent a minority of the weak AGN population (~10%), and are thus not typical of the main population of weak AGN. One intriguing possibility is that these particular objects may actually be the early stages of more powerful AGN, and that as the merger progresses, and more gas falls into the galactic nucleus, the AGN will have more fuel to feed on and become a more powerful AGN. Further research is required to investigate such a hypothesis.
In this case, as is so frequent in research, not only have we answered a question about the evolution of these galaxies, but we have been presented with another.
Please keep up the great work, it really makes a difference.
During the past 10 years Galaxy Zoo volunteers have done amazing work helping to classify the visual appearance (or “morphology”) of distant galaxies, which has enabled fantastic science that wouldn’t have been possible without your help.
Morphology alone encodes a wealth of information about the physical processes that drive the formation and ongoing evolution of galaxies, but we can learn even more if we analyze the spectrum of light they emit.
For the 100th Zooniverse project we designed the Galaxy Nurseries project to get your help analyzing galaxy spectra obtained by the Hubble Space Telescope (you can find many more details about Galaxy Nurseries on the main project research pages and this previous blog post).
If you participated in Galaxy Nurseries, then the data you analyzed were generated using a technique called slitless spectroscopy. In slitless spectroscopy all the light entering the HST aperture is dispersed (or split) into its separate frequencies before being projected directly into the telescope’s camera. Figure 1 illustrates a typically confusing result!
Each bright horizontal streak in the image shown in Figure 1 is actually the spectrum of a different galaxy or star. Analyzing these data can be very tricky, especially when nearby galaxy spectra overlap and cross-contaminate each other. Automatic algorithms really struggle to reliably distinguish between spectral contamination and scientifically interesting features that are present in the spectra. This means that scientists almost always need to visually inspect any features that are automatically detected in order to ensure that they are really there!
In Galaxy Nurseries, we asked volunteers to help with this verification process. We asked you to double-check over 27,000 automatically detected emission lines in galaxy spectra obtained by the WISP galaxy survey, labelling them as either real or fake. Even for professional astronomers and experienced Galaxy Zoo volunteers, verifying the presence of emission lines in slitless spectroscopic data can be very difficult. To help you discriminate between real and fake emission lines we showed you three different views of the data. Figure 2 shows an example of one of the Galaxy Nurseries subject images.
As well as the one-dimensional spectrum shown in Figure 2 (Panel A), we also showed a “cutout” from the full slitless spectroscopic image, which isolated the target spectrum (Panel B), and a direct image of the galaxy that produced the spectrum (Panel C). The cutout in Panel B can be really useful for identifying contamination from adjacent spectra. For example, something that looks like a feature in the target spectrum might actually originate in an adjacent spectrum and would therefore appear slightly vertically off-centre in the two-dimensional image.
Why is the direct image useful for spectroscopic analysis? Well, emission lines often appear like very slightly blurred images of the target galaxy at a specific position in the slitless spectrum. Look again at the emission line and the direct image in Figure 2. Can you see the similarity? If the shape of the automatically detected line feature in the slitless spectroscopic image doesn’t match the shape of the galaxy in the direct image, then this can indicate that the feature is just contamination masquerading as an emission line.
The response to Galaxy Nurseries was fantastic! Following its launch the project was completed in only 40 days, gathering 414,360 classifications (that’s 15 classifications per emission line) from 3003 volunteers. Huge thanks for everyone’s help! The results of the project were published in a Research Note, and the rest of this post summarizes what we learned.
Using the labels assigned to each potential emission line by Galaxy Zoo volunteers, we computed the fraction of volunteers who classified the line and thought it was real (hereafter freal). We wanted to compare the responses of the Galaxy Zoo volunteers with those of professional astronomers from the WISP survey team (WST). To do this, we divided the potential emission lines into two sets. The verified set contained emission lines that the WST thought were real and the vetoed set contained emission lines that the WST thought were fake. We assumed that the WST assessments were correct in the vast majority of cases, but this might not be completely accurate. Even professional astronomers make mistakes!
Figure 3 shows the distributions of freal for the two sets of emission lines. The great news is that for the vast majority of lines that the WST thought were fake, over half of the volunteers agreed with them (i.e. freal < 0.5). Similarly, for most of the WST-verified set of lines, the majority of volunteers also labeled them as real. These results show us that Zooniverse and Galaxy Zoo volunteers are very capable when it comes to separating real emission lines from the fakes.
What can we say about the lines for which the volunteers and the WST disagreed? Is there something about them that makes them particularly hard to classify? Well, it turns out that the answer is “yes”!
We computed two statistical metrics to quantify the level of agreement between the Zooniverse volunteers and the WST for a particular sample of the emission lines that were classified.
- The sample purity is defined as the ratio between the number of true positives (for which both the volunteers and the WST believe the line is real) and the combined number of true positives and false positives (for which a feature labeled as fake by the WST was labeled as real by the volunteers). The purity tells us the fraction of lines in the sample that were labeled real by the volunteers that were also labeled as real by the WST. If volunteers don’t mislabel any fake lines as real then the purity is 1.
- The sample completeness is the ratio between the number of true positives and the combined number of true positives and false negatives (for which the WST labeled the line as real, but the volunteer consensus was that the line was fake). The completeness tells us the fraction of lines in the sample that were labeled as real by the WST that were also labeled as real by the volunteers. If volunteers spot all the real lines identified by the WST then the completeness is 1.
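The two metrics above can be sketched in a few lines of Python. This is a minimal illustration of the definitions, not the paper’s actual analysis code; the function and variable names (`purity_completeness`, `f_real`, `wst_real`, `threshold`) are mine.

```python
def purity_completeness(f_real, wst_real, threshold=0.5):
    """Compute sample purity and completeness as defined above.

    f_real:    fraction of volunteers who labelled each candidate line real
    wst_real:  True where the WISP survey team (WST) labelled the line real
    A line counts as volunteer-'real' when f_real >= threshold.
    """
    vol_real = [f >= threshold for f in f_real]
    tp = sum(v and w for v, w in zip(vol_real, wst_real))        # both say real
    fp = sum(v and not w for v, w in zip(vol_real, wst_real))    # volunteers real, WST fake
    fn = sum((not v) and w for v, w in zip(vol_real, wst_real))  # volunteers fake, WST real
    purity = tp / (tp + fp) if (tp + fp) else 1.0
    completeness = tp / (tp + fn) if (tp + fn) else 1.0
    return purity, completeness
```

Raising the threshold generally trades completeness for purity, which is exactly the trade-off explored in Figure 4.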
Figure 4 plots purity and completeness as a function of freal and the emission line signal-to-noise ratio (S/N). Lines with higher S/N stand out more relative to the noise in the spectrum and should be easier to analyze for volunteers and the WST alike. Examining Figure 4 reveals that for subsets of candidate lines having freal less than a particular threshold value (shown on the horizontal axis), the completeness values are higher for higher S/N. This indicates that spotting real lines is much easier when the features being examined are bright, which makes intuitive sense. On the other hand, higher purities can be achieved for similar threshold values of freal as the S/N value decreases, which indicates that volunteers are reluctant to label faint lines as real. At low S/N, sample purities as high as 0.8 can be achieved when only 50% of volunteers agreed that the corresponding emission lines were real. At higher S/N, volunteers become more confident, but also seem slightly more likely to identify noise and contaminants as real lines. This is probably a reflection of just how difficult the line identification task really is. Nonetheless, samples that are 70% pure can be selected by requiring a majority of votes for real (an freal value of at least 0.6), which is pretty impressive!
We can use the plots in Figure 4 to select samples that have desirable properties for scientific analysis. For example, if we want to be sure that we include 75% of all the real lines but we don’t mind a few fakes sneaking in, then we could choose freal = 0.5 which would give a completeness larger than 0.75 for all S/N values. However, if we choose freal = 0.5, then the purity of our sample could be as low as 0.6 for high S/N, with about 40% of accepted lines being fake in reality.
The ability to extract very complete but impure emission line samples can be very useful. By selecting a sample that removes a sizable fraction of fakes from the automatically detected candidates, the number of potential lines that the WST need to visually inspect is dramatically reduced. It took the WST almost 5 months before each line in Galaxy Nurseries could be inspected by just two independent astronomers. By providing 15 independent classifications for each line, Zooniverse volunteers did 8 times as much work in just 40 days! In the future, large-scale slitless spectroscopic surveys will be performed by new space telescopes like Euclid and WFIRST. These surveys will measure millions of spectra containing many millions of potential emission lines, and individual science teams will simply not be able to visually inspect all of these lines. Eventually, deep learning algorithms may be able to succeed where current automatic algorithms fail. In the meantime, it is only with the help of Zooniverse and Galaxy Zoo volunteers that scientists will be able to exploit more than the tiniest fraction of the fantastic data that will soon arrive.
We’ve just switched on what may be the biggest change to Galaxy Zoo since the project started more than a decade ago. In order to prepare for future surveys like Euclid and LSST which might overwhelm even the stalwart efforts of Galaxy Zoo volunteers, we’re now running an automatic classifier which works with those results from volunteers.
This machine – even when trained on the existing Galaxy Zoo results – is not perfect, and so we still need classifications from you all. Each night, the machine will learn from the day’s results, and then calculate which galaxies it thinks it most needs human help with – and if you select the ‘Enhanced’ workflow, then you’ll be much more likely to see these galaxies.
You can read more about the machine learning we’re using in a blogpost from Mike Walmsley here, and in more technical detail here. (There’s a paper available on the arXiv from this morning too). We’re also running a messaging experiment you can read about here.
We do still need volunteers to look at each and every galaxy to make sure we’re not missing anything. If you prefer to classify the old-fashioned way, then the ‘Classic’ workflow is Galaxy Zoo just as it always was.
I and the rest of the team are looking forward to seeing what we can find with this new approach – and with your help.
Alongside the new workflow that Galaxy Zoo has just launched (read more in this blog post: https://wp.me/p2mbJY-2tJ), we’re taking the opportunity to work once again with researchers from Ben Gurion University and Microsoft Research to run an experiment which looks at how we can communicate with volunteers. As part of this experiment, volunteers classifying galaxies on the new workflow may see short messages about the new machine learning elements. Anyone seeing these messages will be given the option to withdraw from the experiment; just select the ‘opt out’ button to avoid seeing any further messages.
After the experiment is finished we will publish a debrief blog here describing more of the details and presenting our results.
This messaging experiment has ethics approval from Ben Gurion University (reference: SISE-2019-01) and the University of Oxford (reference: R63818/RE001).
Since I joined the team in 2018, citizen scientists like you have given us over 2 million classifications for 50,000 galaxies. We rely on these classifications for our research: from spiral arm winding, to merging galaxies, to star formation – and that’s just in the last month!
We want to get as much science as possible out of every single click. Your time is valuable and we have an almost unlimited pile of galaxies to classify. To do this, we’ve spent the past year designing a system to prioritise which galaxies you see on the site – which you can choose to access via the ‘Enhanced’ workflow.
This workflow depends on a new automated galaxy classifier using machine learning – an AI, if you like. Our AI is good at classifying boring, easy galaxies very fast. You are a much better classifier, able to make sense of the most difficult galaxies and even make new discoveries like Voorwerpen, but unfortunately need to eat and sleep and so on. Our idea is to have you and the AI work together.
The AI can guess which challenging galaxies, if classified by you, would best help it to learn. Each morning, we upload around 100 of these extra-helpful galaxies. The next day, we collect the classifications and use them to teach our AI. Thanks to your classifications, our AI should improve over time. We also upload thousands of random galaxies and show each to 3 humans, to check our AI is working and to keep an eye out for anything exciting.
With this approach, we combine human skill with AI speed to classify far more galaxies and do better science. For each new survey:
- 40 humans classify the most challenging and helpful galaxies
- Each galaxy is seen by 3 humans
- The AI learns to predict well on all the simple galaxies not yet classified
What does this mean in practice? Those choosing the ‘Enhanced’ workflow will see somewhat fewer simple galaxies (like the ones on the right), and somewhat more galaxies which are diverse, interesting and unusual (like the ones on the left). You will still see both interesting and simple galaxies, and still see every galaxy if you make enough classifications.
With our new system, you’ll see somewhat more galaxies like the ones on the left, and somewhat fewer like the ones on the right.
We would love for you to join in with our upgrade, because it helps us do more science. But if you like Galaxy Zoo just the way it is, no problem – we’ve made a copy (the ‘Classic’ workflow) that still shows random galaxies, just as we always have. If you’d like to know more, check out this post for more detail or read our paper. Separately, we’re also experimenting with sending short messages – check out this post to learn more.
The Galaxy Zoo team and I are really excited to see what you’ll discover. Let’s get started.
I’d love to be able to take every galaxy and say something about its morphology. The more galaxies we label, the more specific questions we can answer. When you want to know what fraction of low-mass barred spiral galaxies host AGN, suddenly it really matters that you have a lot of labelled galaxies to divide up.
But there’s a problem: humans don’t scale. Surveys keep getting bigger, but we will always have the same number of volunteers (applying order-of-magnitude astronomer math).
We’re struggling to keep pace now. When EUCLID (2022), LSST (2023) and WFIRST (2025ish) come online, we’ll start to look silly.
To keep up, Galaxy Zoo needs an automatic classifier. Other researchers have used responses that we’ve already collected from volunteers to train classifiers. The best performing of these are convolutional neural networks (CNNs) – a type of deep learning model tailored for image recognition. But CNNs have a drawback. They don’t easily handle uncertainty.
When learning, they implicitly assume that all labels are equally confident – which is definitely not the case for Galaxy Zoo (more in the section below). And when making (regression) predictions, they only give a ‘best guess’ answer with no error bars.
In our paper, we use Bayesian CNNs for morphology classification. Our Bayesian CNNs provide two key improvements:
- They account for varying uncertainty when learning from volunteer responses
- They predict full posteriors over the morphology of each galaxy
Using our Bayesian CNN, we can learn from noisy labels and make reliable predictions (with error bars) for hundreds of millions of galaxies.
How Bayesian Convolutional Neural Networks Work
There are two key steps to creating Bayesian CNNs.
1. Predict the parameters of a probability distribution, not the label itself
Training neural networks is much like any other fitting problem: you tweak the model to match the observations. If all the labels are equally uncertain, you can just minimise the difference between your predictions and the observed values. But for Galaxy Zoo, some labels are more confident than others. If I observe that, for some galaxy, 30% of volunteers say “barred”, my confidence in that 30% massively depends on how many people replied – was it 4 or 40?
Instead, we predict the probability that a typical volunteer will say “Bar”, and minimise how surprised we should be given the total number of volunteers who replied. This way, our model understands that errors on galaxies where many volunteers replied are worse than errors on galaxies where few volunteers replied – letting it learn from every galaxy.
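A minimal sketch of this idea: treat the k of N volunteer responses as a draw from a binomial distribution with the network’s predicted probability, and minimise the negative log-likelihood (“how surprised we should be”). This illustrates the principle rather than reproducing the paper’s actual loss code; the function name is mine.

```python
import math

def binomial_nll(p, k, n):
    """Surprise (negative log binomial likelihood) at seeing k of n
    volunteers say 'Bar' when the model predicts probability p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return -(log_choose + k * math.log(p) + (n - k) * math.log(1 - p))

# The same wrong prediction (p=0.6 when ~30% voted 'Bar') is penalised far
# more heavily when 40 volunteers replied than when only 4 did:
loss_many = binomial_nll(0.6, 12, 40)  # 12 of 40 said 'Bar'
loss_few = binomial_nll(0.6, 1, 4)     # 1 of 4 said 'Bar'
```

This weighting is what lets the model learn from every galaxy while trusting well-sampled labels more.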
2. Use Dropout to Pretend to Train Many Networks
Our model now makes probabilistic predictions. But what if we had trained a different model? It would make slightly different probabilistic predictions. We need to marginalise over the possible models we might have trained. To do this, we use dropout. Dropout turns off many random neurons in our model, permuting our network into a new one each time we make predictions.
Below, you can see our Bayesian CNN in action. Each row is a galaxy (shown to the left). In the central column, our CNN makes a single probabilistic prediction (the probability that a typical volunteer would say “Bar”). We can interpret that as a posterior for the probability that k of N volunteers would say “Bar” – shown in black. On the right, we marginalise over many CNNs using dropout. Each CNN posterior (grey) is different, but we can marginalise over them to get the posterior over many CNNs (green) – our Bayesian prediction.
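As an illustration of that marginalisation step (not the actual analysis code; the names here are hypothetical): each dropout forward pass yields a probability rho that a typical volunteer says “Bar”, and averaging the implied binomial posteriors over passes gives the green curve.

```python
import math

def binom_pmf(rho, k, n):
    """Probability that k of n volunteers say 'Bar', given per-volunteer rate rho."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + k * math.log(rho) + (n - k) * math.log(1 - rho))

def marginal_posterior(dropout_rhos, n):
    """Posterior over k (0..n volunteers saying 'Bar'), marginalised over
    the probabilities predicted by each dropout forward pass."""
    return [sum(binom_pmf(r, k, n) for r in dropout_rhos) / len(dropout_rhos)
            for k in range(n + 1)]
```

Each grey curve in the figure corresponds to one element of `dropout_rhos`; the green curve is their average.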
Read more about it in the paper.
Modern surveys will image hundreds of millions of galaxies – more than we can show to volunteers. Given that, which galaxies should we classify with volunteers, and which by our Bayesian CNN?
Ideally we would only show volunteers the images that the model would find most informative. The model should be able to ask – hey, these galaxies would be really helpful to learn from – can you label them for me please? Then the humans would label them and the model would retrain. This is active learning.
In our experiments, applying active learning reduces the number of galaxies needed to reach a given performance level by 35–60% (see the paper).
We can use our posteriors to work out which galaxies are most informative. Remember that we use dropout to approximate training many models (see above). We show in the paper that informative galaxies are galaxies where those models confidently disagree.
This is only possible because we think about labels probabilistically and approximate training many models.
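One standard way to turn “confident disagreement” into a number is the mutual information between the prediction and the model (the BALD acquisition function). The sketch below is in that spirit; the paper’s exact acquisition function may differ, and the names here are mine.

```python
import math

def entropy(p):
    """Entropy (in nats) of a Bernoulli distribution with probability p."""
    return -sum(x * math.log(x) for x in (p, 1 - p) if x > 0)

def disagreement_score(dropout_rhos):
    """High when individual dropout passes are confident but disagree with
    each other; zero when every pass predicts the same probability."""
    mean_rho = sum(dropout_rhos) / len(dropout_rhos)
    expected_entropy = sum(entropy(r) for r in dropout_rhos) / len(dropout_rhos)
    return entropy(mean_rho) - expected_entropy  # mutual information
```

Two confident-but-opposed passes (e.g. 0.05 and 0.95) score high, flagging the galaxy as informative; two agreeing passes score zero.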
What galaxies are informative? Exactly the galaxies you would intuitively expect.
- The model strongly prefers diverse featured galaxies over ellipticals
- For identifying bars, the model prefers galaxies which are better resolved (lower redshift)
This selection is completely automatic. Indeed, I didn’t realise the lower redshift preference until I looked at the images!
I’m excited to see what science can be done as we move from morphology catalogs of hundreds of thousands of galaxies to hundreds of millions. If you’d like to know more or you have any questions, get in touch in the comments or on Twitter (@mike_w_ai, @chrislintott, @yaringal).
Excited to join in? Click here to go to Galaxy Zoo and start classifying! What could you discover?
Congratulations Radio Galaxy Zoo citizen scientists on a job well done! The Radio Galaxy Zoo 1 project has now finished with ~2.29 million classifications! Well done on helping us push towards the finish line.
We have at least two second-generation Radio Galaxy Zoo projects in the pipeline that we hope to launch next. Therefore, please stay tuned for the announcement of the Radio Galaxy Zoo 2 projects, where we will be presenting you with new data from the next-generation radio telescopes.
Thank you very much again for all your support and we will continue to keep you updated on our progress in the interim.
Ivy & Stas
I’m delighted to announce the acceptance of another paper based on your classifications at Galaxy Zoo, “Galaxy Zoo: Unwinding the Winding Problem – Observations of Spiral Bulge Prominence and Arm Pitch Angles Suggest Local Spiral Galaxies are Winding”, which has just been released on the arXiv pre-print server and will appear in the Monthly Notices of the Royal Astronomical Society (MNRAS) soon.
Here’s the title and author page.
This paper has been a long time coming, and is based significantly on the excellent thesis work of Ross Hart (PhD from Nottingham University). Ross wrote about some of his work for the blog previously “How Do Spiral Arms Affect Star Formation“. One of the things Ross’s PhD work showed was just how good your identification of spiral arm winding is, and that allowed us to be confident to use it in this paper.
You might notice the appearance of some of your fellow citizen scientists in this author list. Dennis, Jean and Satoshi provided help via the “Galaxy Zoo Literature Search” call which ended up contributing significantly to the paper.
Our main result is that we do not find any significant correlation between how large the bulges are and how tightly wound the spirals are in Galaxy Zoo spiral galaxies…. this non-detection was a big surprise, because this correlation is discussed in basically all astronomy text books – it forms the basis of the spiral sequence described by Hubble.
Way back in 1927 Hubble wrote (about the spiral nebula he had observed) that: “three [properties] determine positions in the sequence: (1) the relative size of the unresolved nuclear region, (2) the extent to which the arms are unwound (the openness or angle of the spiral), (3) the degree of condensation in the arms.” He goes on to explain that “These three criteria are quite independent, but as an empirical fact of observation they develop in the same direction, and can be treated as various aspects of the same process.” (i.e. Hubble observed them to be correlated).
It’s been known for a long time that there are examples where bulge (or “unresolved nuclear region”) size and arm winding did not agree, but these are usually treated as exceptions. What we’ve shown in this paper, is that for a sample selection which goes beyond just the brightest nearby galaxies Hubble could see, the correlation is not strong at all. Below is an annotated version of our main result figure – each point is a spiral with Galaxy Zoo classifications, and the contours show where there are lots of points. We find spirals all over this plot (except not many with big bulges and loosely wound arms), and the red and blue lines show the lack of any strong trend in either direction.
This has significant implications for how we interpret spiral winding angles, and could be explained by many/most spiral arms winding up over time (at rates which depend on the bulge size) rather than being density waves. We need to do more work to really understand what this observation tells us (which is a great place to be in science!).
We have also known for a while that bulge size correlates best with modern expert galaxy classification on the Hubble sequence (e.g. when we compared your classifications to the largest samples done in that way). So another point we make in this paper is how different these modern classifications are to the traditional classifications done by Hubble and others. That’s OK – classifications should (and do) shift in science (part of the scientific method is to change on the basis of evidence), but it does mean care needs to be taken to be precise about what is meant by “morphology of galaxies”.
I ended the abstract of the paper with: “It is remarkable that after over 170 years of observations of spiral arms in galaxies our understanding of them remains incomplete.” and I really think that’s a good place to end. Galaxy morphology provides a rich source of data for understanding the physics of galaxies, and thanks to you we have access to the largest and most reliable set of galaxy morphologies ever.
Here is a bittersweet announcement that the current first-generation Radio Galaxy Zoo project will be retiring on 1st May 2019. We are so grateful to have worked with such a productive team of citizen and professional scientists for the past 5.5 years.
To date, we have made over 2.27 million classifications and published 10 refereed journal articles. We have one more submitted and another to be submitted in the next few weeks.
Looking towards the future, we are of course in the process of developing the next generation of Radio Galaxy Zoo projects. For that, we ask that you stay tuned for our future announcements of the suite of Radio Galaxy Zoo 2 projects that we are planning to launch. Of course, we will be keeping you all informed about our latest RGZ-based follow-up observations (e.g. the Zoo Gems programme with the Hubble Space Telescope). Therefore, this is not the last message from us.
To cap off this impending retirement, I propose that we make a final RGZ sprint to the finish in the remaining days of April 2019 – that is, let’s all try to classify as many sources as we can in the next few weeks!
Thank you very much again and let’s all make a concerted push to the finish line!
Ivy & Stas