During the past 10 years Galaxy Zoo volunteers have done amazing work helping to classify the visual appearance (or “morphology”) of distant galaxies, which has enabled fantastic science that wouldn’t have been possible without your help.
Morphology alone encodes a wealth of information about the physical processes that drive the formation and ongoing evolution of galaxies, but we can learn even more if we analyze the spectrum of light they emit.
For the 100th Zooniverse project we designed the Galaxy Nurseries project to get your help analyzing galaxy spectra obtained by the Hubble Space Telescope (you can find many more details about Galaxy Nurseries on the main project research pages and this previous blog post).
If you participated in Galaxy Nurseries, then the data you analyzed were generated using a technique called slitless spectroscopy. In slitless spectroscopy all the light entering the HST aperture is dispersed (or split) into its separate frequencies before being projected directly into the telescope’s camera. Figure 1 illustrates a typically confusing result!
Each bright horizontal streak in the image shown in Figure 1 is actually the spectrum of a different galaxy or star. Analyzing these data can be very tricky, especially when nearby galaxy spectra overlap and cross-contaminate each other. Automatic algorithms really struggle to reliably distinguish between spectral contamination and scientifically interesting features that are present in the spectra. This means that scientists almost always need to visually inspect any features that are automatically detected in order to ensure that they are really there!
In Galaxy Nurseries, we asked volunteers to help with this verification process. We asked you to double-check over 27,000 automatically detected emission lines in galaxy spectra obtained by the WISP galaxy survey, labelling them as either real or fake. Even for professional astronomers and experienced Galaxy Zoo volunteers, verifying the presence of emission lines in slitless spectroscopic data can be very difficult. To help you discriminate between real and fake emission lines we showed you three different views of the data. Figure 2 shows an example of one of the Galaxy Nurseries subject images.
As well as the 1 dimensional spectrum shown in Figure 2 (Panel A), we also showed a “cutout” from the full slitless spectroscopic image, which isolated the target spectrum (Panel B), and a direct image of the galaxy that produced the spectrum (Panel C). The cutout in Panel B can be really useful for identifying contamination from adjacent spectra. For example, something that looks like a feature in the target spectrum might actually originate in an adjacent spectrum and would therefore appear slightly vertically off-centre in the 2-dimensional image.
Why is the direct image useful for spectroscopic analysis? Well, emission lines often appear like very slightly blurred images of the target galaxy at a specific position in the slitless spectrum. Look again at the emission line and the direct image in Figure 2. Can you see the similarity? If the shape of the automatically detected line feature in the slitless spectroscopic image doesn’t match the shape of the galaxy in the direct image, then this can indicate that the feature is just contamination masquerading as an emission line.
The response to Galaxy Nurseries was fantastic! Following its launch the project was completed in only 40 days, gathering 414,360 classifications (that’s about 15 classifications per emission line) from 3003 volunteers. Huge thanks for everyone’s help! The results of the project were published in a Research Note, and the rest of this post summarizes what we learned.
Using the labels assigned to each potential emission line by Galaxy Zoo volunteers, we computed the fraction of volunteers who classified the line and thought it was real (hereafter freal). We wanted to compare the responses of the Galaxy Zoo volunteers with those of professional astronomers from the WISP survey team (WST). To do this, we divided the potential emission lines into two sets. The verified set contained emission lines that the WST thought were real and the vetoed set contained emission lines that the WST thought were fake. We assumed that the WST assessments were correct in the vast majority of cases, but this might not be completely accurate. Even professional astronomers make mistakes!
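To make the definition of freal concrete, here is a minimal Python sketch. It is not the project's actual pipeline: the function name `real_vote_fractions`, the `(line_id, is_real)` vote format, and the toy votes and WST labels are all assumptions made for illustration.

```python
from collections import defaultdict


def real_vote_fractions(classifications):
    """Return {line_id: f_real} from an iterable of (line_id, is_real) votes."""
    real_votes = defaultdict(int)
    total_votes = defaultdict(int)
    for line_id, is_real in classifications:
        total_votes[line_id] += 1
        if is_real:
            real_votes[line_id] += 1
    # f_real = votes for "real" / all votes cast for that line
    return {line_id: real_votes[line_id] / total_votes[line_id]
            for line_id in total_votes}


# Hypothetical votes: line "a" gets 3 of 4 votes for real, line "b" gets 1 of 4.
votes = [("a", True), ("a", True), ("a", False), ("a", True),
         ("b", False), ("b", True), ("b", False), ("b", False)]
f_real = real_vote_fractions(votes)

# Hypothetical WST labels, used to split the lines into the two sets.
wst_says_real = {"a": True, "b": False}
verified = {k: v for k, v in f_real.items() if wst_says_real[k]}
vetoed = {k: v for k, v in f_real.items() if not wst_says_real[k]}

print(f_real)  # {'a': 0.75, 'b': 0.25}
```

In this toy example line "a" lands in the verified set with freal = 0.75 and line "b" in the vetoed set with freal = 0.25, which is the kind of agreement the comparison is looking for.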
Figure 3 shows the distributions of freal for the two sets of emission lines. The great news is that for the vast majority of lines that the WST thought were fake, over half of the volunteers agreed with them (i.e. freal < 0.5). Similarly, for most of the WST-verified set of lines, the majority of volunteers also labeled them as real. These results show us that Zooniverse and Galaxy Zoo volunteers are very capable when it comes to separating real emission lines from the fakes.
What can we say about the lines for which the volunteers and the WST disagreed? Is there something about them that makes them particularly hard to classify? Well, it turns out that the answer is “yes”!
We computed two statistical metrics to quantify the level of agreement between the Zooniverse volunteers and the WST for a particular sample of the emission lines that were classified.
- The sample purity is defined as the ratio between the number of true positives (for which both the volunteers and the WST believe the line is real) and the combined number of true positives and false positives (for which a feature labeled as fake by the WST was labeled as real by the volunteers). The purity tells us the fraction of lines in the sample that were labeled real by the volunteers that were also labeled as real by the WST. If volunteers don’t mislabel any fake lines as real then purity is 1.
- The sample completeness is the ratio between the number of true positives and the combined number of true positives and false negatives (for which the WST labeled the line as real, but the volunteer consensus was that the line was fake). The completeness tells us the fraction of lines in the sample that were labeled as real by the WST that were also labeled as real by the volunteers. If volunteers spot all the real lines identified by the WST then the completeness is 1.
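The two definitions above can be sketched in a few lines of Python. This is an illustration, not the code used for the Research Note: the function name, the toy data, and the choice to treat freal at or above a threshold as the volunteers' "real" verdict are assumptions.

```python
def purity_and_completeness(f_real, wst_real, threshold=0.5):
    """Compare volunteer vote fractions with WST labels.

    f_real    : list of volunteer vote fractions, one per candidate line
    wst_real  : list of booleans, True where the WST judged the line real
    threshold : assumed cut; f_real >= threshold counts as "volunteers say real"
    """
    tp = fp = fn = 0
    for fraction, is_real in zip(f_real, wst_real):
        volunteers_say_real = fraction >= threshold
        if volunteers_say_real and is_real:
            tp += 1  # true positive: both agree the line is real
        elif volunteers_say_real and not is_real:
            fp += 1  # false positive: WST says fake, volunteers say real
        elif not volunteers_say_real and is_real:
            fn += 1  # false negative: WST says real, volunteers say fake
    purity = tp / (tp + fp) if (tp + fp) else float("nan")
    completeness = tp / (tp + fn) if (tp + fn) else float("nan")
    return purity, completeness


# Hypothetical sample of six candidate lines.
f_real = [0.9, 0.8, 0.6, 0.4, 0.2, 0.7]
wst_real = [True, True, False, True, False, True]
purity, completeness = purity_and_completeness(f_real, wst_real, threshold=0.5)
print(purity, completeness)  # 0.75 0.75
```

Here three lines are true positives, one fake line sneaks in (freal = 0.6), and one real line is missed (freal = 0.4), so both purity and completeness come out at 3/4.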
Figure 4 plots purity and completeness as a function of freal and the emission line signal-to-noise ratio (S/N). Lines with higher S/N stand out more relative to the noise in the spectrum and should be easier to analyze for volunteers and the WST alike. Examining Figure 4 reveals that for subsets of candidate lines having freal at or above a particular threshold value (shown on the horizontal axis), the completeness values are higher for higher S/N. This indicates that spotting real lines is much easier when the features being examined are bright, which makes intuitive sense. On the other hand, higher purities can be achieved for similar threshold values of freal as the S/N value decreases, which indicates that volunteers are reluctant to label faint lines as real. At low S/N, sample purities as high as 0.8 can be achieved when only 50% of volunteers agreed that the corresponding emission lines were real. At higher S/N, volunteers become more confident, but also seem slightly more likely to identify noise and contaminants as real lines. This is probably a reflection of just how difficult the line identification task really is. Nonetheless, samples that are 70% pure can be selected by requiring a marginal majority of votes for real (an freal value of at least 0.6), which is pretty impressive!
We can use the plots in Figure 4 to select samples that have desirable properties for scientific analysis. For example, if we want to be sure that we include 75% of all the real lines but we don’t mind a few fakes sneaking in, then we could choose a threshold of freal = 0.5, which would give a completeness larger than 0.75 for all S/N values. However, if we choose freal = 0.5, then the purity of our sample could be as low as 0.6 for high S/N, with about 40% of accepted lines being fake in reality.
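The trade-off described here can be sketched as a simple scan over candidate freal thresholds. The snippet below is a hypothetical illustration with made-up data: `select_threshold` is not a function from the project, and it assumes that a line is accepted when its freal is at or above the threshold.

```python
def select_threshold(f_real, wst_real, target_completeness, thresholds):
    """Return (threshold, purity, completeness) for the strictest threshold
    that still reaches the target completeness, or None if none does."""
    best = None
    for t in sorted(thresholds):
        pairs = list(zip(f_real, wst_real))
        tp = sum(1 for f, real in pairs if f >= t and real)
        fp = sum(1 for f, real in pairs if f >= t and not real)
        fn = sum(1 for f, real in pairs if f < t and real)
        if tp + fp == 0 or tp + fn == 0:
            continue  # purity or completeness undefined for this cut
        purity = tp / (tp + fp)
        completeness = tp / (tp + fn)
        if completeness >= target_completeness:
            # Thresholds are visited in increasing order, so the last
            # one that passes is the strictest acceptable cut.
            best = (t, purity, completeness)
    return best


# Hypothetical sample of eight candidate lines.
f_real = [0.9, 0.8, 0.6, 0.4, 0.2, 0.7, 0.3, 0.55]
wst_real = [True, True, False, True, False, True, False, True]

best = select_threshold(f_real, wst_real,
                        target_completeness=0.75,
                        thresholds=[0.3, 0.5, 0.7])
print(best)  # (0.5, 0.8, 0.8)
```

With these toy numbers a threshold of 0.7 would miss too many real lines (completeness 0.6), so the scan settles on 0.5, which keeps 80% of the real lines at a purity of 0.8.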
The ability to extract very complete but impure emission line samples can be very useful. By selecting a sample that removes a sizable fraction of fakes from the automatically detected candidates, the number of potential lines that the WST need to visually inspect is dramatically reduced. It took the WST almost 5 months before each line in Galaxy Nurseries could be inspected by just two independent astronomers. By providing 15 independent classifications for each line, Zooniverse volunteers did about 8 times as much work in just 40 days! In the future, large-scale slitless spectroscopic surveys will be performed by new space telescopes like Euclid and WFIRST. These surveys will measure millions of spectra containing many millions of potential emission lines and individual science teams will simply not be able to visually inspect all of these lines. Eventually, deep learning algorithms may be able to succeed where current automatic algorithms fail. In the meantime, it is only with the help of Zooniverse and Galaxy Zoo volunteers that scientists will be able to exploit more than the tiniest fraction of the fantastic data that will soon arrive.
We’re excited to announce the publication of another scientific study that wouldn’t have been possible without the hard work of the Galaxy Zoo volunteers. The paper:
“Galaxy Zoo: Morphological classification of galaxy images from the Illustris simulation”
is the first Galaxy Zoo publication that examines visual morphological classifications of computer-generated galaxy images. The images were produced in collaboration with the international team of scientists who implemented and analyzed the highly sophisticated Illustris cosmological simulation (you can find many more details about Illustris on the main Illustris project website and about the Galaxy Zoo: Illustris project in this previous blog post). Illustris is designed to accurately model the evolution of our Universe from a time shortly after its birth until the present day. In the process, simulated particles of dark matter, gas, and stars aggregate and condense to form galaxy clusters that contain seemingly realistic galaxies. In our paper we wanted to test the realism of those simulated galaxies by inviting Galaxy Zoo volunteers to evaluate their morphological appearance. We wanted to know whether Illustris galaxies look like real galaxies.
But where to start looking? Well, if you’ve ever classified a galaxy on Galaxy Zoo then you must have answered a question worded something like:
Is the galaxy simply smooth and rounded, or does it have features?
This question represents one of the simplest ways to distinguish between different groups of galaxies, but its answer can reveal a lot of information about a galaxy’s history, as well as its current activity. Visible features and substructure like discs, spiral arms and bars in galaxy images often indicate sites of ongoing star formation and can provide evidence for complex dynamical processes within a galaxy. On the other hand, apparently featureless galaxies may have formed in dense environments where galaxy-galaxy interactions are more common and might act to destroy features or even prevent them from forming in the first place.
In our paper, we compared the prevalence of visible features in galaxy images that were produced using Illustris against an equivalent sample of real galaxy images that were derived from Sloan Digital Sky Survey (SDSS) observations. Some of the differences we found were surprising but quite illuminating!
Each image in Galaxy Zoo is classified by about forty volunteers and their votes for each question are aggregated to obtain a consensus. The level of agreement between volunteers can be quantified using the vote fraction for a particular response. For a particular image and question the vote fraction for a possible response is just the number of volunteers who voted for that response, divided by the total number of votes cast for that question, for that image. A concrete example that applies here is the “featured” vote fraction: the number of volunteers who classified a galaxy image as exhibiting visible features divided by the total number of votes cast for the simple question that was quoted above. Vote fractions close to zero indicate that most volunteers thought the galaxy was smooth and rounded, while vote fractions around one imply almost unanimous consensus that a galaxy has visible features.
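As a quick illustration of the vote fraction, here is a short Python sketch for a single image and a single question. The response labels and the 40 votes are invented for the example, and `vote_fractions` is a hypothetical helper rather than part of any Galaxy Zoo software.

```python
from collections import Counter


def vote_fractions(responses):
    """Return {response: fraction of all votes} for one image and one question."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {response: n / total for response, n in counts.items()}


# Hypothetical votes from 40 volunteers for one galaxy image.
responses = ["features"] * 30 + ["smooth"] * 8 + ["star or artifact"] * 2

fractions = vote_fractions(responses)
featured_fraction = fractions["features"]
print(featured_fraction)  # 0.75
```

A "featured" vote fraction of 0.75 would place this galaxy towards the featured end of the distribution, though short of unanimous agreement.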
The filled green bars in Figure 1 illustrate the distribution of this “featured” vote fraction for real galaxy images. The distribution is dominated by a peak close to zero, which means that most volunteers thought that most galaxies looked smooth and featureless. There is also a smaller peak close to one, corresponding to a population of obviously featured galaxies. In contrast, the blue line shows the “featured” vote fraction for Illustris galaxy images. The bulk of the distribution is now peaked around 0.6, which means that Illustris galaxies were generally perceived to be predominantly featured. However, there are very few Illustris galaxies that were unanimously labeled as exhibiting visible features and a substantial population of visibly smooth galaxies is also present. Overall, the Illustris galaxy images seem more feature rich, but perhaps slightly more ambiguous than their SDSS counterparts.
To try to understand the origin of the mismatch between Illustris galaxies and those in the real Universe, we separated both of the image samples into three sub-groups based on the total mass of the stars that the galaxies contain (more succinctly described as their “stellar mass”). Each of the panels in Figure 2 can be interpreted in the same way as Figure 1, except that they correspond only to the galaxies for each of the three stellar mass sub-groups. The two panels to the left are for galaxies with stellar masses less than the mass of 1000 billion suns. They look remarkably similar to Figure 1 with the SDSS and Illustris distributions matching very poorly. However, the situation changes markedly in the right-hand panel. For these extremely massive galaxies, it appears that the Illustris simulation reproduces the observed proportion of visibly featured galaxies much better, although the population of unambiguously featured galaxies is still absent.
The change in behavior with stellar mass that we have identified might simply be an artifact of the finite resolution at which Illustris is able to simulate the Universe. Computational power is limited, so Illustris cannot accurately model the positions, interactions and evolution of every star in its simulation volume (and of course tracking individual gas atoms or dark matter particles is completely impossible!). Instead, Illustris represents large groups of stars, and large accumulations of gas and dark matter, as single “particles” and models the way that they interact with each other. The features that volunteers perceive in Illustris galaxy images arise from substructures formed by groups of many such particles. Simulated galaxies with larger stellar masses contain more stellar particles, which enables the simulation to model finer structural details that may be necessary to emulate the appearance of real galaxies.
Studies involving automatic morphological classification of Illustris galaxy images (e.g. Bottrell et al. 2017, Snyder et al. 2015) have also identified a marked divergence from galaxies in the real Universe below the same 1000 billion solar mass limit that we have found. Confirmation that the visual appearance of galaxies also changes perceptibly complements a growing body of knowledge on this subject.
Dust is another constituent of galaxies that can substantially modify their appearance by absorbing bluer light that typically indicates star formation and re-emitting it at redder wavelengths. This dust reddening effect is not accounted for by the Illustris simulations and could obscure the visibility of features that are actually present in real galaxies. This means that Illustris might be modeling real galaxies better than it seems, and coupling of a dust reddening model to the simulation output might improve the correspondence between the mismatched vote fraction distributions at lower stellar masses.
As is often the case in scientific research, an unanticipated result has provided valuable insight. The results from Galaxy Zoo: Illustris will help cosmologists to improve their models as they develop the next generation of large-scale simulations of our Universe. The results also underline the ongoing potential utility of visual morphological classification of simulated galaxies. The most recent cosmological simulations, including a next-generation Illustris simulation, address many of the shortcomings that this and other studies have revealed. Comparing their outputs with SDSS galaxy images, as well as observational data produced by other surveys, will undoubtedly yield more insights into the processes that govern the formation and evolution of galaxies. Watch this space!
A preprint of the new paper, which has been accepted by the Astrophysical Journal, can be downloaded from the arXiv.