New Paper – Practical Galaxy Morphology Tools

Last year, we published the GZ DECaLS catalog: detailed morphology classifications for 314,000 galaxies. We classified so many galaxies by training AI models to learn from volunteers and work alongside them. This raises the question – what else can we do with those models?

It turns out that we can use them to make three new practical tools that will help both professional researchers and volunteers. You can read all about them in our new paper out today:

The first practical tool is a similarity search. You can type in the coordinates of a galaxy, and it will try to show you the most similar galaxies. Try it out on your favourite DECaLS galaxy. For now, it’s a simple demo website, but we hope to eventually integrate this into Galaxy Zoo.
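
For the curious, here is a minimal sketch of how a similarity search like this can work in principle. It assumes the model has already summarised each galaxy as a short list of numbers (a “feature vector”). The galaxy names, the vectors, and the choice of cosine similarity are all illustrative assumptions rather than details of the real tool.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query_name, features, n=2):
    """Return the names of the n galaxies whose features best match the query."""
    query = features[query_name]
    scores = [
        (cosine_similarity(query, vec), name)
        for name, vec in features.items()
        if name != query_name
    ]
    scores.sort(reverse=True)  # highest similarity first
    return [name for _, name in scores[:n]]

# Hypothetical feature vectors; a real model would produce many more numbers per galaxy.
features = {
    "spiral_a": [0.9, 0.1, 0.0],
    "spiral_b": [0.8, 0.2, 0.1],
    "smooth_a": [0.1, 0.9, 0.0],
    "ring_a": [0.2, 0.1, 0.9],
}

print(most_similar("spiral_a", features, n=1))
```

Galaxies that look alike to the model end up with vectors pointing in similar directions, so the search is just a matter of ranking by that similarity.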

The second is a new method for finding the galaxies most interesting to you personally. Imagine a website where you can rate galaxies by how interesting you find them. As you rate galaxies, the website shows you new ones based on your previous ratings – just like how Netflix suggests new series (I’m a big Bojack fan myself). The system is too complicated to create a simple demo to show you, but you can see some examples in the new paper. Thanks to funding from the Sloan Foundation, we’re making this even better and adding it as an official Zooniverse feature.

The third is about adapting the AI models to classify new kinds of galaxies. If a researcher wants a model that can find ringed galaxies, for example, they would usually have to start by gathering tens of thousands of examples of ringed galaxies with which to teach their new model. This takes a long time and a lot of effort, especially for rarer galaxies. However, a model already trained on Galaxy Zoo classifications needs just hundreds of example galaxies to learn to find rings as well. This will let researchers “fine-tune” models to help solve their own specific science problems. That includes me! I’m running a Galaxy Zoo Mobile project to make a new ring catalogue with this approach.

All these tools work because of your classifications. As well as using them directly in science catalogues, we need them to train better AI models. Thank you for your contribution.

If you have any spare time – maybe on the bus, or just sitting around scrolling – I would really appreciate your help finding ring galaxies by swiping left and right on Galaxy Zoo Mobile, part of our Zooniverse app (Apple, Android). I’m hoping to build the biggest catalogue of rings ever assembled so we can understand how they form. Please join in if you can.



P.S. You can find a few more technical details on my personal blog.

New Galaxy Zoo Mobile challenge – Ringed Galaxies

My name is Mike – I’m a researcher helping run the Zooniverse project Galaxy Zoo.

I’m launching a new challenge within Galaxy Zoo Mobile, the version of GZ that runs on our mobile app (iOS, Android, scroll down to “Space” projects).

The challenge is to find galaxies with rings. I’ve picked out the 25,000 galaxies where some* volunteers voted for “Ring” on the final GZ question – “Does this galaxy have any rare features?”. Now it’s time to do a targeted search through these promising galaxies. Swipe left and right on GZ Mobile to tell us which ones you think have rings.

This is what galaxies with rings look like. I think these are easily the most beautiful galaxies we’ve ever shown on Galaxy Zoo, with glittering spiral arms and intricate structures. We’ve zoomed in each picture about 25% more than in Galaxy Zoo itself, so you’ll see all that fine detail.

We want to find galaxies with rings because they’re a mystery. Astronomers aren’t sure what causes rings. 

One leading theory is that they form from disk galaxies left undisturbed for hundreds of millions of years. Theoretical calculations and computer simulations suggest that the gravity of stars in the galaxy’s bar or bulge can cause the orbits of nearby stars to change, first making spiral arms and eventually a ring shape. Another theory is that rings are caused by head-on collisions where a small galaxy punches through the middle of a large disk galaxy, like a rock dropped into a pond.

The truth is that there are probably different kinds of ring, formed by different processes. Working out which processes form which rings will require many examples of each – and that’s where you come in. 

This targeted project is all about finding as many rings as possible. Once we know which galaxies have rings, we can follow up with future projects to divide them into different categories, and compare those categories to find out what creates each type of ring. 

As always with Galaxy Zoo, your classifications will be publicly shared with all researchers to help everyone investigate rings. We will also use your classifications to teach a new version of Zoobot, our galaxy-classifying AI, to find rings. Zoobot can then help find more rings in the million-or-so galaxies recently released by the DECaLS survey** that we haven’t yet uploaded to Galaxy Zoo. 

If you have any questions, come chat to our community and me on the Galaxy Zoo Talk forum.



* Specifically, galaxies where the fraction of volunteers answering “ring” is in the top third (typically about two or more volunteers).

** The published catalog from Galaxy Zoo DECaLS used images from Dark Energy Camera Legacy Survey data release 5 and earlier. The survey has since released more galaxy images, some of which have already been uploaded to Galaxy Zoo.

Stronger bars help shut down star formation

Hi everyone!

I’m Tobias Géron, a PhD student at Oxford. I have been using the classifications of the Galaxy Zoo DECaLS (GZD) project to study differences between weak and strong bars in the context of galaxy evolution. We have made a significant amount of progress and I was able to present some results a couple of weeks ago at a (virtual) conference in the form of a poster, which I would love to share with you here as well.

To summarise: I have been using the classifications from GZD to identify many weakly and strongly barred galaxies. Some example galaxies can be found in the first figure on the poster. As the name already implies, strong bars tend to be longer and more obvious than weak bars. But what exactly does this mean for the galaxy in which they appear?

One of the major properties of a galaxy is whether it is still forming stars. Interestingly, in Figure 2 we observe that strong bars appear much more frequently in galaxies that are not forming stars (called “quiescent galaxies”). This is not observed for the weak bars. This suggests one of two things: either the strong bar helps to shut down star formation in galaxies or it is easier to form a strong bar in a quiescent galaxy.

In an attempt to answer this chicken or egg problem, we turn to Figure 3. Here, we show that the rate of star formation in the centre of the galaxy is highest for the strongly barred galaxies that are still star forming. This suggests that those galaxies will use up their reservoir of gas, which is needed to make stars, more quickly, and so are on a fast track to quiescence.

I’m also incredibly happy to say that we’ve written a paper on this as well, which has recently been accepted for publication! You can currently find it here. Apart from the results described above, we also delve more deeply into whether weak and strong bars are fundamentally different physical phenomena. Feel free to check it out if you’re interested!

It’s amazing to see all this coming to fruition, but it wouldn’t have been possible without the amazing efforts of our citizen scientists, so I want to thank every single volunteer for all their time and dedication. We have mentioned this in the paper too, but your efforts are individually acknowledged here. Thank you!



Clump Scout wrap-up: What are we doing with your 2.7 million clicks?

Hi all. My name is Nico Adams from the Galaxy Zoo science team.

Writing my first scientific paper has been equal parts exhausting and exhilarating. On Thursday, February 11, I got to put a tally in the “exhilarating” column. The paper is on the first scientific results covering the Galaxy Zoo: Clump Scout project, and I was putting the final touches on my first draft when I saw that you all had submitted the project’s final classifications. The Clump Scout project had a lofty goal — to search for large star-forming regions in over 50,000 galaxies from the Sloan Digital Sky Survey — and the fact that the Clump Scout volunteers have managed to finish it is an incredible achievement.

We’re looking forward to sharing our results over the next few months. Clump Scout is not only the first citizen science project to search for giant clumps in galaxies, but it’s the first large-scale project of any kind to look for clumps in the “local” universe (out to redshift ~0.1, or within a billion-or-so light-years of us). The data set presented by this project is unique, and we are nearly finished with our first round of analysis on it.
We’re currently preparing two papers that will cover the results directly. One is focused on the algorithm that turned volunteers’ clicks into “clump locations”, while the other — my first paper — is focused on the clump catalog and scientific results we derived from it. While these papers go through a few months of revision and review, we wanted to publish a few blog posts previewing the results. This blog post will focus on the first one: We’ll explain what happened to your clicks after you sent them to us. Clump Scout could not have happened without our volunteers, and we thank you immensely for your support.

When we designed Clump Scout, we knew from the outset that we wanted classifications to be as simple as possible. The original plan was to have volunteers click on any clumps they saw, then immediately move on. While the final design was a bit more complex (a few different types of marks were available) that basic design — mark the clumps, then move on — was still present.

The classification interface after a volunteer submits their clump locations usually looks something like this:

By comparison, the “science dataset” — which consists of 20 volunteers’ classifications all laid on top of each other — looks more like this:

Just by glancing at this image, it’s clear that there are a few “hot spots” where clumps have been identified. However, identifying these hot spots correctly in every image can be EXTREMELY tricky. The software that deals with this problem is called the “aggregator”, and it has to strike a balance between identifying as many clumps as possible and filtering out the isolated marks in the image.

The standard way of solving this problem in computer science is to use a “clustering algorithm”. Clustering algorithms are a very broad class of techniques used to identify clusters of points in space, and most of them are very simple to implement and run. Below, you can see the results of one clustering algorithm — called the “mean shift” algorithm — in practice.

Most clumps have been spotted correctly, and the results look good! However, it took quite a bit of fine-tuning and filtering to get the results to look like this. In the image above, the “bandwidth” parameter — the approximate “size” of each cluster — is about equal to the resolution of the image. Increasing the bandwidth can make the algorithm identify more clumps by grouping together clusters of points that are more diffuse. Unfortunately, the larger bandwidth also increases the likelihood that two or more “real” clumps will mistakenly be grouped into one. Here are the clusters we get when the bandwidth is twice as large:

Now that we’ve allowed clusters to be more spread-out, we’ve picked up on the cluster in the upper left. But, the three distinct clumps at the bottom edge of this galaxy have melded into just two, which is not what we want! This is just one of the parameters that we needed to tune. Another is the number of marks required to call a cluster a “clump”. Require too many, and you ignore valuable objects that we’re interested in. Require too few, and the algorithm picks up on objects that are really just noise.
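
To make the bandwidth trade-off concrete, here is a minimal sketch of a flat-kernel mean shift written in plain Python. It is a simplified illustration, not the aggregator code used by the project: the mark coordinates, the default `min_marks` value, and the rule for merging shifted positions are all assumptions made for this example.

```python
import math

def mean_shift_clusters(marks, bandwidth, min_marks=3, n_iter=30):
    """Group (x, y) marks with a flat-kernel mean shift.

    Returns a list of (centre, n_marks) for clusters with at least min_marks marks.
    """
    positions = list(marks)
    for _ in range(n_iter):
        moved = []
        for x, y in positions:
            # All original marks within one bandwidth of the current position
            near = [m for m in marks if math.hypot(m[0] - x, m[1] - y) <= bandwidth]
            if not near:  # guard against rounding at the window edge
                moved.append((x, y))
                continue
            moved.append((sum(m[0] for m in near) / len(near),
                          sum(m[1] for m in near) / len(near)))
        positions = moved

    # Marks whose shifted positions ended up close together share a cluster
    clusters = []  # each entry is [centre, count]
    for pos in positions:
        for cluster in clusters:
            cx, cy = cluster[0]
            if math.hypot(pos[0] - cx, pos[1] - cy) <= bandwidth / 2:
                cluster[1] += 1
                break
        else:
            clusters.append([pos, 1])
    return [(centre, count) for centre, count in clusters if count >= min_marks]

# Hypothetical marks: two tight clumps side by side, plus one isolated click
marks = [
    (10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (10.4, 10.4),
    (13.0, 10.0), (13.5, 10.0), (13.0, 10.5), (13.4, 10.4),
    (20.0, 20.0),
]

print(len(mean_shift_clusters(marks, bandwidth=1.5)))  # small bandwidth
print(len(mean_shift_clusters(marks, bandwidth=4.0)))  # large bandwidth
```

With the small bandwidth the two clumps are found separately, while the large bandwidth melds them into one. In both cases the isolated click is dropped because it falls below `min_marks`.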

How do we solve this problem? One thing that we tried was to have three members of the science team classify 1,000 galaxies, so that we could see how their classifications agreed with each other and with volunteers’ marks. We found that when 2 out of 3 members of the science team identified a clump, a majority of volunteers identified it as well. This was a good sign, and it told us about how many volunteer marks to expect per clump. In general, if 60% of volunteers leave a mark within a few pixels of the same spot, we consider that spot to be a clump.
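
As a rough illustration of that rule of thumb, the sketch below checks whether enough volunteers marked roughly the same spot. The 60% figure comes from the text above; the 3-pixel radius standing in for “a few pixels”, the function name, and the example marks are assumptions for this example.

```python
import math

def is_clump(spot, marks_by_volunteer, radius=3.0, min_fraction=0.6):
    """True if enough volunteers left at least one mark within `radius` pixels of `spot`."""
    agreeing = sum(
        any(math.hypot(x - spot[0], y - spot[1]) <= radius for x, y in marks)
        for marks in marks_by_volunteer.values()
    )
    return agreeing / len(marks_by_volunteer) >= min_fraction

# Hypothetical marks from five volunteers on one image
marks_by_volunteer = {
    "v1": [(50, 50)],
    "v2": [(51, 49)],
    "v3": [(49, 51), (80, 20)],
    "v4": [(52, 50)],
    "v5": [(10, 10)],
}

print(is_clump((50, 50), marks_by_volunteer))  # four of five volunteers agree
print(is_clump((80, 20), marks_by_volunteer))  # only one volunteer marked this spot
```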

Another technique that we used was more radical. While we started out using the simple clustering algorithm we’ve described so far, we found that it was much more effective to account for who was leaving each mark. Every volunteer is an individual person, with their own clump-classifying habits. Some volunteers are very conservative and only click on a clump when they’re completely certain; others are optimists who want to make sure that no faint clumps get missed. Sometimes volunteers make genuine mistakes and, believe it or not, we even get a few spammers who just click all over the image! We wanted to design an aggregation system that would make best use of all volunteers’ skills and talents (and if possible even the spammers!) to help us find as many real clumps as possible, without accidentally including any other objects that can masquerade as clumps.

To build our aggregation system, we started with an idea that was first proposed by Branson et al. (2017). At its core, our system still uses a type of clustering algorithm, called a facility location algorithm. The facility location algorithm builds clusters of volunteer clicks that have a very specific connectivity pattern, which looks like this.

An example of the “facility location” algorithm. The blue “F”s mark proposed facilities, which are connected to red “C”s (cities). In practice, the facilities represent the true locations of clumps while the cities represent your marks identifying them.

Each cluster contains a central node, referred to as a “facility”, which is connected to one or more other nodes, referred to as “cities”. Facility location algorithms get their name because they are often used to minimise the cost of distributing some essential commodity like electricity or water from a small number of producers (the facilities) to a larger number of consumers (the cities). Building a facility incurs a cost and so does connecting a city to a facility. When we use the algorithm in our aggregator, the volunteer clicks that we want to group into clusters become the facilities and cities. The trick to finding the right clusters is how we choose to define the costs for facility creation and facility-city connection. 
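
The sketch below shows the facility-and-city idea using a very simple greedy heuristic: each click either connects to the nearest open facility or opens a new one, whichever is cheaper. This is a swapped-in toy, not the algorithm from Branson et al. (2017) or the one in our aggregator. The single fixed `facility_cost`, the use of distance as the connection cost, and the example clicks are all assumptions.

```python
import math

def greedy_facility_location(clicks, facility_cost):
    """Assign each click to a facility, opening a new one when that is cheaper.

    Connecting a click (city) to a facility costs the distance between them;
    opening a new facility at the click costs facility_cost.
    """
    facilities = []   # positions of open facilities
    assignment = []   # index of the facility serving each click
    total_cost = 0.0
    for click in clicks:
        # Find the nearest facility that is already open
        best_index, best_dist = None, math.inf
        for i, fac in enumerate(facilities):
            d = math.dist(click, fac)
            if d < best_dist:
                best_index, best_dist = i, d
        if best_dist <= facility_cost:
            assignment.append(best_index)       # cheaper to connect
            total_cost += best_dist
        else:
            facilities.append(click)            # cheaper to open a new facility
            assignment.append(len(facilities) - 1)
            total_cost += facility_cost
    return facilities, assignment, total_cost

# Hypothetical clicks around two clumps
clicks = [(10, 10), (11, 10), (30, 30), (10, 11), (31, 30)]

facilities, assignment, total_cost = greedy_facility_location(clicks, facility_cost=5)
print(facilities, assignment, total_cost)
```

Raising `facility_cost` makes the heuristic reluctant to open facilities, so clicks get pulled into fewer, larger clusters. Lowering it does the opposite. In the real aggregator the costs are not fixed numbers like this, as described next.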

The costs we use are based on a statistical model that tries to understand how different volunteers behave when they classify clumpy galaxies. For each volunteer, the model learns how likely that volunteer is to miss real clumps or accidentally click on other features in the subject images. The exact location of real clumps in an image can be ambiguous, so when the model thinks that a volunteer has clicked on a real clump, it also tries to predict how accurate their annotation is. But it isn’t just the volunteers that are unique – different subjects have different characteristics too, and it may be much more difficult to spot clumps in some galaxies than it is in others. For example, spotting bright, well separated clumps on a faint background is likely to be much easier than spotting faint closely packed clumps in a noisy image. Our aggregator model takes this into account as well by trying to understand just how difficult finding clumps is in different images.

How does the aggregator model work out how volunteers are behaving? Do we tell it the right answer for a handful of subjects and check the volunteers’ annotations against them? Actually no, because we don’t know exactly what the right answer is! One of the goals of Galaxy Zoo: Clump Scout was to let the volunteers decide together exactly what it takes for a feature to be a clump. So we don’t give our model any information except the clicks that the volunteers provide. Just by comparing how different volunteers respond to different images as the classifications arrive, and comparing their annotations with the clusters found by the facility location algorithm, our model slowly learns the combination of all volunteer behavioural traits and image difficulties that best explain the classification data it has seen.

Once our model provides its best description of the volunteers and images, we define the costs for the facility location algorithm. We specify that turning a volunteer’s click into a facility is more expensive for very optimistic volunteers, who might click on slightly more features that aren’t really clumps. This reduces the chance of accidentally contaminating the clump detections. Connecting clicks to an existing facility costs more if the volunteers that provided them seem optimistic. On the other hand, if it seems like a volunteer is more pessimistic or their clicks are slightly less accurate, then it becomes cheaper to connect their clicks into an existing cluster. This ensures that we don’t miss those hard-to-spot clumps with fewer clicks or more widely spread clicks.

But wait a minute! Were you reading carefully? Our model’s understanding of the volunteers and images is partly based on the clusters that were found, but the cost of creating the clusters depends on the volunteers’ behaviour! How does that work?! Good question. Whenever a new volunteer joins the project, we don’t know anything about them, so we make some reasonable assumptions about how they will behave. In a similar way, we assume that all subjects have roughly similar characteristics. We call these assumptions the “priors” of our model. These priors let us get started with a really rough set of clusters that our model can use to make an initial guess about the volunteers and subjects. Then we can use that guess to set some new costs and find some new, more refined clusters. With these clusters, our model can make another, better-informed prediction. Our algorithm keeps refining its guess and click-to-cluster assignments over and over again until the model predictions and the corresponding clusters don’t change any more. 
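
Here is a toy version of that back-and-forth loop. It is far simpler than the statistical model described above, and every detail is an assumption made for illustration: the single “reliability” number per volunteer, the prior of 0.5, the acceptance threshold, and the example data.

```python
def refine(spot_marks, prior=0.5, threshold=1.2, max_rounds=20):
    """Alternate between picking clumps and re-estimating volunteer reliability.

    spot_marks maps a candidate spot to the set of volunteers who marked it.
    """
    volunteers = {v for vs in spot_marks.values() for v in vs}
    reliability = {v: prior for v in volunteers}  # the "prior": everyone starts equal
    accepted = None
    for _ in range(max_rounds):
        # Step 1: accept spots whose reliability-weighted support is high enough
        new_accepted = {
            spot for spot, vs in spot_marks.items()
            if sum(reliability[v] for v in vs) >= threshold
        }
        if new_accepted == accepted:  # nothing changed, so we have converged
            break
        accepted = new_accepted
        # Step 2: re-estimate each volunteer from how often their marks were accepted,
        # smoothed towards the prior so that a single mark does not dominate
        for v in volunteers:
            marked = [spot for spot, vs in spot_marks.items() if v in vs]
            hits = sum(spot in accepted for spot in marked)
            reliability[v] = (hits + prior) / (len(marked) + 1)
    return accepted, reliability

# Hypothetical data: three careful volunteers and one spammer
spot_marks = {
    "A": {"ann", "bob", "cat"},
    "B": {"ann", "bob", "cat"},
    "C": {"ann", "bob"},
    "D": {"spam"},
    "E": {"spam"},
    "F": {"spam"},
}

accepted, reliability = refine(spot_marks)
print(sorted(accepted))
print(reliability["ann"], reliability["spam"])
```

In the first round, spot C does not have enough support. Once the loop has learned that ann and bob usually agree with the consensus, their two marks carry enough weight for C to be accepted, while the spammer’s lone clicks never are.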

Compared to our simplest aggregator, our more advanced method is better at picking up faint clumps and filtering out noise. It’s also the first time this sort of method has been used in the pipeline of a major citizen science project like this one. This aggregator will be the subject of one of our upcoming papers on Clump Scout, and we are very excited to share the results.

A special thanks on this post goes out to the other members of the Clump Scout team, who helped ensure that the details of our aggregation process were as accurate and simply explained as possible. In the next week or two we’ll publish a second post detailing some of the scientific findings we’ve gotten from our results. Thank you, and stay tuned!

Happy Data Release Day: DECaLS goes live

I’m delighted to say that – with the release of the accompanying paper on the arXiv – the first data release from our Galaxy Zoo classifications of galaxies from the DECaLS survey is now live! The paper is still under review at the journal, but as lead author Mike Walmsley is handing in his thesis (congratulations!) it seemed like a good time to release the data.

As the title suggests, this data relies on classifications submitted by our wonderful Galaxy Zoo volunteers from 2015 to 2020, particularly via the ‘Enhanced’ workflow where classifications are used to educate a friendly robot assistant, speeding up the process dramatically. As a result, we have detailed classifications for 314,000 galaxies based on deeper imaging than we’ve ever had before.

The results are dramatic! In the figure above you can see a comparison between the fraction of votes a galaxy received for being ‘featured’ in our previous data release and the fraction it received with the new DECaLS imaging. If the new imaging made no difference, the galaxies would all lie on the dotted line, but they’re mostly above it – volunteers are seeing more features in galaxies in deeper imaging. All of which makes sense, but it’s still gratifying.

We’re all looking forward to getting stuck into this dataset – and Mike has built a tool for you to explore with. Using this interface, you can sort through the data and look at the results – below is a quick sample of double rings Sandor cooked up in no time at all.

We’re not done by a long shot – unlike these systems, the galaxies currently awaiting your inspection over at have not been previously classified – with your help, hopefully it won’t be too long before we can add them to the catalogue. In the meantime – thanks for all your help!


P.S. Pulling this paper together was a real team effort so I want to thank each and every one of the team for their hard work getting this over the line. We haven’t forgotten the volunteers either – the final, published version will have an author list online with the names of everyone who contributed and we’ll email you all a link.

Press Release on Results from Galaxy Zoo: 3D

Many of you helped out with the Galaxy Zoo spinoff project, Galaxy Zoo: 3D. I am happy to let you know that I am presenting results from this project today at the 237th Meeting of the American Astronomical Society. You can view the iPoster I made about it at this link.

This spin-off project was aimed at supporting the MaNGA (Mapping Nearby Galaxies at Apache Point Observatory) survey, which is part of the Sloan Digital Sky Surveys (SDSS). Thanks to your input we have been able to crowdsource maps which show where the spiral arms, bars and any foreground stars are present in every galaxy observed by MaNGA. This, combined with the MaNGA data, is helping to reveal how these internal structures impact galaxies.

The results will be part of a Press Conference about this and other SDSS results, live streamed at 4.30pm ET (9.30pm GMT) on the AAS Press Office Youtube Channel. The press release about them will go live on the SDSS Press Page at the same time. Direct link to press release (will only work after 4.30pm ET).

Thanks again for your contributions to understanding how galaxies work.

A sad farewell

I recently received word from his wife of the death of Jean Tate on November 6. Jean had been a very active participant in several astronomical Zooniverse projects for a decade, beginning with Galaxy Zoo. It does no disservice to other participants to note that he was one of the people who could be called super-volunteers, carrying his participation in both organized programs and personal research to the level associated with professional scientists. He identified a set of supergiant spiral galaxies, in work which was, while in progress, only partially scooped by a professional team elsewhere, and was a noted participant in the Andromeda project census of star clusters in that galaxy. In Radio Galaxy Zoo, he was a major factor in the identification of galaxies with strong emission lines and likely giant ionized clouds (“RGZ Green”), and took the lead in finding and characterizing the very rare active galactic nuclei with giant double radio sources from a spiral galaxy (“SDRAGNs”). He did a third of the work collecting public input and selecting targets to be observed in the Gems of the Galaxy Zoos Hubble program. Several of us hope to make sure that as much as possible of his research results from these programs are published in full.

Jean consistently pushed the science team to do our best and most rigorous work. He taught himself to use some of the software tools normally employed by professional astronomers, and was a full colleague in some of the Galaxy Zoo research projects. His interests had been honed by over two decades of participation in online forum discussions in the Bad Astronomy Bulletin Board (later BAUT, then Cosmoquest forum), where his clarity of logic and range of knowledge were the bane of posters defending poorly conceived ideas.

Perhaps as a result of previous experiences as a forum moderator, Jean was unusually dedicated to as much privacy as one can preserve while being active in online fora and projects (to the point that many colleagues were unaware of his gender until now). This led to subterfuges such as being listed in NASA proposals as part of the Oxford astronomy department, on the theory that it was the nominal home of Galaxy Zoo. Jean was married for 27 years, and had family scattered in both hemispheres with whom he enjoyed fairly recent visits. Mentions in email over the years had made me aware that he had a protracted struggle with cancer, to the extent that his case may someday be identifiable in medical research. He tracked his mental processes, knowing how to time research tasks in the chemotherapy cycle to use his best days for various kinds of thinking.

This last month, emails had gone unanswered long enough that some of us were beginning to worry, and the worst was eventually confirmed. I felt this again two days ago, which was the first time I did not forward notice of an upcoming Zoo Gems observation by Hubble to Jean to be sure our records matched.

Ad astra, Jean.

Radio Galaxy Zoo: LOFAR – A short update

A lot has happened on the Radio Galaxy Zoo since we last posted an update!

First of all, you can see on the image above that we are making great progress with getting all of the big, bright sources from the LOFAR survey looked at by Zooniverse volunteers. We are approaching half a million classifications and just under 80,000 radio sources have been looked at by at least five volunteers at the time of writing. Together with the earlier efforts by members of the LOFAR team, we have covered a very wide area of the sky, around 3,000 square degrees, which is well over half of the area of the LOFAR data, and are well on the way to completing the original aims of the project. The green, orange and pink areas together show the areas of the sky we have completed.

What’s next? One of the key goals of the LOFAR Radio Galaxy Zoo has always been to provide targets for the WEAVE-LOFAR spectroscopic survey. WEAVE is a new spectroscope being commissioned on the William Herschel Telescope, which can measure 1,000 redshifts of galaxies in a single observation. WEAVE-LOFAR aims to find the redshifts of every bright LOFAR source in the survey. But the survey can’t work without knowing where the optical host galaxies are — so the input of Zooniverse volunteers in selecting these host galaxies is absolutely crucial to our success.

A complication is that WEAVE wants to look at all LOFAR sources, not just the large ones that we generally select for the Zooniverse project. As regular users will know, there are many small sources in the radio sky as well, and the optical counterparts of those can be found automatically just by matching with optical catalogues. In between there are some intermediate-sized sources, and these present the biggest problem; some of them benefit from viewing by volunteers, but there are too many of them for us to look at them all. Earlier in the year we selected 10,000 of these in a particular region of the sky that we thought would benefit from human inspection using a combination of algorithms and machine learning, and injected them into the Zooniverse project to see what volunteers made of them. The results are encouraging and have allowed us to develop a process of ‘early retirement’ for sources that turn out not to be interesting (i.e. no clicks are made during classification). Our next priority is to select this type of source, informed by the first set of results, over a larger area of the sky in order to get the full set of inputs for the first year of WEAVE. You’ll see these sources entering the Zooniverse project over the coming weeks.
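
The ‘early retirement’ idea can be sketched as a simple rule. The thresholds here are assumptions for illustration only: retiring a source after three click-free classifications, and otherwise after five classifications in total, are not the project’s actual settings.

```python
def should_retire(click_counts, early_limit=3, full_limit=5):
    """Decide whether a source needs no more classifications.

    click_counts holds the number of clicks made in each classification so far.
    """
    if len(click_counts) >= full_limit:
        return True  # seen by enough volunteers
    first = click_counts[:early_limit]
    # Early retirement: the first few volunteers all made no clicks
    return len(first) == early_limit and all(n == 0 for n in first)

print(should_retire([0, 0, 0]))        # nobody clicked, so retire early
print(should_retire([0, 2, 0]))        # someone found something, so keep going
print(should_retire([1, 0, 2, 0, 1]))  # full set of classifications collected
```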

Presenting results from the Galaxy Builder project

From April 2018 until early this year, Galaxy Builder has collected over 18,000 models of spiral galaxies, built by volunteers. These models were combined and computationally fine-tuned, and the results have been compiled into Lingard et al. 2020, recently accepted for publication in the Astrophysical Journal.

The project asked volunteers to sequentially add components to a galaxy, starting with the galaxy’s disc, then, if one is present, a bulge and a bar, followed by tracing any visible spiral arms. At each stage, light from the corresponding component would be removed until the whole galaxy was accounted for:

Four-panel figure showing the galaxy builder interface, a spiral galaxy is visible in blue, and in each panel another component is added to gradually remove all the visible light from the galaxy.


After collecting 30 volunteer models for each galaxy, we then used Machine Learning techniques to cluster components, and identify a “consensus model”.

Four panel plots showing the clustered and consensus components for an example galaxy. There is a small amount of scatter in each component, but the clustering has reliably found a good result.

We then used a computer fitting algorithm to fine-tune this model, resulting in a detailed description of the galaxy’s light distribution, which we can use to understand the physical processes occurring inside it!

Five panel plot showing an image of the example galaxy, the fitted model (which matches the real galaxy very well), the difference between the galaxy and model (which is small), and how the consensus components from clustering have changed during fitting (they do not change very much)

We have shown that galaxy models created in this way are just as reliable as simpler models obtained purely through computer fitting (when those simple models are appropriate!), by comparing to other published work and by incorporating a small sample of synthetic galaxies, for which we know the true light profiles:

The nine synthetic galaxy images, each of which looks very realistic (but without clumpy star-forming regions). Most have spiral arms, and some have bars.

For most parameters, the difference between the true (x-axis) and volunteer-provided (y-axis) values is tiny. There are some issues with bar “boxyness” and bulge concentration (Sérsic index), primarily due to the computer fitting algorithm not being able to distinguish between different combinations of values:

Scatter plots showing how well parameters are recovered. We see that the method generally does a very good job, but there is a lot of scatter in bulge sersic index and bar boxyness.
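
For readers wondering what the Sérsic index measures: the Sérsic profile is a standard formula for how the brightness of a galaxy component falls off with distance from its centre. It is given here as general background, and the exact parametrisation used in the paper may differ.

```latex
I(r) = I_e \exp\left\{ -b_n \left[ \left( \frac{r}{r_e} \right)^{1/n} - 1 \right] \right\}
```

Here $r_e$ is the radius containing half of the component’s light, $I_e$ is the brightness at that radius, $n$ is the Sérsic index, and $b_n$ is a constant that depends on $n$ and is chosen so that $r_e$ really does enclose half the light. A larger $n$ means light that is more concentrated towards the centre, with $n = 1$ giving a simple exponential fall-off.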

Thanks to the added complexity of Galaxy Builder models, we have a much richer source of information for scientists to delve into! We’re excited to share the scientific results we’ve obtained, so expect another post in the not-too-distant future (hint: spiral arms are complicated)!

The research team want to send a very warm thank-you to everyone who has participated in this project over the years. Without your efforts we would not have had the chance to do the science we are so passionate about, and we are very excited for the future of the Zooniverse.

Galaxy Builder has finished collecting classifications for now, but we still need your classifications in Galaxy Zoo, where we’re collecting classifications for images from the DECaLS survey.

This blog was posted on behalf of Tim Lingard for the Galaxy Builder Team. Tim also submitted his PhD thesis based on this work this summer, and is now working as a Data Analyst for 1715 Labs.

Galaxy Zoo: Clump Scout – a first look at the results

Hi all! Nico here, grad student from the Minnesota science team, with an update on the Galaxy Zoo: Clump Scout project.

Since launching Clump Scout in September of last year, we’ve had over 7,000 volunteers provide more than 800,000 classifications! We’re incredibly grateful for your help and we’ve been excitedly exploring the data as it has been coming in to learn more about clumpy galaxies in the local universe.

Now that we’re around the halfway point with this project, we wanted to share with you some of the things we’ve learned. If you’d like a refresher on the project, you can see our original “project launch” blog post here.

A few things we’ve learned so far…

We’ve found a set of local clumpy galaxies to examine more closely.

Figure 1: A small sample of clumpy galaxies near us. These are some of the galaxies for which we’ve requested follow-up observations by the Hubble Space Telescope.

A major goal of the Clump Scout project was to find a group of local galaxies that were “clumpy”. For as long as we’ve known about clumpy galaxies, they’ve mostly been considered a “high-redshift” phenomenon, which is astronomy-speak for “very far away, and very long ago”. In fact, we first discovered clumpy galaxies by examining images of the very distant universe taken by the Hubble Space Telescope. Because these galaxies were so far away, their light took billions of years to reach us, and we were seeing them as they existed when the universe was only a fraction of the age that it is now. It quickly became clear that most early-universe galaxies did not look like local galaxies, and the “spiral” or “elliptical” structure that we’re used to seeing was mostly absent. Instead, most galaxies were loosely-structured blobs of stars and gas with a few concentrated “clumps” that glowed brightly with new stars. The name “clumpy galaxy” arose to describe the appearance of these galaxies, and to differentiate them from the galaxies near us.

Unfortunately, because these galaxies are so distant, it’s difficult to study them in detail. We have wondered over the years if there are properties of clumps that are being hidden or washed-out by the dim, low-resolution photos we’ve taken from billions of light-years away. This is why the discovery of clumpy galaxies in our own backyard is such an exciting accomplishment. Thanks to the volunteer classifications from the Clump Scout project, we’ve been able to identify hundreds of galaxies with clumpy characteristics much like those of the far more distant versions we’re used to studying. Because they are nearby, we can perform follow-up studies with more sensitive, higher-resolution techniques. We recently submitted a proposal for observation time from the Hubble Space Telescope to examine some of these galaxies in more detail, and we’re excitedly waiting to hear back. Above, you can see ten of the galaxies for which we requested follow-up. They are dotted with blue specks, which are the “clumps” we’ve been seeking to study.

It’s much harder to see clumps in some places than others.

The Galaxy Zoo team has run many projects that study the large-scale properties of galaxies, such as their shape, characteristics, and patterns in their behavior. The Clump Scout project is a bit different because our focus is on a much smaller target. Clumps are small “substructures” within galaxies, which are much harder to see and in many cases can be entirely missed.

Part of our job during this project was to determine the properties of clumps that our volunteers could see compared to the properties of those that they couldn’t. For example, a bright clump in a dim galaxy sticks out like a sore thumb; a dim clump in a bright galaxy, on the other hand, might be completely invisible. To control for this effect, we created a sample of simulated clumps with properties we already knew well, and inserted these into some galaxy images in the project. Now that so many volunteers have responded, we have a good idea of which simulated clumps can be seen and which cannot. That in turn gives us a very good idea of what sorts of real clumps we might be missing as well.

The main factor controlling whether or not a clump is visible is, of course, how bright it is. You, our volunteers, have shown us that you can catch just about all of the clumps that are above the “95% completeness limit” of the Sloan Digital Sky Survey (SDSS), the survey which provides all of Clump Scout’s images. Essentially, this means that if a clump CAN be found, you all are finding it!

Other factors controlling clump visibility were more surprising. For example, we expected that the higher an image’s resolution, the easier it would be to see clumps. In fact, resolution appeared to have almost NO effect on volunteers’ ability to see clumps: volunteers recovered the same fraction of clumps in the clearest images as in the blurriest ones.

Aside from the clump’s brightness, the most important factor in clump recovery was actually its proximity to the center of its host galaxy. We found that clumps in the dimmer, more outlying regions of galaxies are quite easy to see, because they are bright spots on a dim background. However, once they are within one “effective radius” of the galactic center, they become incredibly difficult to identify. This makes sense: the galactic center is much brighter and may drown out the signal of a clump near it.

This gives us a very helpful tool for understanding the patterns in clumps we are seeing. Many theories about clumps predict that they live for billions of years, beginning near the outside edges of their host galaxies and slowly migrating inward towards the center before merging with the central bulge. We now know that we are not likely to see clumps near the central bulge in our Clump Scout data, but it’s not necessarily because they’re not there: they are merely harder to see.

Figure 2: The “recovery curves” for clumps in our sample. On each plot, the height of the blue region measures the number of simulated clumps with a given property, while the orange region’s height measures the number of those simulated clumps that volunteers found and marked. The ratio between these two is called the “recovery fraction”, and it’s displayed as the black line on the plot. The recovery fraction doesn’t change much with redshift (aka distance to the galaxy) or with image resolution. However, it falls dramatically as clumps get closer to the galactic center — which tells us exactly how much harder it is to find clumps that are near the center of a galaxy.
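
For the curious, here is a rough sketch of how a recovery fraction like the one in Figure 2 could be computed. It is not our actual analysis code: the function name, the bin edges and the tiny data set are all made up for illustration. The idea is simply to sort the simulated clumps into bins of some property (here, distance from the galactic center in units of the effective radius), count how many were inserted and how many were found in each bin, and divide one by the other.

```python
from bisect import bisect_right


def recovery_fraction(values, found, bin_edges):
    """Bin simulated clumps by a property and compute the recovered fraction.

    values    : the property of each simulated clump (hypothetical data)
    found     : True if volunteers marked that clump, False otherwise
    bin_edges : increasing bin boundaries; the last bin includes its upper edge
    Returns (inserted, recovered, fractions), one entry per bin.
    """
    n_bins = len(bin_edges) - 1
    inserted = [0] * n_bins   # the "blue" histogram: all simulated clumps
    recovered = [0] * n_bins  # the "orange" histogram: clumps volunteers found

    for value, was_found in zip(values, found):
        index = bisect_right(bin_edges, value) - 1
        if value == bin_edges[-1]:
            index = n_bins - 1  # put the upper edge in the last bin
        if 0 <= index < n_bins:  # ignore clumps outside the binned range
            inserted[index] += 1
            recovered[index] += int(was_found)

    # The "black line": ratio of found to inserted, undefined for empty bins.
    fractions = [r / n if n else None for r, n in zip(recovered, inserted)]
    return inserted, recovered, fractions


# Made-up example: distance from the galactic center in effective radii.
distances = [0.2, 0.5, 0.8, 1.2, 1.5, 1.9, 2.3, 2.7]
was_found = [False, False, True, True, False, True, True, True]

inserted, recovered, fractions = recovery_fraction(
    distances, was_found, bin_edges=[0, 1, 2, 3]
)
for low, frac in zip([0, 1, 2], fractions):
    print(f"{low}-{low + 1} effective radii: recovery fraction = {frac:.2f}")
```

In this toy example the recovery fraction rises from one in three inside one effective radius to every clump found beyond two effective radii, which is the same kind of trend as in the rightmost behaviour described for Figure 2.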

We’re still working through our analysis of your responses, and we’ll continue to give you updates as they come. Thank you for being part of the Galaxy Zoo team!

If you’d like to try your hand at identifying a few clumps yourself, you can take part at our project page: Galaxy Zoo: Clump Scout