
Clump Scout wrap-up: What are we doing with your 2.7 million clicks?

Hi all. My name is Nico Adams from the Galaxy Zoo science team.

Writing my first scientific paper has been equal parts exhausting and exhilarating. On Thursday, February 11, I got to put a tally in the “exhilarating” column. The paper is on the first scientific results covering the Galaxy Zoo: Clump Scout project, and I was putting the final touches on my first draft when I saw that you all had submitted the project’s final classifications. The Clump Scout project had a lofty goal — to search for large star-forming regions in over 50,000 galaxies from the Sloan Digital Sky Survey — and the fact that the Clump Scout volunteers have managed to finish it is an incredible achievement.

We’re looking forward to sharing our results over the next few months. Clump Scout is not only the first citizen science project to search for giant clumps in galaxies, but also the first large-scale project of any kind to look for clumps in the “local” universe (out to redshift ~0.1, or within a billion-or-so light-years of us). The data set produced by this project is unique, and we are nearly finished with our first round of analysis on it.
We’re currently preparing two papers that will cover the results directly. One is focused on the algorithm that turned volunteers’ clicks into “clump locations”, while the other — my first paper — is focused on the clump catalog and scientific results we derived from it. While these papers go through a few months of revision and review, we wanted to publish a few blog posts previewing the results. This blog post will focus on the first one: We’ll explain what happened to your clicks after you sent them to us. Clump Scout could not have happened without our volunteers, and we thank you immensely for your support.

When we designed Clump Scout, we knew from the outset that we wanted classifications to be as simple as possible. The original plan was to have volunteers click on any clumps they saw, then immediately move on. While the final design was a bit more complex (a few different types of marks were available), that basic design — mark the clumps, then move on — was still present.

The classification interface after a volunteer submits their clump locations usually looks something like this:

By comparison, the “science dataset” — which consists of 20 volunteers’ classifications all laid on top of each other — looks more like this:

Just by glancing at this image, it’s clear that there are a few “hot spots” where clumps have been identified. However, identifying these hot spots correctly in every image can be EXTREMELY tricky. The software that deals with this problem is called the “aggregator”, and it has to strike a balance between identifying as many clumps as possible and filtering out the isolated marks in the image.

The standard way of solving this problem in computer science is to use a “clustering algorithm”. Clustering algorithms are a very broad class of techniques used to identify clusters of points in space, and most of them are very simple to implement and run. Below, you can see the results of one clustering algorithm — called the “mean shift” algorithm — in practice.

Most clumps have been spotted correctly, and the results look good! However, it took quite a bit of fine-tuning and filtering to get the results to look like this. In the image above, the “bandwidth” parameter — the approximate “size” of each cluster — is about equal to the resolution of the image. Increasing the bandwidth can make the algorithm identify more clumps by grouping together clusters of points that are more diffuse. Unfortunately, the larger bandwidth also increases the likelihood that two or more “real” clumps will mistakenly be grouped into one. Here are the clusters we get when the bandwidth is twice as large:

Now that we’ve allowed clusters to be more spread-out, we’ve picked up on the cluster in the upper left. But, the three distinct clumps at the bottom edge of this galaxy have melded into just two, which is not what we want! This is just one of the parameters that we needed to tune. Another is the number of marks required to call a cluster a “clump”. Require too many, and you ignore valuable objects that we’re interested in. Require too few, and the algorithm picks up on objects that are really just noise.
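To make the bandwidth trade-off concrete, here is a minimal sketch of a flat-kernel mean shift in plain Python. It is not the aggregator code used for Clump Scout: the function name `mean_shift`, the `min_marks` parameter, the merge radius and the toy click positions are all assumptions made up for illustration.

```python
import math

def mean_shift(points, bandwidth, min_marks=1, max_iter=100, tol=1e-3):
    """Flat-kernel mean shift over 2D click positions (illustrative only).

    Returns a list of (x, y, n_marks) for clusters with at least min_marks marks.
    """
    modes = []
    for x, y in points:
        for _ in range(max_iter):
            # All clicks within one bandwidth of the current position.
            near = [p for p in points if math.hypot(p[0] - x, p[1] - y) <= bandwidth]
            if not near:
                break
            new_x = sum(p[0] for p in near) / len(near)
            new_y = sum(p[1] for p in near) / len(near)
            shift = math.hypot(new_x - x, new_y - y)
            x, y = new_x, new_y
            if shift < tol:
                break
        modes.append((x, y))

    # Merge converged positions that ended up (almost) on top of each other.
    # The merge radius of bandwidth / 2 is an arbitrary choice for this sketch.
    clusters = []
    for mx, my in modes:
        for c in clusters:
            if math.hypot(mx - c[0], my - c[1]) <= bandwidth / 2:
                c[2] += 1
                break
        else:
            clusters.append([mx, my, 1])

    # Drop clusters with too few marks (treated as noise).
    return [(cx, cy, n) for cx, cy, n in clusters if n >= min_marks]

# Made-up clicks: two tight groups a few pixels apart, plus one stray mark.
clicks = [(10, 10), (10, 11), (11, 10), (11, 11),
          (16, 10), (16, 11), (17, 10), (17, 11),
          (40, 40)]

narrow = mean_shift(clicks, bandwidth=3, min_marks=3)
wide = mean_shift(clicks, bandwidth=6, min_marks=3)
print(len(narrow), len(wide))
```

With these made-up clicks, the two nearby groups stay separate at the smaller bandwidth and merge into one cluster when the bandwidth is doubled, while the stray mark is dropped in both cases by the `min_marks` filter.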

How do we solve this problem? One thing that we tried was to have three members of the science team classify 1,000 galaxies, so that we could see how their classifications agreed with each other and with volunteers’ marks. We found that when 2 out of 3 members of the science team identified a clump, a majority of volunteers identified it as well. This was a good sign, and it told us about how many volunteer marks to expect per clump. In general, if 60% of volunteers leave a mark within a few pixels of the same spot, we consider that spot to be a clump.
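As a rough illustration of that 60% rule, the check could be sketched like this. The function name `is_clump`, the 3-pixel radius (our stand-in for “a few pixels”) and the example marks are assumptions, not the project’s actual code or settings.

```python
import math

def is_clump(spot, marks, n_volunteers, radius=3.0, min_fraction=0.6):
    """Return True if enough distinct volunteers marked near `spot`.

    marks: list of (volunteer_id, x, y). The radius is a hypothetical value.
    """
    voters = {vid for vid, x, y in marks
              if math.hypot(x - spot[0], y - spot[1]) <= radius}
    return len(voters) / n_volunteers >= min_fraction

# Made-up example: 13 of 20 volunteers click close to (50, 50), one clicks elsewhere.
marks = [(i, 50 + (i % 3) - 1, 50) for i in range(13)] + [(15, 80, 80)]

print(is_clump((50, 50), marks, n_volunteers=20))
print(is_clump((80, 80), marks, n_volunteers=20))
```

Counting distinct volunteer IDs rather than raw marks means that one person clicking the same spot several times still counts only once.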

Another technique that we used was more radical. While we started out using the simple clustering algorithm we’ve described so far, we found that it was much more effective to account for who was leaving each mark. Every volunteer is an individual person, with their own clump-classifying habits. Some volunteers are very conservative and only click on a clump when they’re completely certain; others are optimists who want to make sure that no faint clumps get missed. Sometimes volunteers make genuine mistakes and, believe it or not, we even get a few spammers who just click all over the image! We wanted to design an aggregation system that would make best use of all volunteers’ skills and talents (and if possible even the spammers!) to help us find as many real clumps as possible, without accidentally including any other objects that can masquerade as clumps.

To build our aggregation system, we started with an idea that was first proposed by Branson et al. (2017). At its core, our system still uses a type of clustering algorithm, called a facility location algorithm. The facility location algorithm builds clusters of volunteer clicks that have a very specific connectivity pattern, which looks like this:

An example of the “facility location” algorithm. The blue “F”s mark proposed facilities, which are connected to red “C”s (cities). In practice, the facilities represent the true locations of clumps while the cities represent your marks identifying them.

Each cluster contains a central node, referred to as a “facility”, which is connected to one or more other nodes, referred to as “cities”. Facility location algorithms get their name because they are often used to minimise the cost of distributing some essential commodity like electricity or water from a small number of producers (the facilities) to a larger number of consumers (the cities). Building a facility incurs a cost and so does connecting a city to a facility. When we use the algorithm in our aggregator, the volunteer clicks that we want to group into clusters become the facilities and cities. The trick to finding the right clusters is how we choose to define the costs for facility creation and facility-city connection. 
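To show the basic idea of trading off facility costs against connection costs, here is a minimal greedy sketch. It assumes one fixed opening cost for every facility and uses plain pixel distance as the connection cost; both are simplifications chosen for illustration (the real costs come from a statistical model), and the function names and click positions are made up. It also forces every click to connect to some facility, which a real aggregator need not do.

```python
import math

def total_cost(points, open_idx, open_cost):
    """Opening cost for each facility plus each city's distance to its nearest facility."""
    connect = sum(min(math.dist(p, points[j]) for j in open_idx) for p in points)
    return open_cost * len(open_idx) + connect

def greedy_facility_location(points, open_cost):
    """Keep opening the click that lowers the total cost most; stop when nothing helps."""
    open_idx = []
    best = math.inf
    while True:
        candidates = [j for j in range(len(points)) if j not in open_idx]
        if not candidates:
            break
        cost, j = min((total_cost(points, open_idx + [j], open_cost), j)
                      for j in candidates)
        if cost >= best:
            break
        best = cost
        open_idx.append(j)
    # Connect every click (city) to its nearest open facility.
    assignment = [min(open_idx, key=lambda j: math.dist(p, points[j])) for p in points]
    return open_idx, assignment, best

# Made-up clicks forming two tight groups.
clicks = [(10, 10), (10, 11), (11, 10), (11, 11),
          (16, 10), (16, 11), (17, 10), (17, 11)]

facilities, assignment, cost = greedy_facility_location(clicks, open_cost=3.0)
print(len(facilities), assignment)
```

With an opening cost of 3, a second facility pays for itself because it saves a lot of connection distance, but a third one does not, so each group of clicks ends up connected to its own facility. Raising the opening cost makes the sketch more reluctant to declare new clusters, and lowering it has the opposite effect.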

The costs we use are based on a statistical model that tries to understand how different volunteers behave when they classify clumpy galaxies. For each volunteer, the model learns how likely that volunteer is to miss real clumps or accidentally click on other features in the subject images. The exact location of real clumps in an image can be ambiguous, so when the model thinks that a volunteer has clicked on a real clump, it also tries to predict how accurate their annotation is. But it isn’t just the volunteers that are unique – different subjects have different characteristics too, and it may be much more difficult to spot clumps in some galaxies than it is in others. For example, spotting bright, well separated clumps on a faint background is likely to be much easier than spotting faint closely packed clumps in a noisy image. Our aggregator model takes this into account as well by trying to understand just how difficult finding clumps is in different images.

How does the aggregator model work out how volunteers are behaving? Do we tell it the right answer for a handful of subjects and check the volunteers’ annotations against them? Actually no, because we don’t know exactly what the right answer is! One of the goals of Galaxy Zoo: Clump Scout was to let the volunteers decide together exactly what it takes for a feature to be a clump. So we don’t give our model any information except the clicks that the volunteers provide. Just by comparing how different volunteers respond to different images as the classifications arrive, and comparing their annotations with the clusters found by the facility location algorithm, our model slowly learns the combination of all volunteer behavioural traits and image difficulties that best explain the classification data it has seen.

Once our model provides its best description of the volunteers and images, we define the costs for the facility location algorithm. We specify that turning a volunteer’s click into a facility is more expensive for very optimistic volunteers, who might click on slightly more features that aren’t really clumps. This reduces the chance of accidentally contaminating the clump detections. Connecting clicks to an existing facility costs more if the volunteers that provided them seem optimistic. On the other hand, if it seems like a volunteer is more pessimistic or their clicks are slightly less accurate, then it becomes cheaper to connect their clicks into an existing cluster. This ensures that we don’t miss those hard-to-spot clumps with fewer clicks or more widely spread clicks.

But wait a minute! Were you reading carefully? Our model’s understanding of the volunteers and images is partly based on the clusters that were found, but the cost of creating the clusters depends on the volunteers’ behaviour! How does that work?! Good question. Whenever a new volunteer joins the project, we don’t know anything about them, so we make some reasonable assumptions about how they will behave. In a similar way, we assume that all subjects have roughly similar characteristics. We call these assumptions the “priors” of our model. These priors let us get started with a really rough set of clusters that our model can use to make an initial guess about the volunteers and subjects. Then we can use that guess to set some new costs and find some new, more refined clusters. With these clusters, our model can make another, better-informed prediction. Our algorithm keeps refining its guess and click-to-cluster assignments over and over again until the model predictions and the corresponding clusters don’t change any more. 
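The “guess, cluster, re-guess” loop can be sketched in a few lines. This is a deliberately simplified stand-in, not our actual model: it replaces the facility location step with a weighted vote, tracks only one number per volunteer (a hypothetical false-positive rate), and ignores image difficulty. The names `refine_clumps`, `prior_fp`, `prior_weight` and `threshold`, and all of the data, are assumptions for illustration.

```python
def refine_clumps(marks_by_spot, volunteers, prior_fp=0.2, prior_weight=2.0,
                  threshold=0.55, max_rounds=50):
    """Alternate between accepting spots and re-estimating volunteer reliability.

    marks_by_spot: dict mapping a candidate spot to the set of volunteers who marked it.
    """
    # Prior: every volunteer starts with the same assumed false-positive rate.
    fp_rate = {v: prior_fp for v in volunteers}
    accepted = None
    for _ in range(max_rounds):
        # Step 1: accept spots whose reliability-weighted support is high enough.
        total = sum(1 - fp_rate[v] for v in volunteers)
        new_accepted = {spot for spot, who in marks_by_spot.items()
                        if sum(1 - fp_rate[v] for v in who) / total >= threshold}
        if new_accepted == accepted:
            break  # nothing changed, so the loop has settled
        accepted = new_accepted
        # Step 2: update each volunteer's rate from how many of their marks were rejected.
        for v in volunteers:
            marked = [s for s, who in marks_by_spot.items() if v in who]
            rejected = sum(1 for s in marked if s not in accepted)
            fp_rate[v] = (prior_fp * prior_weight + rejected) / (prior_weight + len(marked))
    return accepted, fp_rate

# Made-up data: a, b and c are careful volunteers; s clicks all over the image.
volunteers = ["a", "b", "c", "s"]
marks_by_spot = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b", "c", "s"},
    "p3": {"a", "b"},
    "q": {"c", "s"},
    "n1": {"s"}, "n2": {"s"}, "n3": {"s"},
}

accepted, fp_rate = refine_clumps(marks_by_spot, volunteers)
print(sorted(accepted), fp_rate)
```

In this toy run, spot `p3` is rejected in the first round, when everyone is trusted equally, and accepted in the second round, once the volunteer who clicks everywhere has been down-weighted. The loop stops as soon as the set of accepted spots no longer changes.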

Compared to our simplest aggregator, our more advanced method is better at picking up faint clumps and filtering out noise. It’s also the first time this sort of method has been used in the pipeline of a major citizen science project like this one. This aggregator will be the subject of one of our upcoming papers on Clump Scout, and we are very excited to share the results.

A special thanks on this post goes out to the other members of the Clump Scout team, who helped ensure that the details of our aggregation process were as accurate and simply explained as possible. In the next week or two we’ll publish a second post detailing some of the scientific findings we’ve gotten from our results. Thank you, and stay tuned!

Happy Data Release Day: DECaLS goes live

I’m delighted to say that – with the release of the accompanying paper on the arXiv – the first data release from our Galaxy Zoo classifications of galaxies from the DECaLS survey is now live! The paper is still under review at the journal, but as lead author Mike Walmsley is handing in his thesis (congratulations!) it seemed like a good time to release the data.

As the title suggests, this data relies on classifications submitted by our wonderful Galaxy Zoo volunteers from 2015 to 2020, particularly via the ‘Enhanced’ workflow where classifications are used to educate a friendly robot assistant, speeding up the process dramatically. As a result, we have detailed classifications for 314,000 galaxies based on deeper imaging than we’ve ever had before.

The results are dramatic! In the figure above you can see a comparison between the fraction of votes a galaxy received for being ‘featured’ in our previous data release and the fraction it received with the new DECaLS imaging. If the new imaging made no difference, the galaxies would all lie on the dotted line, but they’re mostly above it – volunteers are seeing more features in galaxies in deeper imaging. All of which makes sense, but it’s still gratifying.

We’re all looking forward to getting stuck into this dataset – and Mike has built a tool for you to explore with. Using this interface, you can sort through the data and look at the results – below is a quick sample of double rings Sandor cooked up in no time at all.

We’re not done by a long shot – unlike these systems, the galaxies currently awaiting your inspection over at have not been previously classified – with your help, hopefully it won’t be too long before we can add them to the catalogue. In the meantime – thanks for all your help!


P.S. Pulling this paper together was a real team effort so I want to thank each and every one of the team for their hard work getting this over the line. We haven’t forgotten the volunteers either – the final, published version will have an author list online with the names of everyone who contributed and we’ll email you all a link.

Press Release on Results from Galaxy Zoo: 3D

Many of you helped out with the Galaxy Zoo spinoff project, Galaxy Zoo: 3D. I am happy to let you know that I am presenting results from this project, today at the 237th Meeting of the American Astronomical Society. You can view the iPoster I made about it at this link.

This spin-off project was aimed at supporting the MaNGA (Mapping Nearby Galaxies at Apache Point Observatory) survey, which is part of the Sloan Digital Sky Surveys (SDSS). Thanks to your input we have been able to crowdsource maps which show where the spiral arms, bars and any foreground stars are present in every galaxy observed by MaNGA. This, combined with the MaNGA data, is helping to reveal how these internal structures impact galaxies.

The results will be part of a Press Conference about this and other SDSS results, live streamed at 4.30pm ET (9.30pm GMT) on the AAS Press Office YouTube Channel. The press release about them will go live on the SDSS Press Page at the same time. Direct link to press release (will only work after 4.30pm ET).

Thanks again for your contributions to understanding how galaxies work.

A sad farewell

I recently received word from his wife of the death of Jean Tate on November 6. Jean had been a very active participant in several astronomical Zooniverse projects for a decade, beginning with Galaxy Zoo. It does no disservice to other participants to note that he was one of the people who could be called super-volunteers, carrying his participation in both organized programs and personal research to the level associated with professional scientists. He identified a set of supergiant spiral galaxies, in work which was, while in progress, only partially scooped by a professional team elsewhere, and was a noted participant in the Andromeda project census of star clusters in that galaxy. In Radio Galaxy Zoo, he was a major factor in the identification of galaxies with strong emission lines and likely giant ionized clouds (“RGZ Green”), and took the lead in finding and characterizing the very rare active galactic nuclei with giant double radio sources from a spiral galaxy (“SDRAGNs”). He did a third of the work collecting public input and selecting targets to be observed in the Gems of the Galaxy Zoos Hubble program. Several of us hope to make sure that as much as possible of his research results from these programs are published in full.

Jean consistently pushed the science team to do our best and most rigorous work. He taught himself to use some of the software tools normally employed by professional astronomers, and was a full colleague in some of the Galaxy Zoo research projects. His interests had been honed by over two decades of participation in online forum discussions in the Bad Astronomy Bulletin Board (later BAUT, then Cosmoquest forum), where his clarity of logic and range of knowledge were the bane of posters defending poorly conceived ideas.

Perhaps as a result of previous experiences as a forum moderator, Jean was unusually dedicated to as much privacy as one can preserve while being active in online fora and projects (to the point that many colleagues were unaware of his gender until now). This led to subterfuges such as being listed in NASA proposals as part of the Oxford astronomy department, on the theory that it was the nominal home of Galaxy Zoo. Jean was married for 27 years, and had family scattered in both hemispheres with whom he enjoyed fairly recent visits. Mentions in email over the years had made me aware that he had a protracted struggle with cancer, to the extent that his case may someday be identifiable in medical research. He tracked his mental processes, knowing how to time research tasks in the chemotherapy cycle to use his best days for various kinds of thinking.

This last month, emails had gone unanswered long enough that some of us were beginning to worry, and the worst was eventually confirmed. I felt this again two days ago, which was the first time I did not forward notice of an upcoming Zoo Gems observation by Hubble to Jean to be sure our records matched.

Ad astra, Jean.

Radio Galaxy Zoo: LOFAR – A short update

A lot has happened on the Radio Galaxy Zoo since we last posted an update!

First of all, you can see on the image above that we are making great progress with getting all of the big, bright sources from the LOFAR survey looked at by Zooniverse volunteers. We are approaching half a million classifications and just under 80,000 radio sources have been looked at by at least five volunteers at the time of writing. Together with the earlier efforts by members of the LOFAR team, we have covered a very wide area of the sky, around 3,000 square degrees, which is well over half of the area of the LOFAR data, and are well on the way to completing the original aims of the project. The green, orange and pink areas together show the areas of the sky we have completed.

What’s next? One of the key goals of the LOFAR Radio Galaxy Zoo has always been to provide targets for the WEAVE-LOFAR spectroscopic survey. WEAVE is a new spectroscope being commissioned on the William Herschel Telescope, which can measure 1,000 redshifts of galaxies in a single observation. WEAVE-LOFAR aims to find the redshifts of every bright LOFAR source in the survey. But the survey can’t work without knowing where the optical host galaxies are — so the input of Zooniverse volunteers in selecting these host galaxies is absolutely crucial to our success.

A complication is that WEAVE wants to look at all LOFAR sources, not just the large ones that we generally select for the Zooniverse project. As regular users will know, there are many small sources in the radio sky as well, and the optical counterparts of those can be found automatically just by matching with optical catalogues. In between there are some intermediate-sized sources, and these present the biggest problem; some of them benefit from viewing by volunteers, but there are too many of them for us to look at them all. Earlier in the year we selected 10,000 of these in a particular region of the sky that we thought would benefit from human inspection using a combination of algorithms and machine learning, and injected them into the Zooniverse project to see what volunteers made of them. The results are encouraging and have allowed us to develop a process of ‘early retirement’ for sources that turn out not to be interesting (i.e. no clicks are made during classification). Our next priority is to select this type of source, informed by the first set of results, over a larger area of the sky in order to get the full set of inputs for the first year of WEAVE. You’ll see these sources entering the Zooniverse project over the coming weeks.

Galaxy Zoo: Clump Scout – a first look at the results

Hi all! Nico here, grad student from the Minnesota science team, with an update on the Galaxy Zoo: Clump Scout project.

Since launching Clump Scout in September of last year, we’ve had over 7,000 volunteers provide more than 800,000 classifications! We’re incredibly grateful for your help and we’ve been excitedly exploring the data as it has been coming in to learn more about clumpy galaxies in the local universe.

Now that we’re around the halfway point with this project, we wanted to share with you some of the things we’ve learned. If you’d like a refresher on the project, you can see our original “project launch” blog post here.

A few things we’ve learned so far…

We’ve found a set of local clumpy galaxies to examine more closely.

Figure 1: A small sample of clumpy galaxies near us. These are some of the galaxies for which we’ve requested follow-up observations by the Hubble Space Telescope.

A major goal of the Clump Scout project was to find a group of local galaxies that were “clumpy”. For the time we’ve known about clumpy galaxies, they’ve mostly been considered a “high-redshift” phenomenon — which is astronomy-speak for “very far away, and very long ago”. In fact, we first discovered clumpy galaxies by examining images of the very distant universe taken by the Hubble Space Telescope. Because these galaxies were so far away, their light took billions of years to reach us, and we were seeing them as they existed when the universe was only a fraction of the age that it is now. It quickly became clear that most early-universe galaxies did not look like local galaxies, and the “spiral” or “elliptical” structure that we’re used to seeing was mostly absent. Instead, most galaxies were loosely-structured blobs of stars and gas with a few concentrated “clumps” that glowed brightly with new stars. The name “clumpy galaxy” originated to explain the appearance of these galaxies, and to differentiate them from the appearance of galaxies near us.

Unfortunately, because these galaxies are so distant, it’s difficult to study them in detail. We have wondered over the years if there are properties of clumps that are being hidden or washed-out by the dim, low-resolution photos we’ve taken from billions of light-years away. This is why the discovery of clumpy galaxies in our own backyard is such an exciting accomplishment. Thanks to the volunteer classifications from the Clump Scout project, we’ve been able to identify hundreds of galaxies with clumpy characteristics much like those of the far more distant versions we’re used to studying — but since they are nearby, we can perform follow-up studies with more sensitive, higher-resolution techniques. We recently submitted a proposal for observation time from the Hubble Space Telescope to examine some of these galaxies in more detail, and we’re excitedly waiting to hear back. Above, you can see ten of the galaxies for which we requested follow-up. They are dotted with blue specks, which are the “clumps” we’ve been seeking to study.

It’s much harder to see clumps in some places than others.

The Galaxy Zoo team has run many projects that study the large-scale properties of galaxies, such as their shape, characteristics, and patterns in their behavior. The Clump Scout project is a bit different because our focus is on a much smaller target. Clumps are small “substructures” within galaxies, which are much harder to see and in many cases can be entirely missed.

Part of our job during this project was to determine the properties of clumps that our volunteers could see compared to the properties of those that they couldn’t. For example, a bright clump in a dim galaxy sticks out like a sore thumb; a dim clump in a bright galaxy, on the other hand, might be completely invisible. To control for this effect, we created a sample of simulated clumps with properties we already knew well, and inserted these into some galaxy images in the project. Now that so many volunteers have responded, we have a good idea of which simulated clumps can be seen and which cannot — which gives us a very good idea of what sorts of real clumps might be missing as well.

The main factor controlling whether or not a clump is visible is, of course, how bright it is. You, our volunteers, have shown us that you can catch just about all of the clumps that are above the “95% completeness limit” of the Sloan Digital Sky Survey (SDSS), the survey which provides all of Clump Scout’s images. Essentially, this means that if a clump CAN be found, you all are finding it!

Other factors controlling clump visibility were more surprising. For example, we expected that the higher an image’s resolution, the easier it would be to see clumps. In fact, resolution appeared to have almost NO effect on volunteers’ ability to see clumps: Volunteers recovered the same fraction of clumps in the clearest images as in the blurriest ones. Aside from the clump’s brightness, the most important factor in clump recovery was actually its proximity to the center of its host galaxy. We found that clumps in the dimmer, more outlying regions of galaxies are quite easy to see — they are bright spots on a dim background. However, once they are within one “effective radius” of the galactic center, they become incredibly difficult to identify. This makes sense: The galactic center is much brighter and may drown out the signal of a clump near it. This gives us a very helpful tool for understanding the patterns in clumps we are seeing. Many theories about clumps predict that they live for billions of years, beginning near the outside edges of their host galaxies and slowly migrating inward towards the center before merging with the central bulge. We now know that we are not likely to see clumps near the central bulge in our Clump Scout data, but it’s not necessarily because they’re not there: They are merely harder to see.

Recovery fractions

Figure 2: The “recovery curves” for clumps in our sample. On each plot, the height of the blue region measures the number of simulated clumps with a given property, while the orange region’s height measures the number of those simulated clumps that volunteers found and marked. The ratio between these two is called the “recovery fraction”, and it’s displayed as the black line on the plot. The recovery fraction doesn’t change much with redshift (aka distance to the galaxy) or with image resolution. However, it falls dramatically as clumps get closer to the galactic center — which tells us exactly how much harder it is to find clumps that are near the center of a galaxy.
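For readers who like to see the arithmetic, the recovery fraction in each bin is just “simulated clumps found” divided by “simulated clumps inserted”. A minimal sketch follows; the function name `recovery_fraction`, the bin edges and the clump values are invented for illustration and are not our real measurements.

```python
def recovery_fraction(clumps, bin_edges):
    """Fraction of simulated clumps recovered in each bin of some property.

    clumps: list of (value, was_found) pairs.
    Returns one fraction per bin, or None for a bin with no simulated clumps.
    """
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    found = [0] * n_bins
    for value, was_found in clumps:
        for i in range(n_bins):
            if bin_edges[i] <= value < bin_edges[i + 1]:
                totals[i] += 1
                found[i] += int(was_found)
                break
    return [f / t if t else None for f, t in zip(found, totals)]

# Invented example: distance from the galactic center in effective radii,
# and whether volunteers marked the simulated clump.
simulated = [(0.2, False), (0.5, False), (0.8, True), (0.9, False),
             (1.2, True), (1.5, True), (1.8, False),
             (2.5, True), (2.7, True)]

fractions = recovery_fraction(simulated, bin_edges=[0, 1, 2, 3])
print(fractions)
```

In this invented sample the fraction rises with distance from the center, which is the same qualitative trend as the black line in the rightmost kind of plot described above.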

We’re still working through our analysis of your responses, and we’ll continue to give you updates as they come. Thank you for being part of the Galaxy Zoo team!

If you’d like to try your hand at identifying a few clumps yourself, you can take part at our project page: Galaxy Zoo: Clump Scout 




Radio Galaxy Zoo: LOFAR – The First Classification Results

Presenting some results from the first two weeks of the Radio Galaxy Zoo: LOFAR project.

Hi everyone! We are extremely excited to see how popular the Radio Galaxy Zoo: LOFAR project is. Since we launched two weeks ago we’ve already had over 234,000 classifications! In this brief blogpost we’d like to give an overview of the classification statistics and how the project is coming along. The in-depth scientific results will follow later, after more careful analysis.

Some General Statistics


In the graph above, you can see the number of classifications per hour. The graph starts at the launch date of the project (25/02). As you can see from the smaller first peak, we started at about 500 classifications an hour. Things really started taking off around the morning of the next day, as the European press released their articles about Radio Galaxy Zoo: LOFAR, peaking at 3,000 classifications an hour! Afterwards, we see the expected day and night trends, following European time, which indicates that most volunteers are European.


The figure above shows the number of classifications grouped by language setting. English is clearly the dominant one: almost 80% of the classifications are made through the English version of the website. However, French is also a pretty popular language setting. This is only a proxy for the distribution of countries, however. It is a somewhat skewed view, since there are probably many users who prefer to view the website in English even when it is not their native language. This is the most likely explanation for the low number of classifications using Dutch language settings. You would expect a lot of classifications with the Dutch settings, as the LOFAR telescope itself is located primarily in the Netherlands and has therefore received more attention from the Dutch press.


We’d also like to show the distribution of the number of classifications per user. When we zoom out (the right figure) we can see that there are a few users competing hard for the most classifications. Right now there is a clear number one at more than 6,000 classifications already, which is amazing.

Interesting Sources

As of the time of writing this blogpost, we have already found a ton of interesting sources. Many very nice examples of classical double lobes, but also many complex cases and beautiful star-forming galaxies, have been identified already. As the project has just started, we have not had time to analyze the sources in detail yet, but stay tuned for updates on this!


Common Pitfalls

An interesting thing we noticed was that many people found ‘explosions’ in the Radio Galaxy Zoo: LOFAR, like the one in the image below. Unfortunately, these radial spokes are not an explosion but just imaging artefacts from where our calibration fails, which usually happens around very bright sources. If you see something like this, please click on “Artefact” at the final “Additional information” task.


Additionally, (real) diffuse emission is often mistaken for artefacts. Emission that does not trace any compact structure is not necessarily an artefact, as in the image below:


On the other hand, the small islands of emission in the image below are actually artefacts. Watch out!


Finally, some double lobes were also identified as blends, but this is likely just by volunteers that are still getting the hang of it. See the image below for two example cases that were incorrectly identified as a blend (three and four out of five times respectively). However, these are both just classic double-lobed radio galaxies.


Of course, this is just nitpicking on the cases that are going wrong, but most of the cases seem to be going well. Many real blended sources have also been identified, such as the examples below! So the option “Blend” should be picked when two distinct radio sources are under the solid ellipse.


How much sky have you covered so far?

The progress of the first two weeks of the Radio Galaxy Zoo: LOFAR has been amazing. In terms of sky area, the citizen scientists have already seen quite a big chunk of the northern sky that we want to cover. The image below shows the sky area that we are currently investigating in light blue (called DR2, for data release 2). The purple and orange dots show the fields that the LOFAR team has done internally for the first data release (DR1) and for a small part of the second data release, respectively. The green dots show the fields that the public Radio Galaxy Zoo: LOFAR project has completed thus far.

You can see that in just two weeks we’ve already more than doubled the amount of area that took months for scientists to look at during the first data release of the LOFAR survey! If we keep up the current pace, we will be finished in no time.

As the project continues, we plan to give you more updates on the data reduction process, so stay tuned!

Strong and weak bars in Galaxy Zoo

Good morning everyone,

My name is Tobias and I’m a new PhD student here at Oxford. I use the classifications everyone made in Galaxy Zoo to attempt to understand how galaxies evolve. Right now, I’m especially interested in how bars affect galaxy evolution.

As some of you know, Galaxy Zoo currently asks volunteers to differentiate between so-called ‘strong’ and ‘weak’ bars. Below you can find some neat examples of both classes of galaxies that were identified using your classifications. It seems that the difference between strong and weak bars is some combination of the length, width and brightness of the bar.

Examples of strongly barred (top row) and weakly barred (bottom row) galaxies.

The relationship between bars and galaxy evolution has been studied before by members of the Galaxy Zoo team, but the previous incarnation of Galaxy Zoo only allowed binary answers to the bar question: either there was a bar or not. The interesting bit, however, is to see whether strong and weak bars have different effects.

In fact, we have exciting preliminary data suggesting that the two types do behave differently in the context of galaxy evolution! When a galaxy evolves and moves from the ‘blue cloud’ to the ‘red sequence’ in the colour-magnitude diagram, its morphology and properties change (e.g. its star formation rate decreases). This process is called ‘galaxy quenching’. With the new Galaxy Zoo data and the classifications that everyone involved made, we saw that galaxies with weak bars are found in both the blue cloud and the red sequence, whereas the strongly barred galaxies are very much clustered in the red sequence, as you can see below. In more detail, strongly barred galaxies make up only ~5% of the blue cloud but ~16% of the red sequence. In contrast, weakly barred galaxies show a much more modest increase, making up ~17% of the blue cloud and ~21% of the red sequence.

Contour plot of the colour-magnitude diagram for all the galaxies in Galaxy Zoo. Overlaid on top are the strongly barred galaxies (in green) and the weakly barred galaxies (in orange). The dotted line (taken from Masters et al. (2010)) defines ‘the blue edge of the red sequence’ and effectively divides the sample into two populations: the blue cloud and the red sequence. One can clearly see that the strong bars are mainly above the dotted line.
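To make the comparison between the two bar types a little more concrete, here is a minimal Python sketch that turns the approximate percentages quoted above into a single ratio per bar type. The dictionary layout and the `enhancement` function are made up for this illustration and are not part of any Galaxy Zoo analysis code; the numbers are the rounded values from the text, so treat the result as a rough guide only.

```python
# Approximate fractions quoted in the post (rounded, preliminary values).
fractions = {
    "strong": {"blue_cloud": 0.05, "red_sequence": 0.16},
    "weak": {"blue_cloud": 0.17, "red_sequence": 0.21},
}


def enhancement(bar_type):
    """Ratio of the red-sequence fraction to the blue-cloud fraction."""
    f = fractions[bar_type]
    return f["red_sequence"] / f["blue_cloud"]


for bar_type in fractions:
    ratio = enhancement(bar_type)
    print(f"{bar_type} bars: {ratio:.1f}x as common in the red sequence")
```

With these rounded inputs, strong bars come out roughly three times as common in the red sequence as in the blue cloud, while weak bars are only about 1.2 times as common.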

This finding hints at a fundamental difference between the two types of bars, but in order to do real science we need to interpret the clustering of the strong bars correctly. Do strong bars cause the galaxy to quench and move up to the red sequence, or can a strong bar only form if the galaxy is already sufficiently quenched? It is a chicken-or-egg question on the scale of galaxies.

Before I end this post, I want to emphasise that this research is only possible because of the many volunteers, like yourself, who help classify galaxies, and we are very grateful for your time and effort. However, this is only the start and a lot of work still needs to be done, so keep on classifying!

I hope to report on interesting new developments soon.


Galaxy Zoo Human + AI Paper Published

Hi all, Mike here.

A few months back, I introduced our new AI that can work together with volunteers to classify galaxies. It’s able to understand which galaxies, if classified by you, would best help it to learn. You and the AI have together classified tens of thousands of galaxies since we launched the new system in May.
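For the curious, here is a minimal Python sketch of one simple way a model could pick the galaxies it would learn the most from: ask about the ones it is least sure of. This is plain entropy-based uncertainty sampling, used here as a stand-in to show the idea; it is not necessarily the selection method described in the paper. The galaxy names, probabilities and function names are all hypothetical.

```python
import math


def entropy(p):
    """Binary entropy (in bits) of a predicted probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def pick_most_informative(predictions, n):
    """Return the n galaxy ids whose predictions are most uncertain."""
    ranked = sorted(
        predictions, key=lambda gid: entropy(predictions[gid]), reverse=True
    )
    return ranked[:n]


# Hypothetical model outputs: probability that each galaxy has a bar.
predictions = {
    "galaxy_a": 0.98,
    "galaxy_b": 0.52,
    "galaxy_c": 0.10,
    "galaxy_d": 0.35,
}

print(pick_most_informative(predictions, 2))
```

In this toy example the model is already confident about `galaxy_a` and `galaxy_c`, so the two galaxies it would send to volunteers first are the ones with predictions closest to 50/50.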

I’m really happy to say that our paper was recently accepted for publication in the Monthly Notices of the Royal Astronomical Society!

We’ve made a few changes since the early version I shared before. I think the most interesting change is a new section applying AI fairness tools. These tools are usually used to check if AI models make biased decisions, for example offering fewer jobs to women. We used these tools to check if our model is biased against galaxies with certain physical properties (it isn’t).

You can read the latest pre-print of the paper for free here. The (essentially identical) final publication will also be available for free from Monthly Notices once published; we’ll update this post when that happens.

Happy classifying,


The clumpiness of EAGLE galaxies

We have added new galaxies from the EAGLE simulations for you to classify. To find out more about what to do if some of them appear clumpy, read this blog post.

It’s important to note that while EAGLE produces some impressive galaxy images, there are still some ways in which they don’t quite resemble real galaxies. A prominent example of this is in how many star-formation “clumps” there are in galaxies. Stars form in clumps or clusters of varying size, and some observed galaxies are clumpy in appearance, so the models are reproducing a real phenomenon. It also seems that these galaxies are more common in the early Universe, and are an important part of galaxy evolution. However, the clumpy galaxies may be too common within EAGLE.


Some EAGLE galaxies that appear clumpy. Clumps appear bright blue, because they have formed recently and contain the hottest and brightest (but shortest-lived) stars. From left to right, you can see clumpy galaxies that may appear disk-like, rounded or more chaotic in shape.

We have an understanding of why this happens: clumps can result from the limited detail with which galaxies can be modelled (even on the most powerful supercomputers), and from the simplifications that need to be made to how gas interacts. This doesn’t affect other things we can learn from classifying these images. If you come across a galaxy that looks super-clumpy like the above images, the best thing to do is just ignore the clumpiness and classify the rest of the galaxy. (If you would like to learn more about clumps, read about our sister project Galaxy Zoo: Clump Scout.)