
The dawn of Galaxy Zoo’s new incarnation – Galaxy Zoo: Cosmic Dawn!

This week, Galaxy Zoo begins its latest incarnation, Galaxy Zoo: Cosmic Dawn, with tens of thousands of new galaxy images now available for you to help classify! These were taken by the Hyper Suprime-Cam (HSC) on board the 8.2m Subaru telescope on the summit of Mauna Kea in Hawaii, as part of the Hawaii Two-0 (H20) survey, a key component to the more ambitious Cosmic Dawn survey.

The Cosmic Dawn Survey is a multi-wavelength survey aiming to understand the co-evolution of galaxies, the dark matter haloes that host them, and their central black holes over cosmic time, all the way from when galaxies first formed in the early Universe. A major part of this is the H20 survey which has obtained ultra-deep Subaru HSC imaging over large and particularly dark areas of the sky. The H20 survey targets two areas of the sky which, as part of the Cosmic Dawn Survey, were observed by the Spitzer Space Telescope in the largest allocation of observing time ever awarded on the spacecraft. When combined with these infrared data, H20 aims to push the boundaries of extragalactic astronomy by studying galaxy evolution out to around 800 million years after the Big Bang. This incarnation of Galaxy Zoo features images from a portion of the sky called the Euclid Deep Field North (EDF-N), from one of the two areas targeted by the H20 survey.

This effort from Galaxy Zoo will therefore also help prepare for the upcoming launch of the Euclid space telescope by the European Space Agency (ESA) in 2023, with the classifications you make now helping to guide what Euclid will observe in more detail with its even higher resolution imaging in visible light and the infrared.

Compared to previous incarnations of Galaxy Zoo such as those using SDSS and DECaLS images, H20 enables us to see fainter and more distant galaxies from earlier in the Universe’s history, thanks to the higher angular resolution of HSC and the greater depth of the survey. However, deeper imaging also means we can observe many more distant galaxies in the same patch of sky, so the images you will see may often appear redder and blurrier than you might expect.

The Galaxy Zoo decision tree of questions has been modified so that your classifications can help refine the software used by the H20 team, and perhaps that of future Galaxy Zoo incarnations as well. For example, the “Star or Artifact” question now includes a “Bad Image Zoom” option, while selecting “Non-star Artifact” will allow you to classify the type of any image artifact you come across, such as satellite trails. In the future, Galaxy Zoo will also be running a simplified decision tree of questions for some of the fainter distant galaxies, as their lower resolution prevents many features from being identified.

Also, look out for any galaxies containing bright clumps! We’ve added a question about these, as they can help us understand the period of intense star formation that took place in the early Universe.

Finally, we are also asking volunteers to tag any extremely red objects, or those with a lens or arc features, in the Talk board, using the “Done & Talk” option. These are rare objects that we don’t want to miss, especially for such distant galaxies!

We are excited about the new images and looking forward to seeing what you’ll discover. Join the classification now!

James Pearson, Galaxy Zoo and H20 teams

Our last night at the telescope

The story so far: on the first night we were able to observe until 02:00 before the weather forced us to close. The following three nights we were confined to the Residencia, the place where they keep all the astronomers when they are not observing. Much to our surprise, this morning we awoke to a sky only sparsely covered by clouds, instead of being in the middle of one. Maybe we have a chance of observing something tonight?

== 19:30 ==

As the weather has cleared somewhat on Roque de Los Muchachos, La Palma, we have received permission to go to the telescope tonight. We’ve just arrived and started taking the first couple of calibration images with much enthusiasm. 

A bit about the telescope itself: we’re using a telescope called the Isaac Newton Telescope (INT). It is a Cassegrain reflector telescope with a ~2.5m primary mirror that weighs ~4000kg! 

We’re not great at selfies

It takes a while for the telescope to take all the calibration images (biases, arcs and flats), so we were able to enjoy the sunset right before -hopefully- a very busy night. 

== 22:00 ==

Unfortunately the weather has taken a turn for the worse. We cannot open the dome of the telescope as the humidity is too high. We’ve had our first cups of coffee and are settling in for the night, while keeping an eye on the humidity sensor.

As tonight will be our last night at the telescope and we’ve had bad luck with the weather the last couple of days, we are very hopeful to observe some galaxies tonight. So far we were able to observe only one galaxy before we had to close on the first night. At this point we will be grateful for any data that comes in, even one more galaxy would double our current sample size!

== 00:45 ==

We’re still not able to open: humidity is at 100% and we cannot see any stars. The highlight of the last couple of hours was exploring the library in the INT and listening to an old cassette tape of Joseph and the Amazing Technicolor Dreamcoat. None of the other tapes work.

== 04:00 ==

We’ve had a popcorn and pizza break, drank multiple cups of coffee and explored most of the telescope building. However, unfortunately we’ve not been able to use the telescope today – the weather gods seem determined to prevent us from getting any data. With any luck we’ll be back next year to try again.

Observing bars in La Palma

As some might remember, in our last paper (which can be found here), we studied differences between weak and strong bars. One of our results was that star forming galaxies with stronger bars have significantly higher star formation in their centres compared to galaxies with weaker bars. This might be due to differences in the gas flows induced by the different types of bars. To investigate this, we selected a sample of 21 galaxies from Galaxy Zoo, which we plan to observe over the next couple days. 

The relationship between galactic bars and star formation has long been up for debate. Galactic bars are vast structures of co-orbiting gas, dust and stars that form directly across the galactic nucleus. It is thought that gas flows along the arms of the bar into the centre, increasing the central gas density. As the gas density increases, so should the star formation rate. So, that should be the answer then? However, in reality, it is not so simple.

There are many different kinds of bars, with varying characteristics such as strengths and orientations. A galaxy might contain a very strong bar – where it clearly dominates even over the galactic disk – or it could have a very weak bar – where the disk dominates over it. So, we need to ask ourselves further, is the gas flow and resultant star formation higher in galaxies with strong bars? What about weak bars? Does it even change at all if we compare either type of bar to galaxies which have no bars? 

These are the questions we are going to answer at the Isaac Newton Telescope, or INT, on the island of La Palma. With a sample of 21 galaxies characterised by Galaxy Zoo – 7 strongly barred, 7 weakly barred and 7 with no bars at all – we are investigating if any relation between star formation rate and bar strength exists. An example of each type of galaxy in our sample is shown below. On the left is a galaxy with no bar at all, while on the right is a galaxy with a strong bar. The strong bar clearly dominates over the disk of the galaxy. The middle panel shows a galaxy with a much weaker bar, where the disk dominates over the bar.

To investigate this, we must turn to spectroscopy. Rather than utilise images, such as the ones above, we align a spectroscopic slit along and perpendicular to the bar direction on the image. The spectroscope will split the incoming light into a spectrum of wavelengths, where we will be able to find any spectral signatures of elements within the bars themselves. 

There are two chemical signatures of star formation that we are looking for. The first, an indirect measurement, is Hydrogen Alpha, or Hα. If there is a much higher abundance of Hα at the core of a galaxy, whether it has a strong bar, a weak bar or no bar at all, it is very likely that there is a higher gas density there. Ergo, there is more star formation. The second signature we are looking for is Oxygen III, or O[III]. O[III] is highly ionised and typically only exists in areas where there are high rates of star formation, the newly born stars being the cause of the ionisation. This would be direct evidence of higher star formation.

So, what do we find? Thus far, due to the adverse weather conditions caused by tropical storm Hermine on La Palma, we have only been able to take spectroscopic observations of a single strongly barred galaxy. We have extracted the spectrum, removed any sources of contamination and reduced it to only that of the bar and galactic nucleus. The top image is the spectrum from the slit perpendicular to the bar direction and the bottom is from the slit aligned along it.

The top spectrum (perpendicular to the bar) appears to be almost empty, with only noise present. Along the bar, however, we get a very strong emission line at precisely 6562.801Å. Guess which wavelength Hα happens to rest at? Precisely the same! 
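
For readers who like to see the arithmetic, here is a minimal sketch of how an emission line can be matched to a known rest wavelength. It assumes the galaxy's redshift is already known, and the function names, the 2Å tolerance and the example redshift are all made up for illustration; this is not the reduction software we actually use at the telescope.

```python
# Rest-frame wavelengths in angstroms (air). The H-alpha value is the one
# quoted in the post; the [O III] value is the standard 5007 line.
REST_LINES = {
    "H-alpha": 6562.801,
    "[O III] 5007": 5006.843,
}


def to_rest_frame(observed_wavelength, redshift):
    """Undo the redshift stretch: lambda_rest = lambda_obs / (1 + z)."""
    return observed_wavelength / (1.0 + redshift)


def identify_line(observed_wavelength, redshift, tolerance=2.0):
    """Return the name of the closest known line, or None if no known line
    lies within `tolerance` angstroms of the rest-frame wavelength."""
    rest = to_rest_frame(observed_wavelength, redshift)
    name, wavelength = min(REST_LINES.items(), key=lambda kv: abs(kv[1] - rest))
    return name if abs(wavelength - rest) <= tolerance else None


# Hypothetical galaxy at redshift 0.02: H-alpha is stretched to about 6694 A.
observed = REST_LINES["H-alpha"] * 1.02
print(identify_line(observed, redshift=0.02))
```

A line that lands within the tolerance of 6562.801Å after the redshift correction is labelled as Hα; anything that matches no entry in the table comes back as `None`.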

This is certainly a promising initial result. If the abundance of Hα is much higher along bars than not, then this is certainly a case for them enhancing star formation! The next steps are to confirm this finding by taking observations for the rest of our sample. Once the weather clears up on La Palma, we will be aiming to finally answer the question, what do galactic bars do for star formation? Enhance, prevent or nothing? Well, it looks like enhancing has won the first point!

We will keep you updated!

David, Tobias and Chris

In the News – Ringed Galaxies from GZ Mobile

Dear volunteers,

Thanks to you, we’ve found 40,000 new ringed galaxies – about six times more than all the ringed galaxies anyone has ever found before! The Royal Astronomical Society were impressed enough to share the news in a press release here.

Galaxy rings found by GZ Mobile volunteers (that’s you!)

I launched the Rings Challenge here on this blog ten months ago, asking for your help searching for galaxies with rings around them. I wasn’t sure if anyone would be interested in using the new mobile project we made. Ten months later, you’ve made a million swipes on 100,000 galaxies. I’m so grateful.

Rings are rare. To help you find them, I created an automatic assistant. I used the first half of your swipes to teach an artificial intelligence algorithm what rings look like. Then I set the algorithm searching a million DESI galaxies to find more rings. Finally, I took the galaxies the AI thought might have rings and asked you to check them with the second half of your swipes. This two step approach let us both search many galaxies quickly and have human eyes vet all of our discoveries.

This is the first major science result from the new GZ Mobile project. Making an app wasn’t part of the original plan – the first iPhone launched three weeks after GZ, 15 years ago this month – but it’s now a crucial tool for hunting specific galaxies quickly. I hope you’ll join us for the next search.




You can find more technical details on the machine learning on my personal blog.


Apologies to ChristineM, who many months ago correctly pointed out that I should technically call them “ringed” galaxies rather than “ring” galaxies.

Happy Birthday (& a belated announcement)

Firstly, happy 15th birthday to Galaxy Zoo and thank you to all those who have made it a success over the last decade-and-a-half. Whether you’re a regular on Talk, an original classifier, a member of the science team or someone who has been inspired by the project to find out a little more about science, thank you!

BBC News article announcing the launch of the project on 11th July 2007

Looking back at the BBC article that started everything, I notice we ‘hoped’ that 30000 people would take part in our project. We blew past that target early on, and haven’t looked back since. The results of the project have told us much about galaxy history, inspired novel machine learning approaches to science, helped us build the broader Zooniverse and much more. It still stuns me to think that there’s a Hubble Space Telescope program following up on Galaxy Zoo discoveries, and there is still much more science to come. (It’s not ridiculous to think that JWST, the new space telescope which will release its first image tonight, will soon follow up on a Galaxy Zoo discovery, or provide images for a future version of the project). My failure to anticipate this glorious future came because we simply underestimated the passion, ability and enthusiasm of all of you to help learn a little bit more about the Universe.

I’m also very proud of the PhD students who have worked with the Galaxy Zoo team to make use of our data, and help lead the project. Several generations have now successfully graduated, and there’s much more to come from the current cohort. Since September, the project has been led by the immensely impressive Karen Masters, who has taken over from me as Principal Investigator, with assistance from Brooke Simmons (Deputy PI), Sandor Kruk (Project Scientist), Becky Smethurst (Deputy Project Scientist), and Mike Walmsley (Technical Lead). With them the project is in excellent hands, and I’m looking forward to the next decade-or-so of galaxy science, powered by you, the wonderful denizens of the Zoo.

Announcing: Jan 13 press conference on Galaxy Zoo: Clump Scout results

I’m Nico, a PhD student with the Galaxy Zoo team, and I have an exciting announcement. About a year ago I wrote that classifications on the Galaxy Zoo: Clump Scout project had just finished. Now, with the first results nearing publication, the American Astronomical Society (AAS) has chosen Clump Scout to present its findings at an official press conference on Thursday, January 13 from 4:15-5:15pm Eastern Time (or 9:15-10:15pm GMT for our UK visitors). We’re very excited to finally share these results with our volunteers!

The press conference is free and open to all, so if you took part in the project, we encourage you to tune in to learn more about where your efforts have gone. (Or, if you’ve never heard of the Clump Scout project before, now is a great chance to learn!) I’ll spend a few minutes explaining why we created the project, and describe a few clues we’ve found as to the last 10 billion years of galaxy evolution. There will also be 4 other speakers presenting about their own citizen science work, so it will be a thorough tour of what’s going on in people-powered astronomy today.

We hope you can join us!

How to join:

You can watch via the live stream on AAS’s YouTube channel:

PS. For more galaxies at the AAS (although not Galaxy Zoo directly), also see our PI Karen Masters talking about the completion of the MaNGA Galaxy Survey, Tue 11th Jan in the 2.15pm ET Press Conference. MaNGA is the survey Galaxy Zoo: 3D was designed to help analyse; look out for more crowd-sourcing projects to come from this complex data now it’s all publicly available, as well as much more use of the Galaxy Zoo: 3D classifications.

New Paper – Practical Galaxy Morphology Tools

Last year, we published the GZ DECaLS catalog: detailed morphology classifications for 314,000 galaxies. We classified so many galaxies by training AI models to learn from volunteers and work alongside them. This raises the question – what else can we do with those models?

It turns out that we can use them to make three new practical tools that will help both professional researchers and volunteers. You can read all about them in our new paper out today:

The first practical tool is a similarity search. You can type in the coordinates of a galaxy, and it will try to show you the most similar galaxies. Try it out on your favourite DECaLS galaxy. For now, it’s a simple demo website, but we hope to eventually integrate this into Galaxy Zoo.
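
If you're curious what a similarity search boils down to, here is a minimal sketch. It assumes each galaxy has already been summarised as a short list of numbers (a "feature vector") and that similar galaxies have vectors pointing in similar directions. The galaxy names and numbers below are invented for illustration, and the real tool takes sky coordinates rather than names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_id, features, k=2):
    """Return the ids of the k galaxies most similar to `query_id`,
    most similar first, excluding the query galaxy itself."""
    query = features[query_id]
    scored = [
        (cosine_similarity(query, vector), galaxy_id)
        for galaxy_id, vector in features.items()
        if galaxy_id != query_id
    ]
    scored.sort(reverse=True)
    return [galaxy_id for _, galaxy_id in scored[:k]]


# Hypothetical three-number summaries of four galaxies.
features = {
    "spiral_a": [0.9, 0.1, 0.2],
    "spiral_b": [0.8, 0.2, 0.1],
    "elliptical_a": [0.1, 0.9, 0.1],
    "ring_a": [0.2, 0.1, 0.9],
}
print(most_similar("spiral_a", features, k=1))
```

With a million galaxies you would want something faster than comparing against every vector in turn, but the idea is the same.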

The second is a new method for finding the galaxies most interesting to you personally. Imagine a website where you can rate galaxies by how interesting you find them. As you rate galaxies, the website shows you new ones based on your previous ratings – just like how Netflix suggests new series (I’m a big Bojack fan myself). The system is too complicated to create a simple demo to show you, but you can see some examples in the new paper. Thanks to funding from the Sloan Foundation, we’re making this even better and adding it as an official Zooniverse feature.

The third is about adapting the AI models to classify new kinds of galaxies. If a researcher wants a model that can find ringed galaxies, for example, they would usually have to start by gathering tens of thousands of examples of ringed galaxies with which to teach their new model. This takes a long time and a lot of effort, especially for rarer galaxies. However, a model already trained on Galaxy Zoo classifications needs just hundreds of example galaxies to learn to find rings as well. This will let researchers “fine-tune” models to help solve their own specific science problems. That includes me! I’m running a Galaxy Zoo Mobile project to make a new ring catalogue with this approach.

All these tools work because of your classifications. As well as using them directly in science catalogues, we need them to train better AI models. Thank you for your contribution.

If you have any spare time – maybe on the bus, or just sitting around scrolling – I would really appreciate your help finding ring galaxies by swiping left and right on Galaxy Zoo Mobile, part of our Zooniverse app (Apple, Android). I’m hoping to build the biggest catalogue of rings ever assembled so we can understand how they form. Please join in if you can.



P.S. You can find a few more technical details on my personal blog.

New Galaxy Zoo Mobile challenge – Ringed Galaxies

My name is Mike – I’m a researcher helping run the Zooniverse project Galaxy Zoo.

I’m launching a new challenge within Galaxy Zoo Mobile, the version of GZ that runs on our mobile app (iOS, Android, scroll down to “Space” projects).

The challenge is to find galaxies with rings. I’ve picked out the 25,000 galaxies where some* volunteers voted for “Ring” on the final GZ question – “Does this galaxy have any rare features?”. Now it’s time to do a targeted search through these promising galaxies. Swipe left and right on GZ Mobile to tell us which ones you think have rings.

This is what galaxies with rings look like. I think these are easily the most beautiful galaxies we’ve ever shown on Galaxy Zoo, with glittering spiral arms and intricate structures. We’ve zoomed in each picture about 25% more than in Galaxy Zoo itself, so you’ll see all that fine detail.

We want to find galaxies with rings because they’re a mystery. Astronomers aren’t sure what causes rings. 

One leading theory is that they form from disk galaxies left undisturbed for hundreds of millions of years. Theoretical calculations and computer simulations suggest that the gravity of stars in the galaxy’s bar or bulge can cause the orbits of nearby stars to change, first making spiral arms and eventually a ring shape. Another theory is that rings are caused by head-on collisions where a small galaxy punches through the middle of a large disk galaxy, like a rock dropped into a pond.

The truth is that there are probably different kinds of ring, formed by different processes. Working out which processes form which rings will require many examples of each – and that’s where you come in. 

This targeted project is all about finding as many rings as possible. Once we know which galaxies have rings, we can follow up with future projects to divide them into different categories, and compare those categories to find out what creates each type of ring. 

As always with Galaxy Zoo, your classifications will be publicly shared with all researchers to help everyone investigate rings. We will also use your classifications to teach a new version of Zoobot, our galaxy-classifying AI, to find rings. Zoobot can then help find more rings in the million-or-so galaxies recently released by the DECaLS survey** that we haven’t yet uploaded to Galaxy Zoo. 

If you have any questions, come and chat with our community and me on the Galaxy Zoo Talk forum.



* Specifically, galaxies where the fraction of volunteers answering “ring” is in the top third (typically about two or more volunteers).

** The published catalog from Galaxy Zoo DECaLS used images from Dark Energy Camera Legacy Survey data release 5 and earlier. The survey has since released more galaxy images, some of which have already been uploaded to Galaxy Zoo.

Stronger bars help shut down star formation

Hi everyone!

I’m Tobias Géron, a PhD student at Oxford. I have been using the classifications of the Galaxy Zoo DECaLS (GZD) project to study differences between weak and strong bars in the context of galaxy evolution. We have made a significant amount of progress and I was able to present some results a couple of weeks ago at a (virtual) conference in the form of a poster, which I would love to share with you here as well.

To summarise: I have been using the classifications from GZD to identify many weakly and strongly barred galaxies. Some example galaxies can be found in the first figure on the poster. As the name already implies, strong bars tend to be longer and more obvious than weak bars. But what exactly does this mean for the galaxy in which they appear?

One of the major properties of a galaxy is whether it is still forming stars. Interestingly, in Figure 2 we observe that strong bars appear much more frequently in galaxies that are not forming stars (called “quiescent galaxies”). This is not observed for the weak bars. This suggests one of two things: either the strong bar helps to shut down star formation in galaxies or it is easier to form a strong bar in a quiescent galaxy.

In an attempt to answer this chicken or egg problem, we turn to Figure 3. Here, we show that the rate of star formation in the centre of the galaxy is highest for the strongly barred galaxies that are still star forming. This suggests that those galaxies will more quickly empty their reservoir of gas, which is needed to make stars, and are on a fast-track to quiescence.

I’m also incredibly happy to say that we’ve written a paper on this as well, which has recently been accepted for publication! You can currently find it here. Apart from the results described above, we also delve more deeply into whether weak and strong bars are fundamentally different physical phenomena. Feel free to check it out if you’re interested!

It’s amazing to see all this coming to fruition, but it couldn’t have been possible without the amazing efforts of our citizen scientists, so I want to thank every single volunteer for all their time and dedication. We have mentioned this in the paper too, but your efforts are individually acknowledged here. Thank you!



Clump Scout wrap-up: What are we doing with your 2.7 million clicks?

Hi all. My name is Nico Adams from the Galaxy Zoo science team.

Writing my first scientific paper has been equal parts exhausting and exhilarating. On Thursday, February 11, I got to put a tally in the “exhilarating” column. The paper is on the first scientific results covering the Galaxy Zoo: Clump Scout project, and I was putting the final touches on my first draft when I saw that you all had submitted the project’s final classifications. The Clump Scout project had a lofty goal — to search for large star-forming regions in over 50,000 galaxies from the Sloan Digital Sky Survey — and the fact that the Clump Scout volunteers have managed to finish it is an incredible achievement.

We’re looking forward to sharing our results over the next few months. Clump Scout is not only the first citizen science project to search for giant clumps in galaxies, but it’s the first large-scale project of any kind to look for clumps in the “local” universe (out to redshift ~0.1, or within a billion-or-so light-years of us). The data set presented by this project is incredibly unique, and we are nearly finished with our first round of analysis on it.

We’re currently preparing two papers that will cover the results directly. One is focused on the algorithm that turned volunteers’ clicks into “clump locations”, while the other — my first paper — is focused on the clump catalog and scientific results we derived from it. While these papers go through a few months of revision and review, we wanted to publish a few blog posts previewing the results. This blog post will focus on the first one: We’ll explain what happened to your clicks after you sent them to us. Clump Scout could not have happened without our volunteers, and we thank you immensely for your support.

When we designed Clump Scout, we knew from the outset that we wanted classifications to be as simple as possible. The original plan was to have volunteers click on any clumps they saw, then immediately move on. While the final design was a bit more complex (a few different types of marks were available) that basic design — mark the clumps, then move on — was still present.

The classification interface after a volunteer submits their clump locations usually looks something like this:

By comparison, the “science dataset” — which consists of 20 volunteers’ classifications all laid on top of each other — looks more like this:

Just by glancing at this image, it’s clear that there are a few “hot spots” where clumps have been identified. However, correctly identifying these hot spots in every image can be EXTREMELY tricky. The software that deals with this problem is called the “aggregator”, and it has to strike a balance between identifying as many clumps as possible and filtering out the isolated marks in the image.

The standard way of solving this problem in computer science is to use a “clustering algorithm”. Clustering algorithms are a very broad class of techniques used to identify clusters of points in space, and most of them are very simple to implement and run. Below, you can see the results of one clustering algorithm — called the “mean shift” algorithm — in practice.

Most clumps have been spotted correctly, and the results look good! However, it took quite a bit of fine-tuning and filtering to get the results to look like this. In the image above, the “bandwidth” parameter — the approximate “size” of each cluster — is about equal to the resolution of the image. Increasing the bandwidth can make the algorithm identify more clumps by grouping together clusters of points that are more diffuse. Unfortunately, the larger bandwidth also increases the likelihood that two or more “real” clumps will mistakenly be grouped into one. Here are the clusters we get when the bandwidth is twice as large:

Now that we’ve allowed clusters to be more spread-out, we’ve picked up on the cluster in the upper left. But, the three distinct clumps at the bottom edge of this galaxy have melded into just two, which is not what we want! This is just one of the parameters that we needed to tune. Another is the number of marks required to call a cluster a “clump”. Require too many, and you ignore valuable objects that we’re interested in. Require too few, and the algorithm picks up on objects that are really just noise.
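
To make the trade-off concrete, here is a minimal sketch of a flat-kernel mean shift written in plain Python. It is an illustration rather than our actual aggregator code: the `mean_shift` function, the `min_marks` argument and the example marks are all made up for this post, and a real pipeline would more likely use a library implementation.

```python
import math


def mean_shift(points, bandwidth, min_marks=1, max_iter=100, tol=1e-3):
    """Flat-kernel mean shift on 2D points.

    Each point is repeatedly moved to the mean of all original points within
    `bandwidth` of it. Points that settle in (nearly) the same place form one
    cluster. Clusters with fewer than `min_marks` points are dropped.
    Returns a list of (centre, member_count) tuples.
    """
    modes = []
    for start in points:
        x, y = start
        for _ in range(max_iter):
            near = [p for p in points if math.dist((x, y), p) <= bandwidth]
            new_x = sum(p[0] for p in near) / len(near)
            new_y = sum(p[1] for p in near) / len(near)
            shift = math.dist((x, y), (new_x, new_y))
            x, y = new_x, new_y
            if shift < tol:
                break
        modes.append((x, y))

    # Merge points whose final positions are close together.
    clusters = []  # each entry is [centre, member_count]
    for mode in modes:
        for cluster in clusters:
            if math.dist(mode, cluster[0]) <= bandwidth / 2:
                cluster[1] += 1
                break
        else:
            clusters.append([mode, 1])
    return [(centre, count) for centre, count in clusters if count >= min_marks]


# Hypothetical volunteer marks: two nearby clumps plus one stray click.
marks = [
    (10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (9.5, 10.0),  # clump 1
    (14.0, 10.0), (14.5, 10.0), (14.0, 10.5),               # clump 2
    (30.0, 30.0),                                           # stray click
]
print(len(mean_shift(marks, bandwidth=2.0, min_marks=3)))  # two separate clumps
print(len(mean_shift(marks, bandwidth=6.0, min_marks=3)))  # clumps merged into one
```

With the smaller bandwidth the two clumps stay separate and the stray click is filtered out by `min_marks`; with the larger bandwidth the two clumps meld into a single cluster, which is exactly the failure described above.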

How do we solve this problem? One thing that we tried was to have three members of the science team classify 1,000 galaxies, so that we could see how their classifications agreed with each other and with volunteers’ marks. We found that when 2 out of 3 members of the science team identified a clump, a majority of volunteers identified it as well. This was a good sign, and it told us about how many volunteer marks to expect per clump. In general, if 60% of volunteers leave a mark within a few pixels of the same spot, we consider that spot to be a clump.

Another technique that we used was more radical. While we started out using the simple clustering algorithm we’ve described so far, we found that it was much more effective to account for who was leaving each mark. Every volunteer is an individual person, with their own clump-classifying habits. Some volunteers are very conservative and only click on a clump when they’re completely certain; others are optimists who want to make sure that no faint clumps get missed. Sometimes volunteers make genuine mistakes and, believe it or not, we even get a few spammers who just click all over the image! We wanted to design an aggregation system that would make best use of all volunteers’ skills and talents (and if possible even the spammers!) to help us find as many real clumps as possible, without accidentally including any other objects that can masquerade as clumps.

To build our aggregation system, we started with an idea that was first proposed by Branson et al (2017). At its core, our system still uses a type of clustering algorithm, called a facility location algorithm. The facility location algorithm builds clusters of volunteer clicks that have a very specific connectivity pattern, which looks like this.

An example of the “facility location” algorithm. The blue “F”s mark proposed facilities, which are connected to red “C”s (cities). In practice, the facilities represent the true locations of clumps while the cities represent your marks identifying them.

Each cluster contains a central node, referred to as a “facility”, which is connected to one or more other nodes, referred to as “cities”. Facility location algorithms get their name because they are often used to minimise the cost of distributing some essential commodity like electricity or water from a small number of producers (the facilities) to a larger number of consumers (the cities). Building a facility incurs a cost and so does connecting a city to a facility. When we use the algorithm in our aggregator, the volunteer clicks that we want to group into clusters become the facilities and cities. The trick to finding the right clusters is how we choose to define the costs for facility creation and facility-city connection. 
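
Here is a minimal sketch of the facility location idea, using a simple greedy strategy. It is only an illustration: it uses a single fixed opening cost and plain pixel distance as the connection cost, which are stand-ins for the model-based costs our aggregator really uses, and the function names and example clicks are made up.

```python
import math


def total_cost(clicks, facilities, open_cost):
    """Opening cost for every facility, plus each click's distance to its
    nearest facility (the connection cost)."""
    connect = sum(
        min(math.dist(click, clicks[f]) for f in facilities) for click in clicks
    )
    return open_cost * len(facilities) + connect


def greedy_facility_location(clicks, open_cost):
    """Greedily open facilities (chosen from the clicks themselves) for as
    long as doing so lowers the total cost.

    Returns a dict mapping each facility's index to the indices of the
    clicks ("cities") connected to it.
    """
    facilities = []
    best = math.inf
    while True:
        candidates = [i for i in range(len(clicks)) if i not in facilities]
        if not candidates:
            break
        trial_cost, trial = min(
            (total_cost(clicks, facilities + [i], open_cost), i) for i in candidates
        )
        if trial_cost >= best:
            break  # opening another facility no longer pays for itself
        facilities.append(trial)
        best = trial_cost

    clusters = {f: [] for f in facilities}
    for i, click in enumerate(clicks):
        nearest = min(facilities, key=lambda f: math.dist(click, clicks[f]))
        clusters[nearest].append(i)
    return clusters


# Hypothetical clicks around two clumps.
clicks = [(10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (20.0, 20.0), (20.5, 20.0)]
print(greedy_facility_location(clicks, open_cost=3.0))
```

A low opening cost makes it worthwhile to open one facility per clump, while a very high opening cost forces every click to connect to a single facility. Tuning those costs is how the clusters are controlled.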

The costs we use are based on a statistical model that tries to understand how different volunteers behave when they classify clumpy galaxies. For each volunteer, the model learns how likely that volunteer is to miss real clumps or accidentally click on other features in the subject images. The exact location of real clumps in an image can be ambiguous, so when the model thinks that a volunteer has clicked on a real clump, it also tries to predict how accurate their annotation is. But it isn’t just the volunteers that are unique – different subjects have different characteristics too, and it may be much more difficult to spot clumps in some galaxies than it is in others. For example, spotting bright, well separated clumps on a faint background is likely to be much easier than spotting faint closely packed clumps in a noisy image. Our aggregator model takes this into account as well by trying to understand just how difficult finding clumps is in different images.

How does the aggregator model work out how volunteers are behaving? Do we tell it the right answer for a handful of subjects and check the volunteers’ annotations against them? Actually no, because we don’t know exactly what the right answer is! One of the goals of Galaxy Zoo: Clump Scout was to let the volunteers decide together exactly what it takes for a feature to be a clump. So we don’t give our model any information except the clicks that the volunteers provide. Just by comparing how different volunteers respond to different images as the classifications arrive, and comparing their annotations with the clusters found by the facility location algorithm, our model slowly learns the combination of all volunteer behavioural traits and image difficulties that best explain the classification data it has seen.

Once our model provides its best description of the volunteers and images, we define the costs for the facility location algorithm. We specify that turning a volunteer’s click into a facility is more expensive for very optimistic volunteers, who might click on slightly more features that aren’t really clumps. This reduces the chance of accidentally contaminating the clump detections. Connecting clicks to an existing facility costs more if the volunteers that provided them seem optimistic. On the other hand, if it seems like a volunteer is more pessimistic or their clicks are slightly less accurate, then it becomes cheaper to connect their clicks into an existing cluster. This ensures that we don’t miss those hard-to-spot clumps with fewer clicks or more widely spread clicks.

But wait a minute! Were you reading carefully? Our model’s understanding of the volunteers and images is partly based on the clusters that were found, but the cost of creating the clusters depends on the volunteers’ behaviour! How does that work?! Good question. Whenever a new volunteer joins the project, we don’t know anything about them, so we make some reasonable assumptions about how they will behave. In a similar way, we assume that all subjects have roughly similar characteristics. We call these assumptions the “priors” of our model. These priors let us get started with a really rough set of clusters that our model can use to make an initial guess about the volunteers and subjects. Then we can use that guess to set some new costs and find some new, more refined clusters. With these clusters, our model can make another, better-informed prediction. Our algorithm keeps refining its guess and click-to-cluster assignments over and over again until the model predictions and the corresponding clusters don’t change any more. 
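
The back-and-forth loop is easier to see in a toy version. The sketch below is a heavily simplified stand-in for our model: instead of clicks and clusters it uses yes/no votes on a fixed list of candidate spots, and it describes each volunteer with a single "reliability" number. The names, the prior value of 0.7 and the example votes are all invented for illustration.

```python
def refine(marks, prior=0.7, max_iter=50):
    """Alternate between a consensus and volunteer reliabilities.

    marks[v][s] is True if volunteer v marked candidate spot s.
    Returns (labels, reliability): labels[s] is the consensus decision for
    spot s, and reliability[v] is the fraction of spots on which volunteer v
    agrees with that consensus.
    """
    volunteers = list(marks)
    n_spots = len(marks[volunteers[0]])
    reliability = {v: prior for v in volunteers}  # the prior: same for everyone
    labels = None
    for _ in range(max_iter):
        # Step 1: reliability-weighted vote on every spot.
        new_labels = []
        for s in range(n_spots):
            weight_for = sum(reliability[v] for v in volunteers if marks[v][s])
            weight_against = sum(
                reliability[v] for v in volunteers if not marks[v][s]
            )
            new_labels.append(weight_for > weight_against)
        # Stop once the consensus no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2: re-estimate each volunteer from agreement with the consensus.
        for v in volunteers:
            agree = sum(marks[v][s] == labels[s] for s in range(n_spots))
            reliability[v] = agree / n_spots
    return labels, reliability


# Hypothetical votes: A and B are consistent, C, D and E click more freely.
votes = {
    "A": [True, True, True, False, False, False],
    "B": [True, True, True, False, False, False],
    "C": [True, False, False, True, False, True],
    "D": [False, True, False, False, True, True],
    "E": [False, False, True, True, True, True],
}
labels, reliability = refine(votes)
print(labels)
print(reliability)
```

On the first pass everyone counts equally, so the last spot is accepted on a simple three-to-two majority. Once the loop has learned that C, D and E often disagree with the consensus, their votes count for less, and that spot is rejected on the next pass. After that nothing changes, so the loop stops.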

Compared to our simplest aggregator, our more advanced method is better at picking up faint clumps and filtering out noise. It’s also the first time this sort of method has been used in the pipeline of a major citizen science project like this one. This aggregator will be the subject of one of our upcoming papers on Clump Scout, and we are very excited to share the results.

A special thanks on this post goes out to the other members of the Clump Scout team, who helped ensure that the details of our aggregation process were as accurate and simply explained as possible. In the next week or two we’ll publish a second post detailing some of the scientific findings we’ve gotten from our results. Thank you, and stay tuned!