Announcing Galaxy Zoo’s machine-learning competition (with prize money!)
Since Galaxy Zoo began in 2007, our scientific results have relied on the classifications of our volunteers. These have always been checked (in small numbers) against expert classifications, and several papers have explored how the Galaxy Zoo data compares to results from computers. Galaxy Zoo has compared well with both expert and automated classifications, and that’s helped underscore the science that your observations have made possible.
While doing real science with the Zooniverse has always been our primary goal, we’re also looking to the future; upcoming telescopes like the SKA and LSST, and the just-launched Gaia, will produce billions of new images and detected objects. The resulting data will simply be too large for citizen scientists to handle in full, even if literally everyone on the planet were involved.
This is where Galaxy Zoo will come in yet again. Our goal, which is shared by many groups of astronomers, is to improve the accuracy of the galaxy classifications that can be performed by computers. We’ve done some of this already (Banerji et al. 2010, Huertas-Company et al. 2011), but it’s still not good enough for much of the science we want to do. If we can make these algorithms better, future datasets for citizen science can be selected in advance; we can automatically process the bulk of the images, but still have citizen scientists play a key role in classifying the more unusual objects. Citizen scientists’ results will also provide important calibration for the algorithms, and volunteers will continue to look for weird and wonderful discoveries like the Voorwerp.
With that goal in mind, we’re pleased to announce the launch of a data science competition for Galaxy Zoo. We’ve partnered with Kaggle, an online platform for predictive modeling that has a massive amount of experience in similar projects. Also working with us is Winton Capital: they’ve generously agreed to provide prize money for the winners of this competition. The first prize is $10,000 USD — we hope this will help incentivize some really great solutions!
Here’s how the competition works. On the Kaggle website, competitors will be given a large set of JPG galaxy images (taken from Galaxy Zoo 2), as well as a big text file with a few dozen variables for each image. These data are a modified version of the classifications that citizen scientists generated in GZ2 (and published in Willett et al. 2013). The goal for competitors is to come up with an algorithm that will predict what those classifications should be based only on the picture. These algorithms are submitted to Kaggle and tested against a second, private set of GZ2 images and classifications. The highest scores on the new set will win the prize money.
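To make the setup concrete, here is a minimal sketch in Python of how a submission might be scored. It rests on assumptions that are not stated in this post: the scoring metric is taken to be root-mean-square error (RMSE) between predicted and actual vote fractions, the three-column layout is hypothetical, and the numbers are made up for illustration. The "algorithm" here is only a trivial baseline that ignores the picture and predicts the average training answer for every image; a real entry would replace it with something that actually looks at the pixels.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error over every image and every answer column."""
    total, count = 0.0, 0
    for p_row, a_row in zip(predicted, actual):
        for p, a in zip(p_row, a_row):
            total += (p - a) ** 2
            count += 1
    return math.sqrt(total / count)


def column_means(rows):
    """Average value of each answer column across all rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


# Hypothetical vote fractions for three answers per image.
# These values are invented; they are not GZ2 data.
train = [
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.3, 0.6, 0.1],
]
held_out = [
    [0.7, 0.3, 0.0],
    [0.2, 0.7, 0.1],
]

# Baseline "model": predict the training average for every held-out image.
baseline = column_means(train)
predictions = [baseline for _ in held_out]

# Under the assumed metric, a lower value means a better match.
score = rmse(predictions, held_out)
print(f"baseline RMSE on held-out set: {score:.4f}")
```

Any approach that uses the image should beat a baseline like this one, which makes it a handy sanity check before submitting.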
We’re really excited about this competition. For Winton, it’s a way to identify people skilled at predictive analysis whom they might be interested in hiring. For Galaxy Zoo, we’ll use the results for two major things: selecting sources efficiently for upcoming citizen science projects, and analyzing how the algorithms relate to the physical properties of galaxies.
The competition is open to anyone in the world, and will run for three months, ending on March 21, 2014. Participants will need significant programming experience, and a background in math or astronomy would probably help, since the project relies on image analysis and machine learning. If you’re interested, check out the project at https://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge.