Man vs Machine?
Manda’s paper on improving automatic galaxy classification seems to have caused quite a bit of concern and comment. Is the future of the Zoo portrayed above, with an out-of-control machine wrecking all we’ve come to hold dear? After all, we’ve always believed it important that we don’t waste your time by having you do tasks that computers are perfectly capable of completing. Are the Zookeepers putting the Zoo out of business?
You’ll be pleased to hear the short answer is obviously ‘no’. The long answer is more interesting; it turns out that we absolutely have to work together with machines in order to keep Galaxy Zoo alive for the next five years or more.
The history of galaxy classification is instructive. By the early 1990s, astronomers were aware that surveys like the Sloan Digital Sky Survey, which provides the Galaxy Zoo data, were going to be too big for astronomers themselves to classify. They therefore threw themselves into developing automatic classification routines but, as we all know, human classifications remained the gold standard. That’s why Galaxy Zoo was needed, and why we’ve made the impact we have in just a couple of short years.
Looking forward, though, it’s clear that the advent of the Zoo has only bought us a little more time in the race against the machine. New surveys, larger, deeper and more ambitious than the Sloan, are being planned; one of the largest, the Large Synoptic Survey Telescope, is estimated to produce 30 TB of data per night. 30 TB is a lot: the equivalent of 20 months’ worth of high-quality video, for example.
That amount of data will overwhelm even the largest Zoo. We’ll need to classify most of it automatically and, more importantly, use machines to decide which objects are interesting enough (or confusing enough!) to be passed to humans for more careful attention. We’ve already done this once, with Galaxy Zoo: Supernova taking the output of an automated classification routine and sending the most likely supernovae to the Zoo for further analysis.
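For the programmers among you, here is a rough sketch of what that kind of triage might look like. It is purely illustrative and not the code behind any Zoo project: the `Candidate` class, the `score` field (a made-up machine estimate, between 0 and 1, of how likely an object is to be interesting) and both thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    object_id: str
    score: float  # hypothetical machine score in [0, 1]; higher = more likely interesting


def triage(candidates, interesting=0.8, margin=0.1):
    """Split candidates into a human-review queue and a machine-only pile.

    An object goes to humans if the machine rates it as likely interesting
    (score >= interesting) or is unsure about it (score within `margin`
    of 0.5). Both thresholds are illustrative assumptions.
    """
    to_humans, machine_only = [], []
    for c in candidates:
        likely_interesting = c.score >= interesting
        confusing = abs(c.score - 0.5) <= margin
        if likely_interesting or confusing:
            to_humans.append(c)
        else:
            machine_only.append(c)
    return to_humans, machine_only


if __name__ == "__main__":
    sample = [
        Candidate("a", 0.95),  # confidently interesting
        Candidate("b", 0.52),  # the machine cannot decide
        Candidate("c", 0.05),  # confidently dull
        Candidate("d", 0.30),  # probably dull
    ]
    humans, machine = triage(sample)
    print("to humans:", [c.object_id for c in humans])
    print("machine only:", [c.object_id for c in machine])
```

In a real survey the thresholds would have to be tuned so that the human queue stays at a size the Zoo can actually get through.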
This is a huge challenge for machine learning academics and researchers, and I suspect you’ll be hearing a lot more about our efforts in this direction. Crucially, what Manda’s paper shows for the first time is that the automatic routines can be improved by the use of Zoo data. The neural network ‘learns’ how to think like the Zoo and does a pretty good (but not yet good enough) job – and that’s good for both humans and computers.