Machine Learning & Supernovae

This post, from Berkeley statistician Joey Richards, is one of three marking the end of this phase of the Galaxy Zoo: Supernova project. You can hear from project lead Mark Sullivan here, and from the Zooniverse’s Chris Lintott here.

Thanks to the efforts of the Galaxy Zoo Supernovae community, researchers in the Palomar Transient Factory (PTF) collaboration have constructed a machine-learned (ML) classifier that can reliably predict, in near real-time, whether each candidate is a real supernova. ML classification uses previously vetted data to teach a computer algorithm a statistical model that can accurately and automatically predict the class of each new candidate (i.e., real transient or not) from the observed data on that object. The manual vetting of tens of thousands of supernova candidates by the Galaxy Zoo community has given PTF an invaluable data set with which to train such an ML classifier.

The ML approach is appealing for supernova vetting because it allows us to make probabilistic classification statements, in real-time, about the validity of each new candidate. Further, it allows the simultaneous use of many data sources, including both new and reference PTF imaging data, historical PTF light curves, and information from external, online sources such as the Sloan Digital Sky Survey and the U.S. Naval Observatory. In total, our automated ML algorithms use 58 metrics about each supernova candidate, all of which are available within seconds of PTF detecting the candidate. These metrics (features, in ML parlance) are fed into a sophisticated algorithm, which uses the aggregate information from more than 25,000 historical supernova candidates rated by the Zoo to determine, almost instantly, whether each newly observed candidate is a supernova.
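To make the idea of a probabilistic classifier trained on vetted candidates concrete, here is a minimal sketch. It is not the PTF pipeline: the post does not name the algorithm used, so plain logistic regression stands in for it, the three features per candidate are hypothetical placeholders for the 58 real metrics, and the training labels are randomly generated toy data rather than Zoo ratings.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this vetted candidate.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability that a candidate with feature vector x is real."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def make_toy_candidates(n, seed=0):
    """Toy stand-in for vetted candidates: label 1 = real, 0 = bogus.

    Assumption: three made-up features, shifted up for real candidates.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0) for _ in range(3)])
        y.append(label)
    return X, y

X_train, y_train = make_toy_candidates(400, seed=1)
w, b = train_logistic(X_train, y_train)

# Score two new, hypothetical candidates.
p_real = predict_proba(w, b, [1.5, 1.0, 2.0])
p_bogus = predict_proba(w, b, [-1.5, -1.0, -2.0])
print(f"P(real) for a supernova-like candidate: {p_real:.2f}")
print(f"P(real) for a bogus-like candidate:     {p_bogus:.2f}")
```

The point of the sketch is the output: each new candidate gets a probability rather than a yes/no answer, so a follow-up threshold can be chosen later.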

Figure 1: Human vs. machine scores for supernova candidates.

Figure 2: False positive rate and missed detection rate for human and machine classifiers.

Our “ML Zoo” has been operating since the beginning of 2012 and has been thoroughly tested against the Human Zoo scores. We found that the ML Zoo scores correlate reasonably well with the average Human Zoo scores for 7,000 supernova candidates observed during the first 3 months of 2012 (Figure 1). We also discovered that the ML Zoo is more effective at finding supernovae. In Figure 2 we plot the supernova false positive rate (% of non-supernovae that were classified as supernovae) against the supernova missed detection rate (% of confirmed supernovae that were classified as non-supernovae) for both the Human and ML Zoos, using 345 spectroscopically confirmed supernovae from 2010. Indeed, the ML Zoo achieves a smaller missed detection rate at each false positive rate.
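For readers who want to see how the two rates on that plot are computed, the following is a small sketch using the definitions given in the parentheses above. The scores and labels are invented toy values, not PTF or Zoo data, and the function names are ours.

```python
def error_rates(scores, labels, threshold):
    """Return (false_positive_rate, missed_detection_rate) at one threshold.

    A candidate is called a supernova when its score >= threshold.
    labels: 1 for a confirmed supernova, 0 for a non-supernova.
    """
    n_pos = sum(1 for l in labels if l == 1)
    n_neg = sum(1 for l in labels if l == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one supernova and one non-supernova")
    # Non-supernovae wrongly accepted as supernovae.
    false_pos = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    # Confirmed supernovae wrongly rejected.
    missed = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    return false_pos / n_neg, missed / n_pos

def tradeoff_curve(scores, labels, thresholds):
    """List of (threshold, false_positive_rate, missed_detection_rate)."""
    return [(t,) + error_rates(scores, labels, t) for t in thresholds]

# Toy example: four confirmed supernovae followed by four non-supernovae.
scores = [0.95, 0.80, 0.65, 0.40, 0.70, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

curve = tradeoff_curve(scores, labels, [0.25, 0.5, 0.75])
for t, fpr, mdr in curve:
    print(f"threshold={t:.2f}  FPR={fpr:.0%}  MDR={mdr:.0%}")
```

Sweeping the threshold traces out one curve per classifier; one classifier beats another when its curve has the lower missed detection rate at every false positive rate.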

Joseph Richards works in the Statistics and Astronomy departments at
the University of California, Berkeley as an NSF-sponsored
postdoctoral researcher supported by an interdisciplinary
Cyber-enabled Discovery and Innovation grant. His main area of focus
is astrostatistics and he holds a Ph.D. in Statistics from Carnegie
Mellon University. In his academic research, he has developed
sophisticated statistical and machine learning methodologies to
analyze large collections of astronomical data.

8 responses to “Machine Learning & Supernovae”

  1. robert gagliano says :

    I wonder how the ML Zoo would fare in the graph of MDR vs. FPR if the Human Zoo were restricted to only the most experienced Human classifiers… kind of Mel Fischer or Garry Kasparov vs. “Big Blue”? I suspect that the ML Zoo would lose.

  2. robert gagliano says :

    Make that Bobby Fischer ( famous chess player) not Mel Fischer (famous ship wreck salvager).

    • chrislintott says :

      Hi

      I think this comment is spot on – and as I mention (I hope) in my post on the Zooniverse blog we’re working on exactly that. I’m hoping to blog more about this in the next couple of days, so watch this space…

      Chris

      • Wolfgang says :

        With somewhat more advanced options for humans to view the images (user-choosable field size to get enough stars for intensity profile comparisons, user-choosable contrast settings to bring candidates out of saturation, …), the human hit rates could easily increase further.

  3. Graham Dungworth says :

    Joey and Mark have made a very bold claim indeed. The HZ’s do retain a complete image data collection for the whole period 2010 through 2012. Several posts on the supernovae threads noted supernova discoveries despite bad and chronic data imagery, distortion, etc. Yet our names, generally 10-20 participants, were appended to many of those discoveries despite considerable consternation on our part. The ML claim appears to imply that even bad data can be transformed into enhanced discovery rates. This is patently absurd. There was ample time to provide feedback during the last three years, but participation by the machines was even more minimal than Clarke and Pohl’s Machine Stored in the “Last Theorem”, and consider what their demise was.

  4. Wolfgang says :

    One or two decades ago, AI reached a hit rate for correct classifications of no more than 90 to at most 95% on more complex tasks. So there is plenty to do for a second run of human classification.
    What are the numbers with your ML approach?

Trackbacks / Pingbacks

  1. I, for one, welcome our new machine collaborators « Zooniverse - August 3, 2012
