Machine Learning & Supernovae
This post, from Berkeley statistician Joey Richards, is one of three marking the end of this phase of the Galaxy Zoo: Supernova project. You can hear from project lead Mark Sullivan here, and from the Zooniverse’s Chris Lintott here.
Thanks to the efforts of the Galaxy Zoo Supernovae community, researchers in the Palomar Transient Factory (PTF) collaboration have constructed a machine-learned (ML) classifier that can reliably predict, in near real-time, whether each candidate is a real supernova. ML classification works by using previously vetted data to teach a computer algorithm a statistical model that can accurately and automatically predict the class of each new candidate (i.e., real transient or not) from observed data on that object. The manual vetting of tens of thousands of supernova candidates by the Galaxy Zoo community has provided PTF with an invaluable data set for accurately training such an ML classifier.
The ML approach is appealing for supernova vetting because it allows us to make probabilistic classification statements, in real-time, about the validity of each new candidate. Further, it allows the simultaneous use of many data sources, including both new and reference PTF imaging data, historical PTF light curves, and information from external, online sources such as the Sloan Digital Sky Survey and the U.S. Naval Observatory. In total, our automated ML algorithms use 58 metrics about each supernova candidate, all of which are available within seconds of PTF detecting the candidate. These metrics, called features in ML parlance, are fed into a sophisticated algorithm that draws on the aggregate information from more than 25,000 historical supernova candidates rated by the Zoo to determine, almost instantaneously, whether each newly observed candidate is a supernova.
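To make the idea of a probabilistic classifier trained on vetted candidates concrete, here is a minimal sketch in Python. It is not the PTF pipeline: the post does not name the algorithm, so a plain logistic regression stands in for it, and the three synthetic features and the helper names (`train_logistic`, `predict_proba`, `make_candidate`) are assumptions made purely for illustration, in place of the 58 real metrics and the Zoo-rated training set.

```python
import math
import random


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this previously vetted candidate.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that a candidate with feature vector x is real."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic stand-in for vetted candidates (assumption: real data would carry
# 58 features per candidate and labels derived from the Zoo ratings).
rng = random.Random(0)


def make_candidate(is_real):
    center = 1.0 if is_real else -1.0
    # Two informative features and one pure-noise feature.
    return [rng.gauss(center, 1.0), rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)]


labels = [i % 2 for i in range(400)]
features = [make_candidate(lab) for lab in labels]
w, b = train_logistic(features, labels)

# Score two new, unlabeled candidates.
p_real = predict_proba(w, b, [1.5, 1.2, 0.0])
p_bogus = predict_proba(w, b, [-1.5, -1.2, 0.0])
print(f"P(real) = {p_real:.2f} and {p_bogus:.2f}")
```

The point of the sketch is the shape of the workflow: labeled historical candidates go in, and each new candidate comes out with a probability rather than a hard yes or no, so a follow-up threshold can be chosen afterwards.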
Our “ML Zoo” has been operating since the beginning of 2012 and has been thoroughly tested against the Human Zoo scores. We found that the ML Zoo scores correlate reasonably well with the average Human Zoo scores for 7,000 supernova candidates observed during the first 3 months of 2012 (Figure 1). We also discovered that the ML Zoo is more effective at finding supernovae. In Figure 2 we plot the supernova false positive rate (% of non-supernovae that were classified as supernovae) versus the supernova missed detection rate (% of confirmed supernovae that were classified as non-supernovae) for both the Human and ML Zoos, using 345 spectroscopically confirmed supernovae from 2010. Indeed, the ML Zoo achieves a smaller missed detection rate at each false positive rate.
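The two rates compared in Figure 2 can be computed directly from a list of scores and confirmed labels. The sketch below uses the definitions given above; the function names (`error_rates`, `tradeoff_curve`), the thresholds, and the eight toy scores are made up for illustration and are not PTF data.

```python
def error_rates(scores, is_supernova, threshold):
    """Return (false_positive_rate, missed_detection_rate) at one threshold.

    A candidate is classified as a supernova when its score >= threshold.
    """
    pairs = list(zip(scores, is_supernova))
    n_real = sum(1 for _, real in pairs if real)
    n_bogus = len(pairs) - n_real
    # Non-supernovae that were classified as supernovae.
    false_pos = sum(1 for s, real in pairs if not real and s >= threshold)
    # Confirmed supernovae that were classified as non-supernovae.
    missed = sum(1 for s, real in pairs if real and s < threshold)
    return false_pos / n_bogus, missed / n_real


def tradeoff_curve(scores, is_supernova, thresholds):
    """Sweep thresholds to trace out false positives versus missed detections."""
    return [(t,) + error_rates(scores, is_supernova, t) for t in thresholds]


# Toy example: four confirmed supernovae followed by four non-supernovae.
truth = [True, True, True, True, False, False, False, False]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

for t, fpr, mdr in tradeoff_curve(toy_scores, truth, [0.25, 0.5, 0.75]):
    print(f"threshold={t:.2f}  false positive rate={fpr:.2f}  missed detection rate={mdr:.2f}")
```

Running the same sweep on two sets of scores for the same candidates (for example, one from human ratings and one from a classifier) gives two curves; the better scorer is the one with the lower missed detection rate at a given false positive rate.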
Joseph Richards works in the Statistics and Astronomy departments at
the University of California, Berkeley as an NSF-sponsored
postdoctoral researcher supported by an interdisciplinary
Cyber-enabled Discovery and Innovation grant. His main area of focus
is astrostatistics and he holds a Ph.D. in Statistics from Carnegie
Mellon University. In his academic research, he has developed
sophisticated statistical and machine learning methodologies to
analyze large collections of astronomical data.