Space Lasers and the Cosmic Martini: Removing Data Artifacts
As long as there are big data surveys, there will be data artifacts. Our corner of astronomy is no exception: although the vast majority of images in SDSS and CANDELS are of high quality and therefore of high scientific value, poor-quality images do still exist. The Galaxy Zoo team has worked hard to remove as many as possible from both samples, so most “bad” images never even make it into the database. This process is imperfect, though, because computers have trouble identifying every kind of artifact (for some of the same reasons they have trouble identifying different galaxy types).
Of course, as we’ve seen time and again, Galaxy Zoo users have no problem whatsoever spotting the things the computers miss:
The thread on Talk where this image was discussed pointed out that this was in the “Cosmic Scarf” of SDSS, where most of the fields have poor image quality:
Now, most of the fields in the zoomed-out image above were removed from the database and will never be shown on the website, but even the parts that look okay in the zoomed-out image don’t look so great when you zoom in. SDSS combines a number of its quality flags to give each field a “score” from 0 (terrible) to 1 (excellent) to assess its quality, but it’s not always that reliable. For example, although fields with scores larger than 0.6 are generally considered good, this field has a score of 0.77 but is clearly not quite right:
And this field has a much lower score of 0.37 but the images are classifiable:
So any choice we made at the beginning based just on the computer evaluations was going to leave some artifacts in. We chose to err on the side of showing as many classifiable images as possible, at the cost of keeping more artifacts in the sample.
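To make the trade-off concrete, here is a minimal sketch of a single score cut applied to the two fields described above. The `Field` class, the field names, and the `apply_score_cut` helper are hypothetical and exist only for illustration; the only numbers taken from this post are the two scores (0.77 and 0.37) and the commonly used 0.6 threshold.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str           # hypothetical label, not a real SDSS field ID
    score: float        # SDSS-style quality score, 0 (terrible) to 1 (excellent)
    classifiable: bool  # what a human sees when looking at the images


def apply_score_cut(fields, threshold):
    """Keep fields scoring above the threshold and report what the cut gets wrong."""
    kept = [f for f in fields if f.score > threshold]
    artifacts_kept = [f for f in kept if not f.classifiable]
    good_lost = [f for f in fields if f.score <= threshold and f.classifiable]
    return kept, artifacts_kept, good_lost


# The two example fields from this post.
fields = [
    Field("high_score_bad_field", 0.77, classifiable=False),
    Field("low_score_good_field", 0.37, classifiable=True),
]

for threshold in (0.6, 0.3):
    kept, artifacts_kept, good_lost = apply_score_cut(fields, threshold)
    print(threshold, len(kept), len(artifacts_kept), len(good_lost))
```

With the cut at 0.6, the bad field is kept and the classifiable one is thrown away; lowering the cut to 0.3 rescues the classifiable field but still lets the bad one through. No single threshold gets both of these fields right.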
The good news is that Galaxy Zoo has always been adaptable, improving with input from all its participants. Now that this field has been flagged, the science team is working on a two-pronged approach. First, removing the entire “Cosmic Scarf” should immediately help prevent the majority of these big groups of artifacts from being loaded onto the server. Second, we’re working on finding a better method of removing those artifacts that remain, using your classifications and also your hashtags on Talk. (We’re also working on using this to help make the computers better at spotting artifacts in the future.)
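As a rough illustration of how classifications and hashtags could be combined, here is a small sketch. It is not the method the science team is developing, which is still being worked out; the function names, the `"artifact"` label, and both thresholds are assumptions made up for this example.

```python
from collections import Counter


def artifact_fraction(votes):
    """Fraction of classifications for one image that chose 'artifact'."""
    if not votes:
        return 0.0
    return Counter(votes)["artifact"] / len(votes)


def flag_for_review(votes, hashtags, min_fraction=0.5, min_votes=5):
    """Flag an image if Talk users tagged it, or if enough classifiers agree.

    min_fraction and min_votes are illustrative values, not tuned numbers.
    """
    tagged = "artifact" in {tag.lower().lstrip("#") for tag in hashtags}
    enough_votes = len(votes) >= min_votes
    return tagged or (enough_votes and artifact_fraction(votes) >= min_fraction)


# Hypothetical image: four of six classifiers clicked "artifact".
print(flag_for_review(["artifact"] * 4 + ["spiral"] * 2, hashtags=[]))
```

Requiring a minimum number of votes keeps a single stray click from flagging an otherwise good image, while a hashtag on Talk can flag an image on its own because it reflects a deliberate note from a volunteer.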
So keep clicking, and remember, even your “artifact” clicks are useful.