Space Lasers and the Cosmic Martini: Removing Data Artifacts
As long as there are big data surveys, there will be data artifacts. Our corner of astronomy is no exception: although the vast majority of images in SDSS and CANDELS are of high quality and therefore of high scientific value, poor quality images do still exist. The Galaxy Zoo team has worked hard to remove as many as possible from both samples, so most “bad” images never even make it into the database. This process is imperfect, though, because computers have trouble identifying every kind of artifact (for some of the same reasons they have trouble identifying different galaxy types).
Of course, as we’ve seen time and again, Galaxy Zoo users have no problem whatsoever spotting the things the computers miss:
The thread on Talk where this image was discussed pointed out that this was in the “Cosmic Scarf” of SDSS, where most of the fields have poor image quality:
Now, most of the fields in the zoomed-out image above were removed from the database and will never be shown on the website, but even the parts that look okay in the zoomed-out image don’t look so great when you zoom in. SDSS combines a number of its quality flags to give each field a “score” from 0 (terrible) to 1 (excellent) to assess its quality, but it’s not always that reliable. For example, although fields with scores larger than 0.6 are generally considered good, this field has a score of 0.77 but is clearly not quite right:
And this field has a much lower score of 0.37 but the images are classifiable:
So any choice we made at the beginning based just on the computer evaluations was going to leave some artifacts in. We chose to err on the side of showing as many classifiable images as possible, at the cost of letting more artifacts through.
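As a rough illustration of that tradeoff, here is a minimal Python sketch of a score cut. It is not the actual Galaxy Zoo selection code: the field names and the `classifiable` flag are made up for the example, and only the 0.77 and 0.37 scores and the 0.6 rule of thumb come from the fields described above.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str           # hypothetical label, not a real SDSS field identifier
    score: float        # quality score from 0 (terrible) to 1 (excellent)
    classifiable: bool  # assumed human judgment after looking at the images


def select_fields(fields, threshold):
    """Keep every field whose score is at or above the threshold."""
    return [f for f in fields if f.score >= threshold]


def summarize(fields, threshold):
    """Return (artifact fields shown, classifiable fields lost) for a cut."""
    kept = select_fields(fields, threshold)
    artifacts_shown = sum(not f.classifiable for f in kept)
    good_lost = sum(f.classifiable for f in fields if f.score < threshold)
    return artifacts_shown, good_lost


# Toy sample: the first two entries mirror the two fields discussed above,
# the last two are invented to round out the example.
fields = [
    Field("field_a", 0.77, False),  # high score, but clearly not quite right
    Field("field_b", 0.37, True),   # low score, yet classifiable
    Field("field_c", 0.95, True),
    Field("field_d", 0.10, False),
]

for threshold in (0.6, 0.3):
    shown, lost = summarize(fields, threshold)
    print(f"threshold={threshold}: artifact fields shown={shown}, "
          f"classifiable fields lost={lost}")
```

In this toy sample, the stricter cut still lets the bad high-score field through while throwing away the usable low-score one. Loosening the cut recovers the usable field without removing the artifact, which is why a score alone can't do the whole job.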
The good news is that Galaxy Zoo has always been adaptable, improving with input from all its participants. Now that this field has been flagged, the science team is working on a two-pronged approach. First, removing the entire “Cosmic Scarf” should immediately prevent the majority of these big groups of artifacts from being loaded onto the server. Second, we’re working on a better method of removing the artifacts that remain, using your classifications and also your hashtags on Talk. (We’re also working on using this to help make the computers better at spotting artifacts in the future.)
So keep clicking, and remember, even your “artifact” clicks are useful.
I suggest leaving in the artifacts, as they are useful for the Pure Art thread.
For some of the artifacts – such as the ‘green pea in tomato soup’ image – to the eye of this zooite it looks like nothing more than the wrong color balance: the sky is too red, and the galaxy too green. Make the sky black, and maybe the galaxy will become a shade of yellow similar to that of any other ETG at a similar redshift; if not, adjust the remaining color balance until it is. Whether this would work on an object-by-object basis only, or whether an entire Field could be cleaned up at once … To me, many of these kinds of artifacts do not seem to reflect problems with the resolution (though they may be somewhat fuzzier than most other SDSS images, worse seeing perhaps?), so the basic data may well be quite good, and useable.
I think that’s a good point — if it is indeed a galaxy. In some cases the image shows objects that look a bit like ETGs but which, when you zoom out and take the whole field in context, turn out to be stars observed with an out-of-focus telescope, or with terrible seeing. And sometimes that odd color balance is a sign of an issue in the reduction and calibration of the data, which means that even if we can use the morphologies, the photometry isn’t reliable, so it’s hard to put the morphologies in proper scientific context. In either case, removing them can help to maximize the usefulness of clicks.
Oh yes, no question that including poor quality images in Galaxy Zoo is a poor use of both researchers’ and zooites’ time and effort (except to the extent that the data can be mined to better predict which SDSS images will be too poor to offer up for classification)!
I guess the total SDSS ‘poor quality image’ real estate (on the sky) is so small – in proportion to the whole – that it’s not worth pursuing these fields further; after all, there are vast repositories of high quality astronomical images no one has even started to process, in a systematic, GZ-like way.
Is it known why these fields have such poor quality imagery?
Hi Jean, sorry it took me so long to reply, but I wasn’t getting notifications of comments. That should now be fixed.
I think you’ve hit the nail on the head — these poor quality images are just a small fraction of the total, but they are certainly memorable, and users rightfully don’t like getting them, especially consecutively. So we definitely still want to remove them even though the vast majority of images are of excellent quality.
As to why this happens, it could be for a number of reasons, but more often than not it seems to be just that the seeing is borderline and/or the telescope goes out of focus. In a totally unofficial capacity I’d just put it down to “stuff happens.” 🙂