Space Lasers and the Cosmic Martini: Removing Data Artifacts

As long as there are big data surveys, there will be data artifacts. Our corner of astronomy is no exception: although the vast majority of images in SDSS and CANDELS are of high quality and therefore of high scientific value, poor quality images do still exist. The Galaxy Zoo team has worked hard to remove as many as possible from both samples, so most “bad” images never even make it into the database. But this process is imperfect, because computers have trouble identifying every kind of artifact (for some of the same reasons they have trouble identifying different galaxy types).

Of course, as we’ve seen time and again, Galaxy Zoo users have no problem whatsoever spotting the things the computers miss:

Poor quality image from SDSS

Not a Green Pea unless the universe is Tomato Soup.

The thread on Talk where this image was discussed pointed out that this was in the “Cosmic Scarf” of SDSS, where most of the fields have poor image quality:

Zoom-out of SDSS cosmic scarf

Now, most of the fields in the zoomed-out image above were removed from the database and will never be shown on the website, but even the parts that look okay in the zoomed-out image don’t look so great when you zoom in. SDSS combines a number of its quality flags to give each field a “score” from 0 (terrible) to 1 (excellent) to assess its quality, but it’s not always that reliable. For example, although fields with scores larger than 0.6 are generally considered good, this field has a score of 0.77 but is clearly not quite right:

SDSS Field with high score but bad quality

And this field has a much lower score of 0.37 but the images are classifiable:

So any choice we made at the beginning based just on the computer evaluations was going to leave some artifacts in, and we chose to err on the side of showing as many classifiable images as possible (increasing the number of artifacts kept in).
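To make that trade-off concrete, here is a minimal sketch in Python. It is not the actual Galaxy Zoo selection code: the `Field` class, the `classifiable` flag (standing in for a human judgement of the images) and the helper names are all made up for illustration. The only numbers taken from this post are the two example scores (0.77 and 0.37) and the usual "good" cutoff of 0.6.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    score: float        # SDSS-style quality score: 0 (terrible) to 1 (excellent)
    classifiable: bool  # hypothetical human judgement, not an SDSS quantity


# The two example fields from this post.
fields = [
    Field("high score, bad quality", 0.77, False),
    Field("low score, classifiable", 0.37, True),
]


def split_by_score(fields, threshold):
    """Keep fields whose score is larger than the threshold; remove the rest."""
    kept = [f for f in fields if f.score > threshold]
    removed = [f for f in fields if f.score <= threshold]
    return kept, removed


def count_mistakes(fields, threshold):
    """Return (artifacts kept in, classifiable fields thrown away)."""
    kept, removed = split_by_score(fields, threshold)
    artifacts_kept = sum(not f.classifiable for f in kept)
    good_lost = sum(f.classifiable for f in removed)
    return artifacts_kept, good_lost


for threshold in (0.3, 0.6, 0.8):
    print(threshold, count_mistakes(fields, threshold))
```

For these two fields, no single cutoff gets both right: a low cutoff keeps the artifact, a high cutoff throws away the classifiable field, and the usual 0.6 does both.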

The good news is that Galaxy Zoo has always been adaptable, improving with input from all its participants. Now that this field has been flagged, the science team is working on a two-pronged approach: first, removing the entire “cosmic scarf” should immediately help prevent the majority of these big groups of artifacts from being loaded onto the server. Second, we’re working on finding a better method of removing those artifacts that remain, using your classifications and also your hashtags on Talk. (We’re also working on using this to help make the computers better at spotting artifacts in the future.)
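As a rough illustration of the second prong, here is one way classifications could be used to flag leftover artifacts. This is only a sketch under assumptions: the vote labels, the image IDs, and the `min_votes` and `min_fraction` cutoffs are hypothetical, and the method the science team ends up using may look quite different.

```python
from collections import Counter


def artifact_fraction(votes):
    """Fraction of an image's votes that say 'artifact' (0.0 if no votes)."""
    if not votes:
        return 0.0
    return Counter(votes)["artifact"] / len(votes)


def flag_artifacts(classifications, min_votes=5, min_fraction=0.5):
    """Return IDs of images with enough votes and a high artifact fraction.

    Both cutoffs are illustrative assumptions, not Galaxy Zoo values.
    """
    flagged = []
    for image_id, votes in classifications.items():
        if len(votes) >= min_votes and artifact_fraction(votes) >= min_fraction:
            flagged.append(image_id)
    return sorted(flagged)


# Made-up example votes for three images.
classifications = {
    "img_a": ["artifact"] * 4 + ["smooth"],
    "img_b": ["smooth"] * 3 + ["features", "artifact"],
    "img_c": ["artifact", "artifact"],
}

print(flag_artifacts(classifications))
```

Requiring a minimum number of votes keeps one or two stray "artifact" clicks from removing an image that most people could classify.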

So keep clicking, and remember, even your “artifact” clicks are useful.


6 responses to “Space Lasers and the Cosmic Martini: Removing Data Artifacts”

  1. Lizardly says :

    I suggest leaving in the artifacts, as they are useful for the Pure Art thread.

  2. Jean Tate says :

    For some of the artifacts – such as the ‘green pea in tomato soup’ image – to the eye of this zooite it looks like nothing more than the wrong color balance: the sky is too red, and the galaxy too green. Make the sky black, and maybe the galaxy will become a shade of yellow similar to that of any other ETG at a similar redshift; if not, adjust the remaining color balance until it is. Whether this would work on an object-by-object basis only, or whether an entire Field could be cleaned up at once … To me, many of these kinds of artifacts do not seem to reflect problems with the resolution (though they may be somewhat fuzzier than most other SDSS images, worse seeing perhaps?), so the basic data may well be quite good, and useable.

    • Brooke Simmons says :

      I think that’s a good point — if it is indeed a galaxy. In some cases the image shows objects that look a bit like ETGs but which, when you zoom out and take the whole field in context, turn out to be stars observed with an out-of-focus telescope, or with terrible seeing. And sometimes that odd color balance is a sign of an issue in the reduction and calibration of the data, which means that even if we can use the morphologies, the photometry isn’t reliable, so it’s hard to put the morphologies in proper scientific context. In either case, removing them can help to maximize the usefulness of clicks.

      • Jean Tate says :

        Oh yes, no question that including poor quality images in Galaxy Zoo is a poor use of both researchers’ and zooites’ time and effort (except to the extent that the data can be mined to better predict which SDSS images will be too poor to offer up for classification)!

        I guess the total SDSS ‘poor quality image’ real estate (on the sky) is so small – in proportion to the whole – that it’s not worth pursuing these fields further; after all, there are vast repositories of high quality astronomical images no one has even started to process, in a systematic, GZ-like way.

        Is it known why these fields have such poor quality imagery?

  3. Brooke Simmons says :

    Hi Jean, sorry it took me so long to reply, but I wasn’t getting notifications of comments. That should now be fixed.

    I think you’ve hit the nail on the head — these poor quality images are just a small fraction of the total, but they are certainly memorable, and users rightfully don’t like getting them, especially consecutively. So we definitely still want to remove them even though the vast majority of images are of excellent quality.

    As to why this happens, it could be for a number of reasons, but more often than not it seems to be just that the seeing is borderline and/or the telescope goes out of focus. In a totally unofficial capacity I’d just put it down to “stuff happens.” 🙂
