SDSS Data Release 10 and Galaxy Zoo 2

Since the original launch of the Sloan Digital Sky Survey (SDSS) in 2000, data from the project has been released to the public every year or two. These Data Releases include both new observations from the telescope and refinements to old data based on improved processing and reduction routines.

Today, Sloan announced that Data Release 10 (DR10) is now available to both the general astronomy community and the public. It contains the first release of spectra from the APOGEE experiment, which has been observing tens of thousands of red giant stars in the Milky Way. It also includes new data from BOSS, which has been measuring redshifts for distant galaxies in order to constrain cosmological parameters and study structure formation.


Sky coverage of the SDSS data contained in DR10

The original data from Galaxy Zoo was included in the SDSS Data Release 8. That’s quite important for several reasons. It makes it much easier for scientists to use the GZ data, since SDSS uses the data release as the input for its own database, called CasJobs. This database enables matching of morphologies to other properties of the galaxies that SDSS measures, such as color or size. It also provides one of the main means of access to the data for people who aren’t members of the Galaxy Zoo team. Finally, it’s a validation that your GZ classifications have become a core data product of the survey, and something which is worth preserving and sharing as widely as possible.

In DR10, we’re happy to announce that data from Galaxy Zoo 2 is available for the first time. The reduction and description of the data is covered in a recent paper, which received a very favorable referee report a couple of weeks ago and will be resubmitted soon. We’ll share the paper as soon as it’s been formally accepted. The GZ2 data will also be hosted on our own site within Galaxy Zoo in the near future.


Galaxy Zoo 2 data is now on CasJobs!

Please check out DR10 if you’re interested in peeking at the GZ2 data – do acknowledge both Willett et al. (GZ2) and Ahn et al. (DR10) if you end up using it, though. Congratulations to the SDSS team on this new release!

About Kyle Willett

Kyle Willett is a postdoc and astronomer at the University of Minnesota. He works as a member of the Galaxy Zoo team, and gets to study galaxy morphology and evolution, AGN, blazars, megamasers, citizen science engagement, and many other cool things.

22 responses to “SDSS Data Release 10 and Galaxy Zoo 2”

  1. Jean Tate says :

    This is FANTASTIC news Kyle! Congratulations! (can’t wait to read the GZ2 paper)

    In the DR10 Explore tool, under Imaging Summary, there’s an item “Galaxy Zoo”. I’ve tried clicking this, for quite a few objects now, but each time I get an error. Do you know what this should link to?

    • Kyle Willett says :

      Hi Jean,

      The link here isn’t working for me – I’m getting runtime errors from SkyServer, which I suspect means that they’re having problems with their site. I’m trying to contact people and send in bug reports now.

      – Kyle

      • Jean Tate says :

        Thanks. Whatever you did, it worked. And I can see that it gives you access to ALL Galaxy Zoo data, for the object. Awesome!

        There is, however, one error: if you follow zooNoSpec to the Schema Browser entry, you read “This information is identical to that in Galaxy Zoo 1 Table 3. Some objects may have spectroscopic matches in DR8 (though they did not in DR7)”. A great many of the objects in Table 3 DO have DR7 spectra! As zkChris explained, “Table 3 is essentially ‘objects not in table 2’. As explained in the start of section 4, there are several things a galaxy must have to end up in table 2 (which is the table that includes the bias correction). It must have a spectrum, but just having a spectrum isn’t enough – it must also have a redshift between 0.001 and 0.25 (outside this range redshifts are either inaccurate or we don’t have enough information to make a bias correction). We also …

  2. Kyle Willett says :

    Hi Jean,

    You’re correct – the lack of a DR7 spectrum is not the sole reason for inclusion in Table 3 (although it is the main reason). I don’t think the text here is explicitly wrong, but it does leave out a couple of details, as previously discussed on the forum. I hope the detailed description in the paper will serve as explanation for the non-biased corrections.

    The naming of the tables wasn’t the best choice, but I think it’ll be quite difficult to change those at this late date.

  3. Jean Tate says :

    Not all Stripe 82 co-adds we classified in GZ2 are in SDSS DR10?

    SDSS J232412.38-000050.0 (link takes you to the DR10 Explore tool page) is an object whose Stripe 82 co-add images we classified (see this GZ forum post by Ioannab for two versions). Yet clicking the “Galaxy Zoo” link on the SDSS DR10 Explore page brings up a page with only a GZ1 entry; “No data found for this object” it says, for all GZ2 categories!

    • Kyle Willett says :

      That galaxy is in the DR10 CasJobs data – there appears to be some disconnect with the objid that Skyserver is searching on, though. I’ll forward this to the team and ask for their help. Thanks for the query.

      There will be a few galaxies from the Stripe82 coadds that don’t appear in DR10, BTW, primarily those without spectroscopic redshifts.

      • Jean Tate says :


        It’s not just that particular galaxy/object; that’s just the one for which I could – fairly quickly – get convincing evidence of a mismatch (i.e. discussed in the GZ forum, in DR10, has a spectrum, …). I found quite a few others that are certainly in Stripe 82, are (almost) certainly in the GZ2 Stripe 82 co-adds, have an SDSS spectroscopic redshift, and for which the DR10 Explore tool page says there is no GZ2 entry.

        Oh, and a big ‘doh!’ for me; I did quite a bit of checking before writing that comment, but the one thing I DIDN’T do was check it using CasJobs! 😦

  4. Kyle Willett says :

    Yes – a small number of galaxies are currently missing from the Explore tool. This isn’t because those classifications don’t exist – as I said above, they’re in CasJobs and are part of the main GZ2 release. The reason they’re not showing up in Explore is that we couldn’t find a 100% match between the IDs used in the various SDSS data releases. The GZ2 data is all from DR7 – between DR7 and DR8, however, SDSS switched to a new ID format to identify galaxies. In almost all cases, there is a clear mapping between DR7 and DR8 and we were able to recover the new ID. There are a couple of exceptions, though – sometimes an observation is withdrawn as a galaxy, and for dense fields or resolved galaxies there can be confusion as to the central source.

  5. Kyle Willett says :

    (cont.) For the three normal depth GZ2 samples (zoo2MainSpecz, zoo2MainPhotoz, and zoo2Stripe82Normal), we found matches for >97% of the galaxies in our sample. For the coadded catalogs, the matching ratio was worse – 12-13% of galaxies didn’t come up with a dr8objid match. Since the dr8objid is what the Explore tool searches on, they don’t appear, even when you use RA/dec or some other way of finding your galaxy.
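
A minimal Python sketch of why an unmatched galaxy can be present in the catalog and still not be findable through a DR8-keyed search. Everything here is illustrative: the IDs are made up, `dr7objid` is an assumed column name (only `dr8objid` is named above), and the two dictionaries are hypothetical stand-ins for the Explore tool and a CasJobs query, not their actual implementations.

```python
# Toy catalog: every row has GZ2 data and a DR7 ID.
# dr8objid is None where no DR7 -> DR8 match was found.
catalog = [
    {"dr7objid": 1001, "dr8objid": 9001},
    {"dr7objid": 1002, "dr8objid": 9002},
    {"dr7objid": 1003, "dr8objid": None},
    {"dr7objid": 1004, "dr8objid": 9004},
]

def unmatched_fraction(rows):
    """Fraction of rows that have no DR8 ID."""
    missing = sum(1 for row in rows if row["dr8objid"] is None)
    return missing / len(rows)

# A search keyed on the DR8 ID can only ever reach rows that have one.
explore_index = {
    row["dr8objid"]: row for row in catalog if row["dr8objid"] is not None
}
# A search keyed on the DR7 ID reaches every row.
casjobs_index = {row["dr7objid"]: row for row in catalog}

in_explore = any(row["dr7objid"] == 1003 for row in explore_index.values())
in_casjobs = 1003 in casjobs_index

print(in_explore, in_casjobs, unmatched_fraction(catalog))
```

In this toy case the third galaxy is reachable only through the DR7-keyed index, which mirrors the situation described for the coadded catalogs.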

    I’m working with the SDSS team to find the best way of improving our matches in the catalog. In the meantime, if you’re interested in a particular galaxy that doesn’t appear in the Explore tool, try CasJobs – I know it’s much more of a pain, but the data should be there.

    • Jean Tate says :

      Thanks Kyle.

      My main interest is in the co-adds, precisely because of their depth. I am (going to be) doing a cross-match with the Quench galaxies (both Sample and Control). There won’t be many (potential) matches, but where there are …

      Related question: how do you find the GZ2 co-add image, or images, of a Stripe 82 object (having first determined, somehow, that such an image exists)?

  6. Torsten Elfhag says :

    Hi. I’m looking at star-forming galaxies with EW(H-alpha) between 30 and 300Å (in SDSS), plus some other parameters. I’m also comparing them with their data in Galaxy Zoo: unbiased and oddness>0.8. BUT the result is somewhat strange. I was expecting “odd” looking galaxies since this would comply with the Zoo classification. The 384 galaxies that I get are (most of them) not looking odd at all. So something seems not to be right. Any suggestions????

    • Kyle Willett says :

      “Oddness” can mean many different things in the context of the Galaxy Zoo data. Without specific examples, I would suggest looking at what the most common reply to the odd question was (e.g., was it a dust lane galaxy? a merger? a gravitational lens?) and then looking at the examples in each of those sub-categories.

      • Torsten Elfhag says :

        Hi Kyle,
        Ok. I will look deeper into the subject. BUT there seems to be nothing “odd” at all with these galaxies. They look suspiciously “dull” and this is somewhat alarming!! I might be back………

  7. Torsten Elfhag says :

    Now I’ve checked it. I’m downloading the parameter zo.t08_odd_feature_a21_disturbed_debiased
    and I first went for this to be more than 0.8 but then the galaxies looked pretty “normal”. Now I tried to tip the scale and went for those larger than zero but less than 0.11. Lo and behold…..NOW they look odd! Tails and mergers! Have you by any chance turned the scale upside-down?????

  8. Kyle Willett says :

    I sincerely hope not, but anything’s possible. Can you send me specific examples (with SDSS IDs and the debiased classifications)?

    • Torsten Elfhag says :

      Hi Kyle

      Will follow your advice, but I’m still a bit concerned that I got plenty of “odd” galaxies when selecting the lower range.


  9. Torsten Elfhag says :

    Select s.specobjid, s.ra, s.dec
    into mydb.fourfour
    from galSpecLine as g
    join SpecObj as s
    on s.specobjid=g.specobjid
    join zoo2MainSpecz as zo
    on zo.specobjid=g.specobjid
    and h_beta_flux>0
    and oiii_5007_flux>0
    and nii_6548_flux>0
    and h_alpha_eqw-300
    and s.z>0.028
    and s.z<0.1
    and log(oiii_5007_flux/h_beta_flux)0.8

    This is the SQL that I ran.
    We had expected somewhat disturbed galaxies due to our selection criteria.
    So that is why we got suspicious.
    Then changed the interval for zo.t08_odd_feature….. to between >0 and <0.11 and suddenly we saw the galaxies we had expected from the previous run….

    Best regards


  10. Torsten Elfhag says :


    The reply field removed a part. The text above should be added to the SQL….


  11. Torsten Elfhag says :

    (less than)1.25+0.71/(log(nii_6548_flux/h_alpha_flux)-0.25)
    and s.class='GALAXY'
    and zo.t08_odd_feature_a21_disturbed_debiased(more than)0.8

    Less than and more than signs truncate the message. The text above should be added…..

  12. Kyle Willett says :

    Hi Torsten,

    I’ve repeated your search, and I think I see the issue. There are two things I would strongly suggest changing for what you’re trying to do.

    1. The “disturbed” category is not particularly representative of merging galaxies; this was mostly due to a badly designed icon for the project. I strongly suggest using a combination of “irregular” and “merging” (and possibly “other”) as the responses to Task 08 if you’re looking for morphological disturbances.
    2. It’s not sufficient to do a simple vote fraction cutoff for morphology, especially for a feature (like disturbed galaxies) that’s deeply embedded in the decision tree. The reason is that vote fractions are computed with respect ONLY to other responses to that task. Here’s a contrived example: say a galaxy had 100 people classify it in total, and that 99/100 thought there was nothing odd about it, and responded “no” to Task 06. The last person answered “yes” to Task 06 and then chose “disturbed” for Task 08. The vote fraction for Task 08 is now 1.0 (100%), because it’s computed as (number of responses for disturbed) / (total number of responses to Task 08).

    So if you’re just looking at a single response, you can get extremely variable vote fractions based on small-number statistics. That’s what’s happening for your galaxies; most only have 1 or 2 responses to the “anything odd” question but those who did selected “disturbed” as the odd feature they were looking at. However, the vast majority of classifiers didn’t think there was anything odd at all, and so never even answered Task 08.
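
The arithmetic of the contrived example above can be sketched in a few lines of Python. The function and variable names are illustrative only and are not columns in the GZ2 tables; the counts are the ones from the example (100 classifiers, one of whom answered “yes” to Task 06 and then chose “disturbed”).

```python
def vote_fraction(votes_for_answer, votes_for_task):
    """Fraction of the people who answered a task that chose one answer."""
    if votes_for_task == 0:
        return None  # nobody reached this task, so the fraction is undefined
    return votes_for_answer / votes_for_task

# Task 06 ("anything odd?"): all 100 classifiers answered it.
t06_yes, t06_no = 1, 99
# Task 08 (which odd feature?): only the single "yes" voter reached it.
t08_disturbed, t08_total = 1, 1

f_odd = vote_fraction(t06_yes, t06_yes + t06_no)
f_disturbed = vote_fraction(t08_disturbed, t08_total)

# Share of ALL classifiers who ended up at "disturbed".
f_disturbed_overall = f_odd * f_disturbed

print(f_odd, f_disturbed, f_disturbed_overall)
```

The Task 08 fraction comes out at 1.0 even though only 1% of all classifiers chose “disturbed”, which is why a cut on that fraction alone is misleading.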

    The approach I suggest is setting both vote fractions on all steps in the decision tree AND setting some minimum number of votes (not just vote fractions) for the morphology that you’re interested in. For example, for “disturbed” galaxies do something like:

    zo.t06_odd_a14_yes_debiased > 0.8
    AND zo.t08_odd_feature_a21_disturbed_debiased > 0.8
    AND zo.t08_odd_feature_a21_disturbed_weight > 20

    This ensures that the majority of the classifiers _answered the question_ that you’re concerned with, and that you have enough votes to make the fraction statistically meaningful. If you don’t do that, you’re not accounting for all the classifiers who (implicitly) vote against that morphological feature being present, since they chose a different path where this feature isn’t relevant.
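
As a rough illustration of how the combined cut behaves, here is a small Python sketch. The three galaxies and all of their values are invented for the example; the field comments show which catalog column each one is meant to stand for, and the thresholds (0.8 and 20) are the ones from the cut above.

```python
from dataclasses import dataclass

@dataclass
class Galaxy:
    name: str
    f_odd_yes: float    # stands for t06_odd_a14_yes_debiased
    f_disturbed: float  # stands for t08_odd_feature_a21_disturbed_debiased
    w_disturbed: float  # stands for t08_odd_feature_a21_disturbed_weight

def is_robustly_disturbed(g, min_fraction=0.8, min_votes=20):
    """Require a high fraction at every step AND enough votes behind it."""
    return (
        g.f_odd_yes > min_fraction
        and g.f_disturbed > min_fraction
        and g.w_disturbed > min_votes
    )

# Hypothetical galaxies, not real catalog entries.
sample = [
    Galaxy("A", 0.02, 1.00, 1.0),   # one voter: high Task 08 fraction, no support
    Galaxy("B", 0.90, 0.85, 34.0),  # most classifiers reached Task 08 and agree
    Galaxy("C", 0.90, 0.85, 12.0),  # fractions are high, but too few votes
]

selected = [g.name for g in sample if is_robustly_disturbed(g)]
fraction_only = [g.name for g in sample if g.f_disturbed > 0.8]

print(selected, fraction_only)
```

Cutting on the Task 08 fraction alone keeps all three toy galaxies, while the combined cut keeps only the one with broad support.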

    Table 3 in Willett et al. (2013) has suggested cutoff fractions for the entire tree, but you can explore your own subsamples and set them where appropriate.

    • Torsten Elfhag says :

      Hi Kyle
      It’s been a while…
      With the zo.t08 I do not get anything at all. But using only zo.t06… 0.8 together with my other search criteria leaves me with about 1000 galaxies. And they are really disturbed! Any comments???

