Visualizing the decision trees for Galaxy Zoo

This post (and visualization) is by Coleman Krawczyk, a Zooniverse Data Scientist at the ICG at the University of Portsmouth.

Today we’ve added a new tool that visualizes the full decision tree for every Galaxy Zoo project from GZ2 onward (GZ1 asked users only one question, which would make for a boring visualization). Each tree shows all the possible paths Galaxy Zoo users can take when classifying a galaxy. Each “task” is color-coded by the minimum number of branches a classifier must follow to reach that question. In other words, the color indicates how deeply buried in the tree a particular question is, a property that is helpful to know when scientists analyze the classifications.
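Finding the minimum depth of every task is a shortest-path problem: if each task is a node and each answer is an edge to the next task, a breadth-first search (BFS) from the first task gives the fewest answers needed to reach each question. The sketch below is a minimal illustration, assuming a hypothetical encoding of the tree as a dict that maps each task to the tasks its answers lead to. The task names are placeholders for a toy tree, not the actual GZ2 tree or the code behind the visualization.

```python
from collections import deque

def min_task_depths(tree, root):
    """Return the minimum number of answers needed to reach each task.

    tree: hypothetical dict mapping a task name to the list of tasks its
          answers lead to (None marks the end of a classification).
    root: the first task shown to every classifier.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        task = queue.popleft()
        for nxt in tree.get(task, []):
            # BFS visits tasks in order of depth, so the first time a task
            # is reached is along a shortest path; later visits are ignored.
            if nxt is not None and nxt not in depths:
                depths[nxt] = depths[task] + 1
                queue.append(nxt)
    return depths

# Toy tree with placeholder task names (illustrative only)
example_tree = {
    "smooth_or_features": ["how_rounded", "edge_on", None],
    "how_rounded": ["odd"],
    "edge_on": ["bulge_shape", "bar"],
    "bulge_shape": ["odd"],
    "bar": ["spiral"],
    "spiral": ["bulge_prominence"],
    "bulge_prominence": ["odd"],
    "odd": [None],
}

print(min_task_depths(example_tree, "smooth_or_features"))
```

In this toy tree the "odd" task can be reached along several paths of different lengths; BFS records the shortest one (two answers, via "how_rounded"), which is what a depth-based coloring would show.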

Galaxy Zoo has used two basic templates for its decision trees. The first template lets users classify galaxies as smooth, edge-on disks, or face-on disks (with bars and/or spiral arms); it was used for Galaxy Zoo 2 and the infrared UKIDSS images, and is currently being used for the SDSS data that is live on the site. The second template was designed for high-redshift galaxies and lets users classify galaxies as smooth, clumpy, edge-on disks, or face-on disks. This template was used for Galaxy Zoo: Hubble (GZ3) and FERENGI (artificially redshifted images of galaxies), and is currently being used for the CANDELS and GOODS images in GZ4. Although these last three projects ask the same basic questions, there are subtle differences between them in the questions we ask about bulge dominance, “odd” features, mergers, spiral arms, and/or clumps.

Visualization of the decision tree for Galaxy Zoo 2 (GZ2), by C. Krawczyk. Colors indicate the depth of a particular question within the tree.

If you ever wanted to know all the questions Galaxy Zoo could possibly ask you, head on over to the new visualization and have a look!

About Kyle Willett

Kyle Willett is a postdoc and astronomer at the University of Minnesota. He works as a member of the Galaxy Zoo team, and gets to study galaxy morphology and evolution, AGN, blazars, megamasers, citizen science engagement, and many other cool things.

10 responses to “Visualizing the decision trees for Galaxy Zoo”

  1. Robert Maher says :

    I think the type “Irregular” needs to be added to the question about how the clumps are arranged (straight, chain, cluster, and spiral), as these options do not cover the majority of objects put up for classification.

  2. murraycu says :

    So is there some canonical source for these decision trees other than the CoffeeScript files here?

    • Kyle Willett says :

      The CoffeeScript files should be canonical for the current versions, since those are what control the order in which questions are asked.

  3. cniharral says :

    Here, in this decision tree I don’t see any decision related to the colour of the galaxy. Is it that there’s no relation at all between the colour and the shape and/or the age of the galaxies?


    • Kyle Willett says :

      There’s an extremely strong relation – but separating the relationships between color and shape is exactly the aim of Galaxy Zoo. We want to measure the shape _independently_ of color, age, or other variables that are often used as proxies, and then use that data to work out the physical relationships between these variables.

      • Ruan Vermeulen says :

        @Kyle Willett: I found your answer a little confusing at first, but I think I resolved the confusion for myself. I would like to know whether you agree with how I worked it out:

        Confusion: as a human classifier, I see nothing in the decision tree related to colour. At first I wanted to object to your answer by arguing that a separate question about colour in the decision tree would capture the independence of morphology and colour perfectly well. But there is NOTHING related to colour here at all, which seems to mean that colour is being neglected.

        My resolution of the problem: the human classifiers only classify the morphology. If colour were included, the relationship between morphology and colour would already be built into the output received from the human classifiers, and morphology and colour would not be independent FROM THE POINT OF VIEW OF THE MACHINE that is trained on the human classifications. Colour is easy to measure with computer algorithms, so it can be included as a separate feature when training the classification software. That way the dependence is not implicit in the human classifications, and the computer can infer it itself during training. This is probably a better option than including colour in the human classifications.

        P.S. I have been classifying in Galaxy Zoo for years now, but I arrived at this post via a link from an online course I am taking called “Data Driven Astronomy” from the University of Sydney. So I am looking at it from the point of view that the goal of Galaxy Zoo is to provide supervised “learning material” for machine learning algorithms. (Apologies if my machine learning vocabulary is still lacking a bit.) Also, I hope my post is intelligible… I had to think hard to express my idea.

      • Kyle Willett says :

        Hi @Ruan – your description is mostly correct, if I understood it correctly. One part that I would emphasize is that Galaxy Zoo is not trying to teach the automatic algorithms to learn a “dependence”, necessarily – to me, that term implies that one characteristic is definitively caused by another. The intent of measuring color and morphology separately is to learn the relationship between those two attributes (and many others) – some of the most interesting results of the project came by studying examples of galaxies that don’t conform to the traditional associations between the two.

        Thanks for your comment and participation in GZ. I hope your studies continue to go well.

  4. Juan Camilo Ramirez says :

    I think that expressing the decision tree algorithm this way makes it very accessible to people who have not studied science. I study physics, and this is my first time working with this algorithm. Presenting it this way is very nice and fun, and it is very useful for teaching science to children.

    Greetings from Huila, Colombia.

  5. Scott Morgan says :

    Is gravitational lensing by dark matter the primary reason people would select “Lens or arc” in T06? Or is it more likely due to baryonic matter (stars, galaxies, and clusters)? Just curious, since dark matter and dark energy account for far more of the total mass-energy of the observable universe than ordinary matter does.

    Good article and discussion. Take care everyone!

Trackbacks / Pingbacks

  1. Explore Galaxy Zoo Classifications | Galaxy Zoo - April 27, 2015
