Visualizing the decision trees for Galaxy Zoo
Today we’ve added a new tool that visualizes the full decision tree for every Galaxy Zoo project from GZ2 onward (GZ1 only asked users one question, and would make for a boring visualization). Each tree shows all the possible paths Galaxy Zoo users can take when classifying a galaxy. Each “task” is color-coded by the minimum number of branches in the tree a classifier needs to take in order to reach that question. In other words, it indicates how deeply buried in the tree a particular question is, a property that is helpful when scientists are analyzing the classifications.
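The depth used for the color-coding can be computed with a breadth-first search from the first question. The sketch below is a minimal illustration, not the code behind the visualization: the task names and the links between them are a hypothetical, heavily simplified tree, and the function name `min_depths` is an assumption made for this example.

```python
from collections import deque

# Hypothetical, simplified tree (NOT the real Galaxy Zoo tree):
# each task maps to the tasks that its answers can lead to next.
TREE = {
    "smooth_or_features": ["roundness", "edge_on", "odd"],
    "roundness": ["odd"],
    "edge_on": ["bulge_shape", "bar"],
    "bulge_shape": ["odd"],
    "bar": ["spiral"],
    "spiral": ["odd"],
    "odd": [],
}


def min_depths(tree, root):
    """Return the minimum number of branches needed to reach each task."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        task = queue.popleft()
        for next_task in tree.get(task, []):
            # Breadth-first order means the first visit uses the fewest branches.
            if next_task not in depths:
                depths[next_task] = depths[task] + 1
                queue.append(next_task)
    return depths


DEPTHS = min_depths(TREE, "smooth_or_features")
for task, depth in sorted(DEPTHS.items(), key=lambda item: item[1]):
    print(depth, task)
```

In this toy tree the "odd" task can be reached by several paths, but it is assigned depth 1 because one answer to the first question leads to it directly. That is the sense in which the color indicates the minimum, rather than the typical, number of branches.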
Galaxy Zoo has used two basic templates for its decision trees. The first template lets users classify galaxies as smooth, edge-on disks, or face-on disks (with bars and/or spiral arms). It was used for Galaxy Zoo 2 and the infrared UKIDSS images, and is currently being used for the SDSS data that is live on the site. The second template was designed for high-redshift galaxies and lets users classify galaxies as smooth, clumpy, edge-on disks, or face-on disks. It was used for Galaxy Zoo: Hubble (GZ3) and FERENGI (artificially redshifted images of galaxies), and is currently being used for the CANDELS and GOODS images in GZ4. Although these final three projects ask the same basic questions, there are some subtle differences between them in the questions we ask about the bulge dominance, “odd” features, mergers, spiral arms, and/or clumps.
If you ever wanted to know all the questions Galaxy Zoo could possibly ask you, head on over to the new visualization and have a look!
10 responses to “Visualizing the decision trees for Galaxy Zoo”
I think that the type “Irregular” needs to be added to the options offered when asked about clumping (straight, chain, cluster, and spiral), as these do not cover the majority of items put up for classification.
So is there some canonical source for these decision trees other than the CoffeeScript *_tree.coffee files here?:
The CoffeeScript files should be canonical for the current versions, since those are what control the order in which questions are asked.
Here, in this decision tree I don’t see any decision related to the colour of the galaxy. Is it that there’s no relation at all between the colour and the shape and/or the age of the galaxies?
There’s an extremely strong relation – but separating the relationships between color and shape is exactly the aim of Galaxy Zoo. We want to measure the shape _independently_ of color, age, or other variables that are often used as proxies, and then use that data to work out the physical relationships between these variables.
@Kyle Willett: your answer is a little confusing at first, but I resolved the confusion for myself. I want to know if you agree with how I made it out for myself:
Confusion: As a human classifier there is nothing in the decision tree related to the colour. At first I wanted to object to your saying that an independent decision on the colour in the decision tree would perfectly well capture the independence of the morphology and colour. But there is NOTHING related to the colour here at all, which means that you are neglecting the colour.
My resolution of the problem: the human classifiers only classify the morphology. If colour was included, the relationship between morphology and colour would already be in the output received from human classifiers, and the morphology and the colour would not be independent FROM THE POINT OF VIEW OF THE MACHINE that is trained using the data with the classifications from the human classifiers. It is easy to analyse the colour using computer algorithms, and these can be included as features used in the training of the classification software. This means that the dependence is not implicit in the human classifications and can be inferred by the computer itself when being trained. This is probably a better option than including colour in the human classifications already.
P.S. I have been classifying in Galaxy Zoo for years now, but I arrived at this post with a link from an online course that I am doing called “Data Driven Astronomy” from the University of Sydney. So I am looking at it from the point of view that the goal of Galaxy Zoo is to have supervised “learning material” for machine learning algorithms. (Apologies if my machine learning vocab is still lacking a bit). Also, I hope that my post is intelligible… I had to think hard in order to express my idea.
Hi @Ruan – your description is mostly correct, if I understood it correctly. One part that I would emphasize is that Galaxy Zoo is not trying to teach the automatic algorithms to learn a “dependence”, necessarily – to me, that term implies that one characteristic is definitively caused by another. The intent of measuring color and morphology separately is to learn the relationship between those two attributes (and many others) – some of the most interesting results of the project came by studying examples of galaxies that don’t conform to the traditional associations between the two.
Thanks for your comment and participation in GZ. I hope your studies continue to go well.
I think that expressing the decision tree algorithm in this way makes it very pedagogical for people who have not studied science. I study Physics and this is my first time working with this algorithm; this way of presenting it is very nice and fun, and when working with children it is very useful for teaching science.
greetings from Huila-Colombia
Is gravitational lensing from dark matter the primary reason people would be selecting “Lens or arc” from T06? Or is it more likely from baryonic matter (stars, galaxies and clusters)? Just curious since dark matter and dark energy account for far more of the total mass-energy in the observable universe.
Good article and discussion. Take care everyone!