We submitted the Galaxy Zoo CANDELS paper in May. Now, after some discussion with a very helpful referee, the paper is accepted! I hope our volunteers are as thrilled as I was to get the news. It happened within days of the Galaxy Zoo: Hubble paper acceptance. Hurray!
If you’d like to read the paper, it’s publicly available as a pre-print now and will be published at some point soon in the Monthly Notices of the Royal Astronomical Society. The pre-print version is the accepted version, so it should only differ from the eventual published paper by a tiny bit (I’m sure the proof editor will catch some typos and so on).
The paper may be a little long for a casual read, so here’s an overview:
- We collected 2,149,206 classifications of 52,073 subjects, from 41,552 registered volunteers and 53,714 web browser sessions where the classifier didn’t log in. In the analysis we assumed each of those unique browser sessions was a separate volunteer.
- The raw consensus classifications are definitely useful, but we also weighted the classifications using a combination of “gold standard” data and consensus-based weighting. That is, classifiers were up- or down-weighted according to whether they could tell a galaxy apart from a star most of the time, and then the rest of the weighting proceeded in the same way it has for every other GZ dataset. No surprise: the majority of volunteers are excellent classifiers.
- 6% of the raw classifications were from 86 classifiers who both classified a lot and gave the same answer (usually “star or artifact”) at least 98% of the time, no matter what images they saw. We have some bots, but they’re quite easy to spot.
- Even with a pretty generous definition of what counts as “featured”, less than 15% of galaxies in the relatively young Universe that this data examines have clear signs of features. Most galaxies in the data set are relatively smooth and featureless.
- Galaxy Zoo compares well with visual classifications of the same galaxies done by members of the CANDELS team, despite the fact that the comparison is sometimes hard because the questions they asked weren’t the same as what we did. This is, of course, a classic problem when comparing data sets of any kind: to some extent it’s always apples-vs-oranges, and the devil is in the details.
- By combining Galaxy Zoo classifications with multi-wavelength light profile fitting — where we fit a 2D equation to the distribution of light in a galaxy, the properties of which correlate pretty well with whether a galaxy has a strong disk component — we’ve identified a population of likely disk-dominated galaxies that also completely lack the features that are common in disk galaxies in the nearby, more evolved Universe. These disks don’t have spiral arms, they don’t have bars, they don’t have clumps. They’re smooth, but they are disks, not ellipticals. They tend to be a bit more compact than disk galaxies that do have features, even though they’re at the same luminosities. They’re also hard to identify using color alone (which echoes what we’ve seen in past Galaxy Zoo studies of various different kinds of galaxies). You really need both kinds of morphological information to reliably find these.
- The data is available for download for those who would like to study it: data.galaxyzoo.org.
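For the curious, the "bots are easy to spot" point in the list above can be sketched in a few lines of Python. This is only an illustration, not the code used for the paper: the `(classifier_id, answer)` input format, the function name, and the `min_count=100` cut for "classified a lot" are all assumptions. Only the 98% threshold comes from the text.

```python
from collections import Counter, defaultdict


def flag_repetitive_classifiers(classifications, min_count=100, min_fraction=0.98):
    """Flag prolific classifiers who almost always give the same answer.

    classifications: iterable of (classifier_id, answer) pairs (hypothetical format).
    min_count: assumed cut for "classified a lot"; the post does not give a number.
    min_fraction: share of identical answers needed to be flagged (98% in the post).
    Returns {classifier_id: (n_classifications, most_common_answer, fraction)}.
    """
    answers_by_user = defaultdict(list)
    for user_id, answer in classifications:
        answers_by_user[user_id].append(answer)

    flagged = {}
    for user_id, answers in answers_by_user.items():
        n = len(answers)
        if n < min_count:
            continue  # too few classifications to judge
        top_answer, top_n = Counter(answers).most_common(1)[0]
        fraction = top_n / n
        if fraction >= min_fraction:
            flagged[user_id] = (n, top_answer, fraction)
    return flagged


# Toy data: one bot-like classifier, one prolific human, one casual visitor.
toy = (
    [("bot", "star or artifact")] * 199 + [("bot", "smooth")]
    + [("human", "smooth")] * 120 + [("human", "features")] * 60
    + [("casual", "smooth")] * 5
)
flagged = flag_repetitive_classifiers(toy)
print(flagged)
```

Note that the casual visitor gives the same answer every time but is not flagged, because five classifications are too few to say anything about their behaviour.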
With the data releases of Galaxy Zoo: Hubble and Galaxy Zoo CANDELS added to the existing Galaxy Zoo releases, your combined classifications of over a million galaxies near and far are now public. We’ve already done some science together with these classifications, but there’s so much more to do. Thanks again for enabling us to learn about the Universe. This wouldn’t have been possible without you.
I’m incredibly happy to report that the main paper for the Galaxy Zoo: Hubble project has just been accepted to the Monthly Notices of the Royal Astronomical Society! It’s been a long road for the project, but we’ve finally reached a major milestone. It’s due to the efforts of many, including the scientists who designed the interface and processed the initial images, the web developers who managed our technology and databases, more than 80,000 volunteers who spent time classifying galaxies and discussing them on the message boards, and the distributed GZ science team who have been steadily working on analyzing images, calibrating data, and writing the paper.
The preprint for the Galaxy Zoo: Hubble paper is available here. The release of GZH also syncs up with the publication of the Galaxy Zoo: CANDELS catalog, led by Brooke Simmons; she’ll have a blog post up later today, and the GZC paper is also available as a preprint.
Galaxy Zoo: Hubble began in 2010; it was the first work of GZ to move beyond the images taken with the Sloan Digital Sky Survey (SDSS). We were motivated by the need to study the evolution and formation of galaxies billions of years ago, in the early days of the Universe. While SDSS is an amazing survey, its telescope doesn’t have the sensitivity or resolution to make a quality image of a typical galaxy beyond a redshift of about z=0.4 (distances of a few billion parsecs). Instead, we used images from the Hubble Space Telescope, the flagship and workhorse telescope of NASA for the past two decades, and asked volunteers to help us classify the shapes of galaxies in several of Hubble’s largest and deepest surveys. After more than two years of work, the initial set of GZH classifications was finished in 2012 and the site moved on to other datasets, including CANDELS, UKIDSS, and Illustris.
So why has it taken several years to finish the analysis and publication of the data? The reduction of the GZH data ended up being more complicated and difficult than we’d originally anticipated. One key difference lies in our approach to a technique we call debiasing; this refers to a set of corrections made to the raw data supplied by the volunteers. There’s a known effect where galaxies that are less bright and/or further away will appear dimmer and/or smaller in the images which are being classified. This skews the data, making it appear that there are more elliptical/smooth galaxies than truly exist in the Universe. With SDSS images, we dealt with this by assuming that the nearest galaxies were reliably measured, and then deriving corrections which we applied to the rest of the sample.
In Galaxy Zoo: Hubble, we didn’t have that option available. The problem is that there are two separate effects in the data that affect morphological classification. The first is the classification bias just mentioned; however, there’s also a genuine change in the populations of galaxies between, say, 6 billion years ago and the present day. Galaxies in the earlier epochs of the Universe were more likely to have clumpy substructures and less likely to have very well-settled spiral disks with features like bars. So if we just tried to correct for the bias based on local galaxies, we would have explicitly removed any of the real changes in the population over cosmic time. Since those trends are exactly what we want to study, we needed another approach.
Our solution ended up bringing in another set of data to serve as the calibration. Volunteers who have classified on the current version of the site may remember classifying the “FERENGI” sample. These were images of real galaxies that we processed with computer codes to make them look like they were at a variety of distances. The classifications for these images, which were completed in late 2013, gave us the solution to the first effect; we were able to model the relationship between distance to the galaxy and the likelihood of detecting features, and then applied a correction based on that relationship to the real GZH data.
The new GZH data is similar in format and structure to the data release from GZ2. The main product is a very large data table (113,705 rows by 172 columns) that researchers can slice and dice to study specific groups of galaxies with morphological measurements. We’re also releasing data from several related image sets, including experiments on fading and swapping colors in images, the effect of bright active galactic nuclei (AGN), different exposure depths, and even a low-redshift set of SDSS Stripe 82 galaxies classified with the new decision tree. All of the data will be published in electronic tables along with the paper, and are also downloadable from data.galaxyzoo.org. Our reduction and analysis code is available as a public Github repository.
The science team has already published two papers based on preliminary Galaxy Zoo: Hubble data. This included a paper led by Edmond Cheung (UCSC/Kavli IPMU) that concluded that there is no evidence connecting galactic bars and AGN over a range of redshifts out to z = 1.0. Tom Melvin (U. Portsmouth) carefully examined the overall bar fraction in disks using COSMOS data, measuring a strong decrease in bar fraction going back to galaxies 7.8 billion years ago. We’re now excited to continue new research areas, including a project led by Melanie Galloway (U. Minnesota) on the evolution of red disk galaxies over cosmic time. We hope GZH will enable a lot more science very soon from both our team and external researchers, now that the data are publicly released.
A massive “thank you” again to everyone who’s helped with this project. Galaxy Zoo has made some amazing discoveries with your help in the past eight years, and now that two new unique sets of data are openly available, we’re looking forward to many more.
The Universe is pretty huge, and to understand it we need to collect vast amounts of data. The Hubble Space Telescope is just one of many telescopes collecting data from the Universe. Hubble alone produces 17.5 GB of raw science data each week. That means that since its launch into low Earth orbit in April 1990, it has collected a block of data roughly equivalent in size to 6 million mp3 songs! With the launch of NASA’s James Webb Space Telescope (a tennis court sized space telescope!) just around the corner, the amount of raw data we can collect from the Universe is going to escalate dramatically. In order to decipher what this data is telling us about the Universe we need to use sophisticated statistical techniques. In this post I want to talk a bit about a particular technique I’ve been using, called a Markov-Chain-Monte-Carlo (MCMC) simulation, to learn about galaxy evolution.
Before we dive into the statistics, let me try to explain what I’m trying to figure out. We can model galaxy evolution by looking at a galaxy’s star formation rate (SFR) over time. Basically, we want to know how fast a particular galaxy is making stars at any given time. Typically, a galaxy has an initial constant high SFR; then, at a time called t quench (tq), its SFR decreases exponentially at a rate characterised by a number called tau. A small tau means the galaxy stops forming stars, or is quenched, more rapidly. So overall, for each galaxy we need to determine two numbers, tq and tau, to figure out how it evolved. Figure 1 shows what this model looks like.
Figure 1: Model of a single galaxy’s SFR over time, showing an initial high constant SFR followed by an exponential quench at tq.
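If you prefer code to plots, the model described above fits in a few lines of Python. This is a minimal sketch rather than the code I actually run: the function name, the use of Gyr as the time unit, and the normalisation `sfr0` are assumptions made for illustration.

```python
import math


def star_formation_rate(t, tq, tau, sfr0=1.0):
    """Toy star formation history: constant until tq, then exponential decline.

    t, tq and tau are assumed to be in the same time unit (e.g. Gyr).
    sfr0 is an arbitrary normalisation for the initial constant SFR.
    """
    if t < tq:
        return sfr0
    return sfr0 * math.exp(-(t - tq) / tau)


# Two galaxies that start quenching at the same time but at different rates.
for tau in (0.5, 2.0):
    history = [round(star_formation_rate(t, tq=6.0, tau=tau), 3) for t in range(0, 13, 2)]
    print(f"tau = {tau}: {history}")
```

The galaxy with the smaller tau drops away from its initial SFR much faster after tq, which is what "quenched more rapidly" means here.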
To calculate these two numbers, tq and tau, we look at the colour of the galaxy, specifically the UVJ colour I mentioned in my last post. We then compare this to the predicted colour of a galaxy for a specific value of tq and tau. The problem is that there are many different combinations of tq and tau, so how do we find the best match for a galaxy? We use an MCMC simulation to do this.
The first MC (Markov-Chain) just means an efficient random walk. We send “walkers” to have a look around for a good tq and tau, but the direction we send them to walk at each step depends on how good their current tq and tau are. The upshot of this is that we quickly home in on a good value of tq and tau. The second MC (Monte Carlo) just picks out random values of tq and tau and tests how good they are by comparing the UVJ colours and our SFR model. Figure 2 shows a gif of an MCMC simulation of a single galaxy. The histograms show the positions of the walkers searching the tq and tau space, and the blue crosshair shows the best fit value of tq and tau at every step. You can see the walkers homing in and settling down on the best value of tq and tau. I ran this simulation using a modified version of the starpy code.
Figure 2: MCMC simulation for a single galaxy, pictured in the top right corner. Main plot shows density of walkers. Marginal histograms show 1D projections of walker densities. Blue crosshair shows best fit values of tq and tau at each step.
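To give a flavour of what a walker does, here is a toy single-walker sampler in plain Python. It is not the starpy code, and it uses the simple Metropolis accept/reject rule, which may well differ from the sampler starpy uses. The `log_posterior` function is a made-up stand-in for "how well do the predicted colours match the observed ones": it simply peaks at an assumed tq = 8.0 and tau = 1.5, and the parameter bounds, step size, and burn-in length are all assumptions.

```python
import math
import random


def log_posterior(tq, tau, target=(8.0, 1.5), width=(0.5, 0.3)):
    """Hypothetical stand-in for the colour comparison: a 2D Gaussian peak."""
    if not (0.0 < tq < 13.8 and 0.0 < tau < 4.0):
        return -math.inf  # assumed flat prior: outside these bounds is ruled out
    return -0.5 * (((tq - target[0]) / width[0]) ** 2
                   + ((tau - target[1]) / width[1]) ** 2)


def metropolis(n_steps, start=(3.0, 3.0), step=0.3, seed=42):
    """Random walk of one walker through (tq, tau) using the Metropolis rule."""
    rng = random.Random(seed)
    tq, tau = start
    current = log_posterior(tq, tau)
    chain = []
    for _ in range(n_steps):
        # Propose a small random step away from the current position.
        new_tq = tq + rng.gauss(0.0, step)
        new_tau = tau + rng.gauss(0.0, step)
        proposed = log_posterior(new_tq, new_tau)
        # Always accept a better position; accept a worse one with
        # probability equal to the ratio of the two posterior values.
        if rng.random() < math.exp(min(0.0, proposed - current)):
            tq, tau, current = new_tq, new_tau, proposed
        chain.append((tq, tau))
    return chain


chain = metropolis(20000)
settled = chain[5000:]  # drop the early steps, before the walker has homed in
mean_tq = sum(p[0] for p in settled) / len(settled)
mean_tau = sum(p[1] for p in settled) / len(settled)
print(f"tq ~ {mean_tq:.2f}, tau ~ {mean_tau:.2f}")
```

The walker starts a long way from the peak, wanders towards it, and then spends most of its time nearby, so the spread of the settled positions tells you how well tq and tau are pinned down, and not just their best values.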
The maths that underpins this simulation is called Bayesian Statistics, and it’s quite a novel way of thinking about parameters and data. The main difference is that instead of treating unknown parameters as fixed quantities with associated error, they are treated as random variables described by probability distributions. It’s quite a powerful way of looking at the Universe! I’ve left all of the gory maths detail about MCMC out, but if you’re interested, an article by a DPhil student here at Oxford does a really good job of explaining it here.
So how does this all relate to galaxy morphology, and Galaxy Zoo classifications? I’m currently running the MCMC simulation shown in Figure 2 over all the galaxies in the COSMOS survey. This is really cool because, apart from getting to play with the University of Oxford’s supercomputer (544 cores!), I can use Galaxy Zoo morphology to see if the SFR of a galaxy over time is dependent on the galaxy’s shape, and overall learn what the vast amount of data I have says about galaxy evolution.
Good news everyone, another Galaxy Zoo paper was published today! This work was led by yours truly (Hi!) and looks at the impact that the central active black holes (active galactic nuclei; AGN) can have on the shape and star formation of their galaxy. It’s available here on astro-ph: http://arxiv.org/abs/1609.00023 and will soon be published in MNRAS.
Turns out, despite the fact that these supermassive black holes are TINY in comparison to their galaxy (300 light years across as opposed to 100,000 light years!) we see that within a population of these AGN galaxies the star formation rates have recently and rapidly decreased. In a control sample of galaxies that don’t currently have an AGN in their centre, we don’t see the same thing happening. This phenomenon has been seen before in individual galaxies and predicted by simulations, but this is the first time it’s been statistically shown to be happening within a large population. It’s tempting to say then that it’s the AGN that is directly causing this drop in the star formation rate (maybe because the energy thrown out by the active black hole blasts out or heats the gas needed to fuel star formation) but with the data we have we can’t say for definite if the AGN are the cause. It could be that this drop in star formation is being caused by another means entirely, which also coincidentally turns on an AGN in a galaxy.
These galaxies were also all classified by our wonderful volunteers in Galaxy Zoo 2, which meant that we could also look at whether this drop in the star formation rate was dependent on the morphology of the galaxy; turns out not so much! If the drop in the star formation rate is being caused directly by the AGN (and remember we still can’t say for sure!) then the central black hole of a galaxy doesn’t care what shape galaxy it’s in. An AGN will affect all galaxies, regardless of morphology, just the same.
Last year we had so much fun celebrating all that we (including you) had accomplished over the first 8 years of Galaxy Zoo. This year, for our 9th birthday, we thought we’d hand things directly over to you. We sent out a newsletter asking people about their favorite Galaxy Zoo science. We asked people to rank five choices:
- Hanny’s Voorwerp & the Voorwerpjes (ionized clouds and active galaxies)
- Green Peas (highly compact & star-forming galaxies)
- Red spirals (disk galaxies with no/little star formation)
- Blue ellipticals (spheroid galaxies with ongoing/retriggered star formation)
- Bars (the galaxy kind; how this mode of disk galaxies drives galaxy evolution)
We’ve now collected just over 200 responses and combined your rankings. Although the distributions were pretty similar, and each option had plenty of people choosing it as their favorite, one of the options jumped out as a pretty clear leader (at least in this rather informal poll).
Of course, the list we asked people to choose from is by no means complete, especially if you include not just the main Galaxy Zoo but also its related projects. In the “Other” box we had a variety of entries, with some mentioning galaxies found in Radio Galaxy Zoo and others citing those seen in Galaxy Zoo: Bar Lengths. Plenty of people mentioned galaxy mergers, and gravitational lenses got a few mentions too! If we had a complete list the rankings would likely be different, but then again, that would be such a long list I’d be worried many fewer people would want to answer.
We also had a space for people to enter whatever text they wanted at the end of the survey, and the responses were varied, interesting, and a treat to read. Here’s a sample (each paragraph is a separate comment):
I do not spend a lot of time here, but when I have the time, I love it. Thank you!
What a great way to feel like a scientist.
I’ve been an on-and-off participant in the Zooniverse citizen science projects since I was 13 years old – and Galaxy Zoo has been one of my favourites for a while! I just wanted to say thank you for providing the opportunity for an ordinary teenager to feel included in fascinating scientific research – that experience has inspired me to pursue a degree in Physics and Astronomy in the fall.
We were also curious about who, as a group, we were asking these questions of. It turns out that quite a large fraction of people who responded to the survey have been with us since the early days, which is so lovely. And we were also delighted to see people engaging with us who’ve just recently discovered Galaxy Zoo. We are so glad all of you are collaborating with us; here’s to many years to come.
P.S. – The big 10 is coming next year… what would you like to see for the occasion?
Hello present, and hopefully future volunteers!
I’m a summer research intern on the Zooniverse Project, based at the University of Oxford. I’m currently at university in London and I’ll be going into my fourth year of studying Theoretical Physics. I’m three weeks into my internship, and I want to share with you how the hundreds-of-thousands of galaxies you’ve worked hard to classify are being used in research.
I’m working with Galaxy Zoo Hubble (GZH) data, which are classifications of galaxies from the Hubble Space Telescope Legacy survey. The classifications for this data have just been submitted for publication by a group of researchers from Galaxy Zoo, and you can read about it here. Specifically I’m working with a subset of this data from the Cosmic Evolution Survey, or COSMOS. This survey is specially designed to help us understand how galaxies evolve over time, and how their local environments in the universe affect this.
Up to now I’ve been using GZH data to add morphology to data currently found in the literature, in the hope that we can learn something new about galaxy evolution. In this post I want to share with you a particularly striking example of how GZH classifications have transformed current data. Figure 1 shows two rows of colour-colour plots. The vertical axis is U-V colour, which is a measure of how much recent star formation is going on in a galaxy; the higher up a galaxy is in the plot, the more recent star formation is going on. The horizontal axis is V-J colour, which is a measure of how much infrared light a galaxy is emitting compared to visible light; the further left a galaxy is in the plot, the older and more ‘dead’ it generally is. The first row (top) is found in a paper (Muzzin et al 2013) on the analysis of galaxies in the COSMOS survey, written by researchers from the US, Denmark, Netherlands, UK, and Chile. The second row (bottom) shows the same data but with GZH classifications overlaid. Red and blue points represent featured and smooth galaxies respectively. The banner image shows a featured spiral galaxy (left) and a smooth elliptical galaxy (right).
Figure 1: Colour-colour plots of galaxies from the COSMOS survey (top) before and (bottom) after GZH classification data were added. Red and blue points represent featured and smooth galaxies respectively.
No need to ask which one looks more interesting! Let’s understand what these plots mean. Each point on each plot represents a different galaxy. On each row the plots are sorted by z, or redshift; you can think of these as different snapshots of galaxies in the universe at different times, with the most recent snapshot on the left and the oldest on the right of each row.
The important thing to take away from this data is that there are two distinct blobs or populations of galaxies in each plot. Galaxies in the top left blob are called star forming (SF) and galaxies in the longer bottom right blob are non-star forming, or ‘quiescent’. From the overlay of GZH classifications data on Figure 1 (bottom), we can see the nearly complete absence of galaxies with features in the top left population of SF galaxies, something that we didn’t know before!
So why do we care about analysing colour-colour plots of galaxies? As a galaxy evolves through its lifetime it moves from the SF population to the quiescent through that bit in-between the two blobs, which is called the ‘Green Valley’ (I’ll save more on that for another blog post), and the truth is nobody quite knows how this happens. Overall, we hope GZH classifications may shed some light on this, and help us understand how galaxies evolve.
To help us finally understand the evolution of galaxies, get involved right now at www.galaxyzoo.org. We’d be happy to have you on board!
I’m happy to announce that my first Galaxy Zoo paper has been published! You may be aware that I recently posted about spiral galaxies in Galaxy Zoo (the blog post can be found here). The first results from these studies have now made it to publication, where we discuss a new method for removing bias in galaxy classifications, as well as comparing the properties of different spiral galaxies.
As discussed in my earlier blog post, spiral galaxies are some of the most interesting galaxies in the local Universe. However, studies of these objects have been limited, due to the fact that galaxies need to be visually classified. It is therefore thanks to all of the volunteers in Galaxy Zoo that a paper like this can be published, where we have thousands of spiral galaxies to compare. Thanks to these classifications, we have been able to find some interesting preliminary results: we find that many-armed spiral galaxies are bluer in colour than two-armed spiral galaxies. This suggests that many-armed spirals are sites of significantly enhanced activity in the Universe, where high levels of star formation activity are taking place.
However, these results are only a hint at what we can achieve in the future. So watch this space and we’ll keep you all informed about any developments in our work on spiral galaxies!
If you are interested in reading more, the full article can be found here.
This post was written as a contribution by Timothy Friel, an undergraduate Australian National University student studying Theoretical Physics and Science Communication. Tim is conducting research into citizen science projects and their social media communication strategies.
Hats off to two of our volunteer participants who have officially been written in the stars.
The Matorny-Terentev Cluster RGZ-CL J0823.2+0333 bears the name of the two citizen scientists who pieced together its structure.
Ivan Terentev and Tim Matorny, two Radio Galaxy Zoo participants from Russia, discovered that a particular radio-source had a line of radio blobs delineating a C-shaped ‘Wide-Angle Tail galaxy’ (WAT). The massive galaxy hosting the super-massive black hole and its associated jets are moving through intergalactic gas, causing the jets to fold back, similar to the way a sky-diver’s hair is shaped by the wind.
Figure 1: The new discovery: The C-shaped “wide angle tail galaxy” (pink) surrounded by the galaxies of the Matorny-Terentev cluster (white). Julie Banfield, Author provided
This discovery has been published this week in the prestigious scientific journal Monthly Notices of the Royal Astronomical Society, with the paper “Radio Galaxy Zoo: discovery of a poor cluster through a giant wide-angle tail radio galaxy” (accessible for free via bit.ly/RGZpaperWAT).
Lead author of the study, Dr Julie Banfield of CAASTRO at The Australian National University (ANU), said that the discovery surprised the astronomers running the program.
“They found something that none of us had even thought would be possible”, said Dr Banfield.
More details of the research team’s response and the next steps for the project can be read in the press release published by CAASTRO (bit.ly/PR14June16).
A huge congratulations must go to the two citizen scientists, Ivan and Tim, for their efforts to work collaboratively to make this discovery. It is great to witness that physical and language barriers have been unable to halt amazing scientific endeavours.
A further thank you must also be noted for the Radio Galaxy Zoo team, in particular the joint project leaders Dr Julie Banfield (ANU) and Dr Ivy Wong (ICRAR at UWA), alongside Dr Anna Kapinska (ICRAR at UWA), Dr Ray Norris (CSIRO/WSU) and all other members of the international project. The team’s continued energy to motivate volunteer participants to develop their own research projects has uncovered the immense potential of citizen science as both a research tool and a method of bringing people together across the globe.
Finally, the Radio Galaxy Zoo team would like to thank the 10,000 volunteers around the world who have contributed over 1.6 million image classifications over the past two and a half years. The dedication of volunteers to this project has bred a supportive community which has now completed almost 60% of the dataset, a feat that no single individual could have achieved.
If you would love to become involved in this international astronomical community, please head to bit.ly/RadioGalaxyZoo1 and begin your journey to uncover the depths of our universe and its wonders, all from the comfort of your own home.
ANU: Australian National University
CAASTRO: Australian Research Council Centre of Excellence for All-Sky Astrophysics
CSIRO: Commonwealth Scientific and Industrial Research Organisation
ICRAR: International Centre for Radio Astronomy Research
UWA: University of Western Australia
Good news! Early this morning UK time, we submitted the paper describing the finished data release for the third iteration of Galaxy Zoo to the journal Monthly Notices of the Royal Astronomical Society. It’s taken an enormous amount of work to get to this point, in particular in understanding how to account for the effects of distance on classifications. Most of that work was done by Kyle Willett and Mel Galloway from the University of Minnesota (Kyle gave a sneak preview here), and it was finished just in time because Kyle leaves us tomorrow.
Kyle has had an enormous impact on Galaxy Zoo since he came on board in 2011. As well as publishing papers on star formation and the enormous data release paper for Galaxy Zoo 2, he’s been the person making images, coordinating what’s seen on the site and keeping an eye on classifications as they’ve come in. Just as importantly, he’s been a prolific contributor to this blog, playing a leading role in keeping our important collaborators, the volunteers, in touch with what’s going on. It’s not just Galaxy Zoo, either, as Kyle has also played a critical role in the Radio Galaxy Zoo team, and has made major contributions to their recent papers too. He will be much missed by all of us, though we wish him well with his future endeavours.
It’s been a good amount of time since the Galaxy Zoo: Hubble and Galaxy Zoo: CANDELS projects were finished, tackling more than 200,000 combined galaxies thanks to the efforts of our volunteers. While we’ve had a couple of science papers based on the early results (Melvin et al. 2014, Simmons et al. 2014, Cheung et al. 2015), a full release of the data and catalog has taken slightly longer. However, we’ve been working hard, testing the data, and developing some new analysis methods on both image sets. This month has been really exciting, and we now have drafts for both papers that are just about finished. Once they’ve been accepted to the journals (and revised, if necessary), we’ll have some much longer posts discussing the results, and of course attaching the papers themselves. Hopefully that’ll be quite soon.
As a small teaser, here’s a little movie I just made of the Galaxy Zoo: Hubble paper as it went through the various drafts by different members of the science team. If only all paper writing were this easy …😉