We submitted the Galaxy Zoo CANDELS paper in May. Now, after some discussion with a very helpful referee, the paper is accepted! I hope our volunteers are as thrilled as I was to get the news. It happened within days of the Galaxy Zoo: Hubble paper acceptance. Hurray!
If you’d like to read the paper, it’s publicly available as a pre-print now and will be published at some point soon in the Monthly Notices of the Royal Astronomical Society. The pre-print version is the accepted version, so it should only differ from the eventual published paper by a tiny bit (I’m sure the proof editor will catch some typos and so on).
The paper may be a little long for a casual read, so here’s an overview:
- We collected 2,149,206 classifications of 52,073 subjects, from 41,552 registered volunteers and 53,714 web browser sessions where the classifier didn’t log in. In the analysis we assumed each of those unique browser sessions was a separate volunteer.
- The raw consensus classifications are definitely useful, but we also weighted the classifications using a combination of “gold standard” data and consensus-based weighting. That is, classifiers were up- or down-weighted according to whether they could tell a galaxy apart from a star most of the time, and then the rest of the weighting proceeded in the same way it has for every other GZ dataset. No surprise: the majority of volunteers are excellent classifiers.
- 6% of the raw classifications were from 86 classifiers who both classified a lot and gave the same answer (usually “star or artifact”) at least 98% of the time, no matter what images they saw. We have some bots, but they’re quite easy to spot.
- Even with a pretty generous definition of what counts as “featured”, less than 15% of galaxies in the relatively young Universe that this data examines have clear signs of features. Most galaxies in the data set are relatively smooth and featureless.
- Galaxy Zoo compares well with visual classifications of the same galaxies done by members of the CANDELS team, despite the fact that the comparison is sometimes hard because the questions they asked weren’t the same as the ones we asked. This is, of course, a classic problem when comparing data sets of any kind: to some extent it’s always apples-vs-oranges, and the devil is in the details.
- By combining Galaxy Zoo classifications with multi-wavelength light profile fitting — where we fit a 2D equation to the distribution of light in a galaxy, the properties of which correlate pretty well with whether a galaxy has a strong disk component — we’ve identified a population of likely disk-dominated galaxies that also completely lack the features that are common in disk galaxies in the nearby, more evolved Universe. These disks don’t have spiral arms, they don’t have bars, they don’t have clumps. They’re smooth, but they are disks, not ellipticals. They tend to be a bit more compact than disk galaxies that do have features, even though they’re at the same luminosities. They’re also hard to identify using color alone (which echoes what we’ve seen in past Galaxy Zoo studies of various different kinds of galaxies). You really need both kinds of morphological information to reliably find these.
- The data is available for download for those who would like to study it: data.galaxyzoo.org.
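The point above about classifiers who give the same answer almost every time can be sketched in a few lines. This is only an illustration, not the paper's pipeline: the function name, the input format of (classifier, answer) pairs, and the `min_count` cutoff for "classified a lot" are all assumptions; only the 98% threshold comes from the text.

```python
from collections import Counter

def flag_repetitive(classifications, min_count=100, max_share=0.98):
    """Return the classifiers who gave one answer at least `max_share` of the time.

    `classifications` is an iterable of (classifier_id, answer) pairs.
    `min_count` is a hypothetical cutoff for "classified a lot".
    """
    answers_by_user = {}
    for user, answer in classifications:
        answers_by_user.setdefault(user, Counter())[answer] += 1

    flagged = set()
    for user, counts in answers_by_user.items():
        total = sum(counts.values())
        most_common_count = counts.most_common(1)[0][1]
        if total >= min_count and most_common_count / total >= max_share:
            flagged.add(user)
    return flagged

# Toy data: "a" always says the same thing, "b" varies, "c" is repetitive but rarely classifies.
toy = (
    [("a", "star or artifact")] * 200
    + [("b", "smooth")] * 120 + [("b", "features")] * 80
    + [("c", "star or artifact")] * 5
)
print(flag_repetitive(toy))
```

Requiring a minimum number of classifications keeps an occasional visitor who happened to see five stars in a row from being flagged.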
With the data releases of Galaxy Zoo: Hubble and Galaxy Zoo CANDELS added to the existing Galaxy Zoo releases, your combined classifications of over a million galaxies near and far are now public. We’ve already done some science together with these classifications, but there’s so much more to do. Thanks again for enabling us to learn about the Universe. This wouldn’t have been possible without you.
I’m incredibly happy to report that the main paper for the Galaxy Zoo: Hubble project has just been accepted to the Monthly Notices of the Royal Astronomical Society! It’s been a long road for the project, but we’ve finally reached a major milestone. It’s due to the efforts of many, including the scientists who designed the interface and processed the initial images, the web developers who managed our technology and databases, more than 80,000 volunteers who spent time classifying galaxies and discussing them on the message boards, and the distributed GZ science team who have been steadily working on analyzing images, calibrating data, and writing the paper.
The preprint for the Galaxy Zoo: Hubble paper is available here. The release of GZH also syncs up with the publication of the Galaxy Zoo: CANDELS catalog, led by Brooke Simmons; she’ll have a blog post up later today, and the GZC paper is also available as a preprint.
Galaxy Zoo: Hubble began in 2010; it was the first work of GZ to move beyond the images taken with the Sloan Digital Sky Survey (SDSS). We were motivated by the need to study the evolution and formation of galaxies billions of years ago, in the early days of the Universe. While SDSS is an amazing telescope, it doesn’t have the sensitivity or resolution to make a quality image of a typical galaxy beyond a redshift of about z=0.4 (distances of a few billion parsecs). Instead, we used images from the Hubble Space Telescope, the flagship and workhorse telescope of NASA for the past two decades, and asked volunteers to help us classify the shapes of galaxies in several of Hubble’s largest and deepest surveys. After more than two years of work, the initial set of GZH classifications was finished in 2012 and the site moved on to other datasets, including CANDELS, UKIDSS, and Illustris.
So why has it taken several years to finish the analysis and publication of the data? The reduction of the GZH data ended up being more complicated and difficult than we’d originally anticipated. One key difference lies in our approach to a technique we call debiasing, which refers to the set of corrections made to the raw data supplied by the volunteers. There’s a known effect where galaxies that are less bright and/or further away will appear dimmer and/or smaller in the images which are being classified. This skews the data, making it appear that there are more elliptical/smooth galaxies than truly exist in the Universe. With SDSS images, we dealt with this by assuming that the nearest galaxies were reliably measured, and then deriving corrections which we applied to the rest of the sample.
In Galaxy Zoo: Hubble, we didn’t have that option available. The problem is that there are two separate effects in the data that affect morphological classification. The first is the classification bias just mentioned; however, there’s also a genuine change in the populations of galaxies between, say, 6 billion years ago and the present day. Galaxies in the earlier epochs of the Universe were more likely to have clumpy substructures and less likely to have very well-settled spiral disks with features like bars. So if we had just tried to correct for the bias based on local galaxies, we would have explicitly removed any of the real changes in the population over cosmic time. Since those trends are exactly what we want to study, we needed another approach.
Our solution ended up bringing in another set of data to serve as the calibration. Volunteers who have classified on the current version of the site may remember classifying the “FERENGI” sample. These were images of real galaxies that we processed with computer codes to make them look like they were at a variety of distances. The classifications for these images, which were completed in late 2013, gave us the solution to the first effect; we were able to model the relationship between distance to the galaxy and the likelihood of detecting features, and then applied a correction based on that relationship to the real GZH data.
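To give a flavour of what "model the relationship, then apply a correction" means, here is a deliberately simplified sketch. It is not the model used in the paper: the straight-line fit, the function names, and every number in the toy calibration set are assumptions made purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical calibration data: the same galaxy artificially moved to larger
# redshifts, with the fraction of "featured" votes it received each time.
sim_redshift = [0.3, 0.5, 0.7, 0.9]
sim_featured_fraction = [0.80, 0.70, 0.60, 0.50]

slope, intercept = fit_line(sim_redshift, sim_featured_fraction)

def debias(observed_fraction, redshift, reference_redshift=0.3):
    """Estimate the vote fraction the galaxy would have had at the reference redshift."""
    corrected = observed_fraction - slope * (redshift - reference_redshift)
    return min(1.0, max(0.0, corrected))  # a vote fraction must stay in [0, 1]

print(round(debias(0.40, 0.9), 2))
```

Because the calibration images are the same galaxies at every simulated distance, any drop in the featured fraction here is purely an imaging effect, which is what lets it be separated from real evolution.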
The new GZH data is similar in format and structure to the data release from GZ2. The main product is a very large data table (113,705 rows by 172 columns) that researchers can slice and dice to study specific groups of galaxies with morphological measurements. We’re also releasing data from several related image sets, including experiments on fading and swapping colors in images, the effect of bright active galactic nuclei (AGN), different exposure depths, and even a low-redshift set of SDSS Stripe 82 galaxies classified with the new decision tree. All of the data will be published in electronic tables along with the paper, and are also downloadable from data.galaxyzoo.org. Our reduction and analysis code is available as a public GitHub repository.
The science team has already published two papers based on preliminary Galaxy Zoo: Hubble data. This included a paper led by Edmond Cheung (UCSC/Kavli IPMU) that concluded that there is no evidence connecting galactic bars and AGN over a range of redshifts out to z = 1.0. Tom Melvin (U. Portsmouth) carefully examined the overall bar fraction in disks using COSMOS data, measuring a strong decrease in bar fraction going back to galaxies 7.8 billion years ago. We’re now excited to continue new research areas, including a project led by Melanie Galloway (U. Minnesota) on the evolution of red disk galaxies over cosmic time. We hope GZH will enable a lot more science very soon from both our team and external researchers, now that the data are publicly released.
A massive “thank you” again to everyone who’s helped with this project. Galaxy Zoo has made some amazing discoveries with your help in the past eight years, and now that two new unique sets of data are openly available, we’re looking forward to many more.
The Universe is pretty huge, and to understand it we need to collect vast amounts of data. The Hubble Telescope is just one of many telescopes collecting data from the Universe. Hubble alone produces 17.5 GB of raw science data each week. That means that since its launch to low Earth orbit in April 1990, it has collected a block of data roughly equivalent in size to 6 million mp3 songs! With the launch of NASA’s James Webb Space Telescope (a space telescope the size of a tennis court!) just around the corner, the amount of raw data we can collect from the Universe is going to escalate dramatically. In order to decipher what this data is telling us about the Universe we need to use sophisticated statistical techniques. In this post I want to talk a bit about a particular technique I’ve been using to learn about galaxy evolution, called a Markov Chain Monte Carlo (MCMC) simulation.
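As a rough check of the mp3 comparison, the arithmetic can be written out. Only the 17.5 GB per week figure comes from the text; the 25 years of operation and the 4 MB size of a typical song are assumptions.

```python
GB_PER_WEEK = 17.5        # from the text
YEARS_OBSERVING = 25      # assumption: April 1990 to roughly 2015
WEEKS_PER_YEAR = 52
MB_PER_SONG = 4.0         # assumption: a typical mp3 is about 4 MB

total_gb = GB_PER_WEEK * YEARS_OBSERVING * WEEKS_PER_YEAR
songs = total_gb * 1000 / MB_PER_SONG   # taking 1 GB as 1000 MB

print(f"{total_gb:,.0f} GB is about {songs / 1e6:.1f} million songs")
```

Under those assumptions the total comes out a little under 6 million songs, in line with the figure quoted.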
Before we dive into the statistics, let me try to explain what I’m trying to figure out. We can model galaxy evolution by looking at a galaxy’s star formation rate (SFR) over time. Basically, we want to know how fast a particular galaxy is making stars at any given time. Typically, a galaxy has an initially constant, high SFR; then, at a time called t quench (tq), its SFR decreases exponentially, at a rate characterised by a number called tau. Small tau means the galaxy stops forming stars, or is quenched, more rapidly. So overall, for each galaxy we need to determine two numbers, tq and tau, to figure out how it evolved. Figure 1 shows what this model looks like.
Figure 1: Model of a single galaxy’s SFR over time, showing an initial high constant SFR followed by an exponential quench at tq.
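The model in Figure 1 is simple enough to write down as a function. This is a minimal sketch: the function name, the `sfr0` normalisation, and the choice of Gyr as the time unit are assumptions, not taken from the starpy code.

```python
import math

def star_formation_rate(t, tq, tau, sfr0=1.0):
    """SFR at time t: constant at sfr0 until tq, then declining exponentially.

    t, tq and tau are assumed to be in the same units (e.g. Gyr);
    sfr0 is an arbitrary normalisation for the initial constant rate.
    """
    if t < tq:
        return sfr0
    return sfr0 * math.exp(-(t - tq) / tau)

# A small tau quenches quickly, a large tau slowly (both 1 Gyr after tq = 5).
for tau in (0.5, 2.0):
    print(tau, star_formation_rate(6.0, tq=5.0, tau=tau))
```

One Gyr after quenching begins, the tau = 0.5 galaxy has fallen to a much smaller fraction of its original rate than the tau = 2.0 galaxy, which is the "small tau means rapid quenching" behaviour described above.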
To calculate these two numbers, tq and tau, we look at the colour of the galaxy, specifically the UVJ colour I mentioned in my last post. We then compare this to a predicted colour of a galaxy for a specific value of tq and tau. The problem is that there are many different combinations of tq and tau, so how do we find the best match for a galaxy? We use an MCMC simulation to do this.
The first MC (Markov Chain) just means an efficient random walk. We send “walkers” to have a look around for a good tq and tau, but the direction we send them to walk at each step depends on how good their current tq and tau are. The upshot of this is that we quickly home in on a good value of tq and tau. The second MC (Monte Carlo) just picks out random values of tq and tau and tests how good they are by comparing the UVJ colours and our SFR model. Figure 2 shows a gif of an MCMC simulation of a single galaxy. The histograms show the positions of the walkers searching the tq and tau space, and the blue crosshair shows the best fit value of tq and tau at every step. You can see the walkers homing in and settling down on the best value of tq and tau. I ran this simulation using a modified version of the starpy code.
Figure 2: MCMC simulation for a single galaxy, pictured in the top right corner. Main plot shows density of walkers. Marginal histograms show 1D projections of walker densities. Blue crosshair shows best fit values of tq and tau at each step.
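For anyone who wants to see the idea in code, here is a toy version with a single walker. It is a plain Metropolis sampler, not the starpy code or the sampler it uses, and the "goodness" function is a made-up stand-in for the real comparison between observed and predicted colours: the preferred values (tq = 7, tau = 1.5), their widths, and the allowed ranges are all assumptions.

```python
import math
import random

TQ_MAX = 13.8   # assumed upper limit on tq in Gyr
TAU_MAX = 4.0   # assumed upper limit on tau in Gyr

def log_prob(tq, tau):
    """Toy stand-in for how well (tq, tau) matches a galaxy's colours."""
    if not (0.0 < tq < TQ_MAX and 0.0 < tau < TAU_MAX):
        return -math.inf  # outside the allowed range
    return -0.5 * ((tq - 7.0) / 0.5) ** 2 - 0.5 * ((tau - 1.5) / 0.3) ** 2

def metropolis(n_steps, start=(3.0, 3.0), step=0.3, seed=42):
    """Random walk that accepts better positions and sometimes accepts worse ones."""
    rng = random.Random(seed)
    tq, tau = start
    current = log_prob(tq, tau)
    chain = []
    for _ in range(n_steps):
        # Monte Carlo part: propose a random nearby (tq, tau).
        new_tq = tq + rng.gauss(0.0, step)
        new_tau = tau + rng.gauss(0.0, step)
        proposed = log_prob(new_tq, new_tau)
        # Markov chain part: the move depends only on the current position.
        if proposed >= current or rng.random() < math.exp(proposed - current):
            tq, tau, current = new_tq, new_tau, proposed
        chain.append((tq, tau))
    return chain

chain = metropolis(5000)
kept = chain[1000:]  # discard the early steps while the walker is still homing in
mean_tq = sum(p[0] for p in kept) / len(kept)
mean_tau = sum(p[1] for p in kept) / len(kept)
print(mean_tq, mean_tau)
```

The walker starts far from the preferred values, drifts towards them, and then wanders around them; the spread of the kept positions plays the role of the histograms in Figure 2.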
The maths that underpins this simulation is called Bayesian statistics, and it’s quite a novel way of thinking about parameters and data. The main difference is that instead of treating unknown parameters as fixed quantities with associated error, they are treated as random variables described by probability distributions. It’s quite a powerful way of looking at the Universe! I’ve left all of the gory maths detail about MCMC out, but if you’re interested, an article by a DPhil student here at Oxford does a really good job of explaining it here.
So how does this all relate to galaxy morphology and Galaxy Zoo classifications? I’m currently running the MCMC simulation shown in Figure 2 over all the galaxies in the COSMOS survey. This is really cool because, apart from getting to play with the University of Oxford’s supercomputer (544 cores!), I can use Galaxy Zoo morphology to see if the SFR of a galaxy over time is dependent on the galaxy’s shape, and overall learn what the vast amount of data I have says about galaxy evolution.
It’s been a good amount of time since the Galaxy Zoo: Hubble and Galaxy Zoo: CANDELS projects were finished, tackling more than 200,000 combined galaxies thanks to the efforts of our volunteers. While we’ve had a couple of science papers based on the early results (Melvin et al. 2014, Simmons et al. 2014, Cheung et al. 2015), a full release of the data and catalog has taken slightly longer. However, we’ve been working hard, testing the data, and developing some new analysis methods on both image sets. This month has been really exciting, and we now have drafts for both papers that are just about finished. Once they’ve been accepted to the journals (and revised, if necessary), we’ll have some much longer posts discussing the results, and of course attaching the papers themselves. Hopefully that’ll be quite soon.
As a small teaser, here’s a little movie I just made of the Galaxy Zoo: Hubble paper as it went through the various drafts by different members of the science team. If only all paper writing were this easy …😉
Once upon a time, there was an experimental project called Galaxy Zoo: Mergers. It used ancient, mystical technology to allow volunteers to run simulations of merging galaxies on their computers, and to compare the results of many such simulations. Their mission: to find matches to more than fifty nearby mergers selected from Galaxy Zoo data.
Amongst the chosen galaxies were not just run-of-the-mill, everyday mergers, but also the various oddities that the volunteers found, such as the Penguin galaxy. The team led volunteers through a series of tournaments designed to pit potential solutions for a particular galaxy against each other. In total, more than 3 million simulations were reviewed, producing the results described in the paper, now accepted by the journal MNRAS, and in the dataset visible at the main Galaxy Zoo data repository. This represents a huge amount of effort, and a speeding up of the process; in the paper, we note that previous fits to mergers have taken months of effort to complete.
Which is not to say the analysis, led by Anthony Holincheck and John Wallin, has been easy. In a recent email to the Galaxy Zoo team, John commented:
This is by far the most complex project I have ever worked on. Most papers that model interacting galaxies contain one or two systems where the author uses a few dozen simulations. We just published a paper that modeled 62 different systems using a brand new modeling technique where the 3 million simulation results were reviewed by citizen scientists. Best of all, the 62 models were done using the same code and the same coordinate system so others can reproduce them. Doing this with other published simulations is nearly impossible.
I know an immense amount of effort went into making sure that the results weren’t wasted, and the paper thus represents a happy ending to a tale that’s been running a long time. But it is not really an end; we are already planning to observe some of these galaxies as part of surveys like MaNGA that can measure the way that the galaxies’ components are moving today, allowing us to test these models. We also hope a library of models might be useful for other astronomers, and will be looking to try and revive this kind of project.
Read more about Galaxy Zoo: Mergers in this old blog post blog.galaxyzoo.org/2012/03/27/the-finale-of-merger-zoo.
This post was written as a contribution by Timothy Friel, an undergraduate Australian National University student studying Theoretical Physics and Science Communication. Tim is conducting research into citizen science projects and their social media communication strategies.
Meet two of our fantastic Zooniverse members who have been recognised as co-authors of a submitted RGZ paper.
In March 2016, the Radio Galaxy Zoo (RGZ) team submitted a paper which is co-authored by two of our SuperRGZooites. Thanks to the help of citizens around the world, over 1.6 million classifications have been made. However, a very special thanks must go to two citizens who have been greatly involved in our most recently submitted paper.
Meet Ivan Terentev and Tim Matorny, our Citizen Science co-authors.
How did you discover Radio Galaxy Zoo and become involved?
Tim: I had a passion for research and to be involved with generating new knowledge. So I began to look and met [the world of] citizen science and tried many different projects. I was already familiar with the Zooniverse, when I got email about new project – RGZ.
Ivan: I became involved in RGZ from its beginning, more or less, in December 2013, and at that time I was part of the Zooniverse for two years. I was mostly contributing to the Planet Hunters project back then, but occasionally I switched to different projects just to look for what they have to offer. And it was during one of these “Let’s try something different” moments that I discovered RGZ through the announcement post in the Galaxy Zoo blog.
What parts kept you interested and motivated to stay a part of this project?
Tim: The team of scientists and their active participation is an important part. Their blog posts, comments and links have helped me to learn about the project and my involvement with the goals.
Looking for host radio lobes which are separated by a 10′ [minutes] or looking at the behaviour of jets in galaxies clusters is really exciting for me. I like that RGZ covers a wide range of data: radio, optics, IR, X-ray.
Ivan: If we are talking specifically about RGZ, it would be the RGZ Talk community and the fact that RGZ Science team is eager to communicate with simple volunteers and involve them in the research process. But a large portion of my motivation [for RGZ] is the same as for the rest of the Zooniverse projects. You see, I am sci-fi fan and it made me interested in space exploration. I like to watch documentaries about the astronomers, their work and all the amazing stuff in the universe around us and through the Zooniverse I can actually be involved in the process of science and help to shape the future, even if it just by a very tiny fraction. I never thought that something like this would be possible before I discovered Zooniverse.
How do you feel about being a co-author of a scientific research paper?
Tim: I am still amazed and feel more motivated to look for stunning new radio galaxies.
Ivan: This isn’t the first time actually, I am also a co-author for three papers from the Planet Hunters, BUT it is always awesome, like every single time! Although, I keep my head cool over that since most of the work was done by the professional scientists. A huge thanks to them for the acknowledgment of my small contribution in the form of inviting me to be a co-author in their paper. With this RGZ paper, I got a chance to see the whole process of science starting from the simple question “What is that?” and then people trying to figure out what is going on, schedule observations, discussing things and I have been a part of it! All the way through the process, ending with the actual published science article. It was an amazing experience!
Without the contributions made by our volunteers all over the world, we would not have been so successful in our endeavours.
However, we have only reached 57% of our classification target. Head to www.bit.ly/RadioGalaxyZoo1 to become involved and you could be co-authoring another great discovery with us!
I’m happy to report that in the last several days, we’ve simultaneously finished the initial sets of galaxies from both the DECaLS survey and the second subset of simulated galaxies from Illustris. This has meant the completion (since last September) of more than 50,000 galaxies seen 40 times apiece, for more than 2 MILLION classifications.
So far, your work is helping reveal new insights based on this deeper data. One very preliminary result: as we’d predicted, the better conditions in DECaLS (bigger camera, better night sky seeing, larger telescope mirror) are revealing galaxies that were classified in SDSS as smooth, but that in fact have faint or extended disks and features that are now visible. This is really exciting, and is helping to modify our ideas of the assembly histories of these galaxies.
The Galaxy Zoo site is still active: we’ve reactivated a few of the DECaLS DR1 galaxies to slightly improve our statistics, but shortly we’re going to add new sets of (real) images to continue the next phase. I’ll post more as soon as we’ve finalized our plans.
As always, our sincere thanks! Time to start our analysis and continue the science…
Following on from the excellent summary of the highlights in 2015 for the Radio Galaxy Zoo project, here’s a similar post about results from Galaxy Zoo.
This year we collected 4,755,448 classifications on 209,291 different images of galaxies. You continue to amaze us with your collective efforts. Thank you so much for each and every one of these classifications.
The year started with Galaxy Zoo scientists at Mauna Kea observing galaxies, as reported in this wonderful series of blog posts by (former) Zooniverse developer Ed Paget.
We celebrated 8 years of Galaxy Zoo back in July, with this blog series of all things 8-like about Galaxy Zoo.
Back in May we finished collecting classifications on the last of our Hubble Space Telescope images. At the AAS in Florida this week, Kyle Willett and Brooke Simmons presented posters on the planned data releases for the classifications.
We both launched and finished classifying the first set of images of simulated galaxies from the Illustris Simulation (read more here: New Images for Galaxy Zoo: Illustris and here: Finished with First Set of Illustris Images). We also launched our first set of images from the DECaLS survey, which is using the Dark Energy Camera (New Images for Galaxy Zoo: DECaLS).
We also launched a new Galaxy Zoo side project, Galaxy Zoo Bars (one of the first projects built on the new Zooniverse Project Builder software), measuring bar lengths of galaxies in the distant Universe. The entire set was measured in less than a year, so thank you to any of you who contributed to that, and if you missed it don’t worry, we have plans for more special projects this year.
We launched a new web interface to explore the Galaxy Zoo classifications.
Our contributions to the peer reviewed astronomical literature continue. Papers number 45-48 from the team were officially published in 2015. They were:
– Galaxy Zoo: the effect of bar-driven fueling on the presence of an active galactic nucleus in disc galaxies, Galloway+ 2015.
– Galaxy Zoo: Evidence for Diverse Star Formation Histories through the Green Valley, Smethurst+ 2015.
– Galaxy Zoo: the dependence of the star formation-stellar mass relation on spiral disc morphology, Willett+ 2015.
You can access all 48 team papers using your classifications at the Zooniverse Publication Page. Remember that all Zooniverse papers published in the Monthly Notices of the Royal Astronomical Society (which includes most of the Galaxy Zoo papers) are available open access to any reader, and if we happen to publish elsewhere we always make the post-acceptance version available on arxiv.org.
All of our papers include a version of this acknowledgement to our classifiers: “The data in this paper are the result of the efforts of the Galaxy Zoo volunteers, without whom none of this work would be possible. Their efforts are individually acknowledged at authors.galaxyzoo.org.” We all hope you all know how grateful we are for each and every one of your classifications.
This year saw publication of the first paper on Hubble observations of Voorwerpje systems accompanied by an HST press release.
One of those papers from (mostly) outside the GZ team discussed a rare example of a double radio source from a spiral host, something Radio Galaxy Zoo will find many more of: “J1649+2635: a grand-design spiral with a large double-lobed radio source”, Mao et al. 2015.
Another exciting thing about this year has been the number of papers from non-team members using the classifications which are now public (see data.galaxyzoo.org). To date almost 300 astronomical papers have been written which cite the original description of Galaxy Zoo (Lintott et al. 2008) and the two data release papers so far (Lintott et al. 2011 for GZ1 and Willett et al. 2013 for GZ2) have 164 and 34 citations respectively. The number of papers in the Astrophysics Data System which contain the words “Galaxy Zoo” (which you can search in ADS Labs) is an astonishing 700 (409 for refereed publications).
These are just some of the highlights I’ve pulled together. If I’ve missed your favourite feel free to add it in the comments below. All in all it’s been a great year. Here’s to an equally good 2016!
- our first paper, “Radio Galaxy Zoo: host galaxies and radio morphologies derived from visual inspection”, was published in Monthly Notices of the Royal Astronomical Society (MNRAS) in September;
- upon the recommendation of our referee, our paper on hybrid morphology radio sources will be split into two papers; and
- the giant wide angle tail (WAT) discovery paper will be available soon.
- progress on the giant WAT is continuing to bring up more interesting information including our JVLA data, with potentially 3 additional papers;
- we obtained 4 hours of observing time to take spectra of four of our green DRAGN, with the observations scheduled for March 2016; and
- with all your work, RGZ has discovered over 100 new giant radio galaxies!
- matching of RGZ classifications to SDSS;
- merging Galaxy Zoo data with Radio Galaxy Zoo data;
- our observations with the JVLA on the hybrid radio sample are complete, with 60 hours of observing time; and
- we are working with the International Astronomical Union (IAU) to get the RGZ name official.
Martin Hardcastle (Hertfordshire)
Sarah White (ICRAR/Curtin)
Francesco de Gasperin (Leiden)
Many bargains must be made in pursuit of an academic career, and chief among them is an openness to a nomadic early-career life in exchange for a better chance at staying permanently put somewhere later. Grad students and postdocs move around. Not only do we travel all over the world sharing and discussing our research, but the relatively short duration of postdocs, and the fact that in astronomy doing at least 2 of them is now the norm, means we regularly pull up roots and dash off to live somewhere else. My friends have collectively done postdocs on all continents, including Antarctica. Including places thousands of miles from friends and family; including places where they can neither read nor speak any of the native languages.
In this context, I am so, so lucky. My first postdoc moved me only a medium distance (across just one ocean), and to a place where I could at least understand the words, even if I didn’t always get every nuance of meaning. At Oxford I made lifelong friends and built great collaborations, and I thought the research itself was pretty good, too.
Turns out NASA agrees with me. Last year I applied for and was awarded an Einstein Fellowship, which is an early-career award lasting 3 years, an independent postdoc that can be taken to any institution in the US. They’re very competitive (I had applied the previous year without success), and I was thrilled to be awarded one at my top-choice host institution. My first day was last week.
Here’s what the 2015 Fellows page has to say about my research plans:
Brooke uses a variety of multi-wavelength data, including highly accurate galaxy morphologies from the Galaxy Zoo project, to research the connection between supermassive black holes and the galaxies that host them. This connection appears to exist over many orders of magnitude in black hole and galaxy mass, but its fundamental origin is still a puzzle. As an Einstein Fellow at the University of California, San Diego, Brooke will investigate supermassive black hole growth in the absence of galaxy mergers, using a rare sample of galaxies which have never had a significant merger yet host growing black holes. These active nuclei, selected because their host galaxies lack the bulges which inevitably result from a galaxy merger, provide powerful leverage to disentangle the complex drivers of black hole growth and determine the origin of observed black hole-galaxy correlations.
During my fellowship I’m planning on moving forward with the research we first published in 2013 investigating bulgeless galaxies with growing black holes. That is: it’s Galaxy Zoo research.
Galaxy Zoo research brought me to Oxford, and now it has brought me to California. UCSD is a great place, and I’ve already met some really excellent scientists. UCSD is also part of the Southern California Center for Galaxy Evolution and has access to some of the world’s best telescopes, so the future is full of potential.
For now, though: I wouldn’t be here, watching sunsets from my office, without your contributions to Galaxy Zoo over the years. Thank you.