Some colleagues and I successfully proposed for a symposium on citizen science at the annual meeting of the American Association for the Advancement of Science (AAAS) in San Jose, CA in February 2015. (The AAAS is one of the world’s largest scientific societies and is the publisher of the Science journal.) Our session will be titled “Citizen Science from the Zooniverse: Cutting-Edge Research with 1 Million Scientists.” It refers to the more than one million volunteers participating in a variety of citizen science projects. This milestone was reached in February, and the Guardian and other news outlets reported on it.
The Zooniverse began with Galaxy Zoo, which recently celebrated its seventh anniversary. Of course, Galaxy Zoo has been very successful, and it led to the development of a variety of citizen science projects coordinated by the Zooniverse in diverse fields such as biology, zoology, climate science, medicine, and astronomy. For example, projects include: Snapshot Serengeti, where people classify different animals caught in millions of camera trap images; Cell Slider, where they classify images of cancerous and ordinary cells and contribute to cancer research; Old Weather, where participants transcribe weather data from log books of Arctic exploration and research ships at sea between 1850 and 1950, thus contributing to climate model projections; and Whale FM, where they categorize the recorded sounds made by killer and pilot whales. And of course, in addition to Galaxy Zoo, there are numerous astronomy-related projects, such as Disk Detective, Planet Hunters, the Milky Way Project, and Space Warps.
We haven’t confirmed all of the speakers for our AAAS session yet, but we plan to have six speakers who will introduce and present results from the Zooniverse, Galaxy Zoo, Snapshot Serengeti, Old Weather, Cell Slider, and Space Warps. I’m sure it will be exciting and we’re all looking forward to it!
I’ve used some statistical tools to analyze the spatial distribution of Galaxy Zoo galaxies and to see whether we find galaxies with particular classifications in more dense environments or less dense ones. By “environment” I’m referring to the kinds of regions that these galaxies tend to be found: for example, galaxies in dense environments are usually strongly clustered in groups and clusters of many galaxies. In particular, I’ve used what we call “marked correlation functions,” which I’ve found are very sensitive statistics for identifying and quantifying trends between objects and their environments. This is also important from the perspective of models, since we think that massive clumps of dark matter are in the same regions as massive galaxy groups.
We’ve mainly used them in two papers, where we analyzed the environmental dependence of morphology and color and where we analyzed the environmental dependence of barred galaxies. These papers have been described a bit in this post andthis post. We’ve also had other Galaxy Zoo papers about similar subjects, especially this paper by Steven Bamford and this one by Kevin Casteels.
What I loved about these projects is that we obtained impressive results that nobody else had seen before, and it’s all thanks to the many many classifications that the citizen scientists have contributed. These statistics are useful only when one has large catalogs, and that’s exactly what we had in Galaxy Zoo 1 and 2. We have catalogs with visual classifications and type likelihoods that are ten times as large as ones other astronomers have used.
What are these “marked correlation functions”, you ask? Traditional correlation functions tell us about how objects are clustered relative to random clustering, and we usually write this as 1+ ξ. But we have lots of information about these galaxies, more than just their spatial positions. So we can weight the galaxies by a particular property, such as the elliptical galaxy likelihood, and then measure the clustering signal. We usually write this as 1+W. Then the ratio of (1+W)/(1+ξ), which is the marked correlation function M(r), tells us whether galaxies with high values of the weight are more dense or less dense environments on average. And if 1+W=1+ξ, or in other words M=1, then the weight is not correlated with the environment at all.
First, I’ll show you one of our main results from that paper using Galaxy Zoo 1 data. The upper panel shows the clustering of galaxies in the sample we selected, and it’s a function of projected galaxy separation (rp). This is something other people have measured before, and we already knew that galaxies are clustered more than random clustering. But then we weighted the galaxies by the GZ elliptical likelihood (based on the fraction of classifiers identifying the galaxies as ellipticals) and then took the (1+W)/(1+ξ) ratio, which is M(rp), and that’s shown by the red squares in the lower panel. When we use the spiral likelihoods, the blue squares are the result. This means that elliptical galaxies tend to be found in dense environments, since they have a M(rp) ratio that’s greater than 1, and spiral galaxies are in less dense environments than average. When I first ran these measurements, I expected kind of noisy results, but the measurements are very precise and they far exceeded my expectations. Without many visual classifications of every galaxy, this wouldn’t be possible.
Second, using Galaxy Zoo 2 data, we measured the clustering of disc galaxies, and that’s shown in the upper panel of the plot above. Then we weighted the galaxies by their bar likelihoods (based on the fractions of people who classified them as having a stellar bar) and measured the same statistic as before. The result is shown in the lower panel, and it shows that barred disc galaxies tend to be found in denser environments than average disc galaxies! This is a completely new result and had never been seen before. Astronomers had not detected this signal before mainly because their samples were too small, but we were able to do better with the classifications provided by Zooites. We argued that barred galaxies often reside in galaxy groups and that a minor merger or interaction with a neighboring galaxy can trigger disc instabilities that produce bars.
What kinds of science shall we use these great datasets and statistics for next? My next priority with Galaxy Zoo is to develop dark matter halo models of the environmental dependence of galaxy morphology. Our measurements are definitely good enough to tell us how spiral and elliptical morphologies are related to the masses of the dark matter haloes that host the galaxies, and these relations would be an excellent and new way to test models and simulations of galaxy formation. And I’m sure there are many other exciting things we can do too.