Explaining clustering statistics we use to study the distribution of Galaxy Zoo galaxies
I’ve used some statistical tools to analyze the spatial distribution of Galaxy Zoo galaxies and to see whether we find galaxies with particular classifications in more dense environments or less dense ones. By “environment” I’m referring to the kinds of regions in which these galaxies tend to be found: for example, galaxies in dense environments are usually strongly clustered in groups and clusters of many galaxies. In particular, I’ve used what we call “marked correlation functions,” which I’ve found are very sensitive statistics for identifying and quantifying trends between objects and their environments. These trends are also important from the perspective of models, since we think that massive clumps of dark matter are in the same regions as massive galaxy groups.
We’ve mainly used them in two papers: one where we analyzed the environmental dependence of morphology and color, and one where we analyzed the environmental dependence of barred galaxies. These papers have been described a bit in this post and this post. We’ve also had other Galaxy Zoo papers about similar subjects, especially this paper by Steven Bamford and this one by Kevin Casteels.
What I loved about these projects is that we obtained impressive results that nobody else had seen before, and it’s all thanks to the many, many classifications that the citizen scientists have contributed. These statistics are useful only when one has large catalogs, and that’s exactly what we had in Galaxy Zoo 1 and 2. We have catalogs with visual classifications and type likelihoods that are ten times as large as the ones other astronomers have used.
What are these “marked correlation functions”, you ask? Traditional correlation functions tell us how strongly objects are clustered relative to random clustering, and we usually write this as 1+ξ. But we have lots of information about these galaxies, more than just their spatial positions. So we can weight the galaxies by a particular property, such as the elliptical galaxy likelihood, and then measure the clustering signal. We usually write this weighted version as 1+W. The ratio (1+W)/(1+ξ), which is the marked correlation function M(r), then tells us whether galaxies with high values of the weight are in more dense or less dense environments on average. And if 1+W=1+ξ, or in other words M=1, then the weight is not correlated with the environment at all.
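To make that ratio a bit more concrete, here’s a rough sketch in Python. It is a toy illustration and not the code we used in the papers. It assumes the simple pair-count form of the statistic, where the random-pair counts in 1+W and 1+ξ cancel in the ratio, leaving M(r) ≈ WW(r)/DD(r): the weighted pair counts divided by the unweighted ones, with the weights normalized so their mean is 1. The function name `marked_correlation` and its arguments are made up for this example, and a real measurement would use projected separations, random catalogs and proper error estimates.

```python
import numpy as np

def marked_correlation(positions, marks, bin_edges):
    """Toy estimate of M(r) = WW(r) / DD(r) from brute-force pair counts.

    positions : (N, D) array of galaxy coordinates (illustrative units)
    marks     : (N,) array of weights, e.g. an elliptical likelihood
    bin_edges : separation bin edges, same units as positions
    """
    positions = np.asarray(positions, dtype=float)
    marks = np.asarray(marks, dtype=float)
    # Normalize the marks so that the mean mark is 1.
    marks = marks / marks.mean()

    # Indices of all unique galaxy pairs (i < j) and their separations.
    i, j = np.triu_indices(len(positions), k=1)
    sep = np.linalg.norm(positions[i] - positions[j], axis=1)

    # DD: unweighted pair counts; WW: pair counts weighted by mark products.
    dd, _ = np.histogram(sep, bins=bin_edges)
    ww, _ = np.histogram(sep, bins=bin_edges, weights=marks[i] * marks[j])

    # Bins without any pairs have no defined ratio, so return NaN there.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(dd > 0, ww / dd, np.nan)

# Toy catalog: a tight clump of high-mark galaxies plus isolated low-mark ones.
clump = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
field = [[10.0, 0.0], [0.0, 10.0], [10.0, 10.0], [20.0, 20.0]]
toy_positions = np.array(clump + field)
toy_marks = np.array([1.0] * 4 + [0.0] * 4)
toy_m = marked_correlation(toy_positions, toy_marks, bin_edges=[0.0, 1.0, 100.0])
```

In this toy catalog the small-separation bin comes out with M above 1, because the close pairs are all high-mark galaxies, while the large-separation bin comes out below 1. If every galaxy had the same mark, M would be exactly 1 in every bin that contains pairs.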
First, I’ll show you one of our main results from the first of those papers, using Galaxy Zoo 1 data. The upper panel shows the clustering of galaxies in the sample we selected, and it’s a function of projected galaxy separation (rp). This is something other people have measured before, and we already knew that galaxies are clustered more than random clustering. But then we weighted the galaxies by the GZ elliptical likelihood (based on the fraction of classifiers identifying the galaxies as ellipticals) and then took the (1+W)/(1+ξ) ratio, which is M(rp), and that’s shown by the red squares in the lower panel. When we use the spiral likelihoods, the blue squares are the result. This means that elliptical galaxies tend to be found in dense environments, since they have an M(rp) ratio that’s greater than 1, and spiral galaxies are in less dense environments than average. When I first ran these measurements, I expected kind of noisy results, but the measurements are very precise and they far exceeded my expectations. Without many visual classifications of every galaxy, this wouldn’t be possible.
Second, using Galaxy Zoo 2 data, we measured the clustering of disc galaxies, and that’s shown in the upper panel of the plot above. Then we weighted the galaxies by their bar likelihoods (based on the fractions of people who classified them as having a stellar bar) and measured the same statistic as before. The result is shown in the lower panel, and it shows that barred disc galaxies tend to be found in denser environments than average disc galaxies! This is a completely new result. Astronomers had not detected this signal before mainly because their samples were too small, but we were able to do better with the classifications provided by Zooites. We argued that barred galaxies often reside in galaxy groups and that a minor merger or interaction with a neighboring galaxy can trigger disc instabilities that produce bars.
What kinds of science shall we use these great datasets and statistics for next? My next priority with Galaxy Zoo is to develop dark matter halo models of the environmental dependence of galaxy morphology. Our measurements are definitely good enough to tell us how spiral and elliptical morphologies are related to the masses of the dark matter haloes that host the galaxies, and these relations would be an excellent and new way to test models and simulations of galaxy formation. And I’m sure there are many other exciting things we can do too.