The Science Behind Classifying Simulated AGN
This is a dual-purpose post: first, to introduce myself. My name is Brooke Simmons. I’m a graduate student in the final year of my PhD at Yale, and my scientific focus is on examining the co-evolution of supermassive black holes (SMBHs) and their host galaxies. My specialization is in the morphology of galaxies hosting active SMBHs, so naturally I’ve been very intrigued by the Galaxy Zoo project ever since it started. And I’m very impressed by the work that you all do. So when Chris and Carie asked me about simulating some AGN as part of the Galaxy Zoo project, I jumped at the chance. I have pretty extensive experience simulating AGN and host galaxies — more on that in a moment — and I was very excited to have the opportunity to extend that kind of science into the realm of Galaxy Zoo.
I heard later that there were some issues regarding the simulations, and that’s the other purpose of this post. I’d like to try and explain the reasons I think the simulations are important to the science being done in the Zooniverse, and clarify some of the details, if possible. I’m quite new here and I realize that there are many levels of experience reflected in the Zooite population, from newcomers to the field to those who are experienced at following up on objects of interest and searching the scientific literature. So I hope, in giving this science background, that those of you who have heard it before will bear with me, and those of you who haven’t (or, well, anyone really) will feel free to ask any questions that might come up.
In the field of galaxy evolution, it’s now clear there is some sort of mechanism that affects both the evolution of galaxies and the growth of their central black holes together, but we don’t really understand what it is (or what they are) — yet. In terms of scale, it’s rather incredible that they are connected at all. We may call them supermassive black holes, but they’re generally a small percentage of the total galaxy mass, and they’re absolutely tiny when compared to the size of the galaxy. I like to describe it in terms of a football match: the packed, somewhat chaotic crowd in the stands shouldn’t know or care what the ant right in the middle of the playing field is doing. Nor should the ant particularly be aware of how the cheering crowd is shifting and reacting. Yet it is well established that the crowd (the stars in the galaxy) and the ant (the central black hole) somehow know about each other.
How does this work? What forces (or combination of phenomena) act to influence both the single, massive point at the center of a galaxy and the billions of stars around it? Is it a one-sided influence, or is it a feedback mechanism that ends up causing them both to evolve in sync? The co-evolution of galaxies and black holes is one of the fundamental topics of galaxy evolution, and many questions remain unanswered. In order to try to answer these questions, we observe both central black holes and the galaxies that host them, at a variety of redshifts/lookback times, so that we can see how these two things evolve.
However, for all but very local galaxies, it’s very difficult to see a signal from a galaxy’s central SMBH amid all the stellar light from the galaxy. So, we turn to that subset of SMBHs that are actively accreting matter, which in turn heats up and discharges enormous amounts of energy as it falls into the gravitational potential of the black hole. Those, which we call active galactic nuclei (AGN), we can see much more easily, and out to very high redshift. They radiate across the whole of the electromagnetic spectrum, from radio to gamma rays. At optical wavelengths they are sometimes buried in dust and gas, which obscures their light and means they look identical to so-called “inactive” galaxies. But in other cases, the AGN are unobscured or only partially obscured, and then they are extremely bright — so much so that they can far outshine the rest of the galaxy.
So, looking at the central SMBHs of inactive galaxies is impossible for very distant galaxies, because the host galaxy swamps the dim signatures of the black hole. But looking at the hosts of active black holes (AGN) can be difficult too, because the AGN signal can swamp the host galaxy. It’s not impossible to disentangle the two in order to examine the host galaxy separately from the AGN, but it adds a level of complexity to the process. Morphological fitting programs executed by a computer do a reasonable job, but actually — as you all know — the human brain is excellent at this kind of pattern recognition. You all can clearly tell the difference between a host galaxy and its central AGN, to the point where many of you have been following up your classified objects and identifying spectral features of AGN. That is so impressive!
In fact, what you collectively do is new and different and in many ways a significant improvement over “parametric” methods that use automated computer codes to fit galaxy morphology models to images. Those have their uses, too, of course, but you all pick up nuances that parametric methods simply miss. And part of what that means is that the data we have on how the presence of an AGN (bright or faint) affects morphological classification may or may not apply to your work. Within the automatic fitting programs, there are subtle effects that can occur. For example, a small galaxy bulge may look the same to a parametric fitting routine as a central AGN, with the consequence that it may think it has found one when in fact it’s the other. Or, when both a small bulge and a central AGN are present, a computer code to fit the morphology might be more uncertain about how much luminosity goes with each component. I know all this because it has now been studied for automated/parametric morphology fitting codes:
- Sánchez et al. (2004) is mainly a data analysis paper on AGN host galaxies, but contains a subsection on simulations;
- Simmons & Urry (2008) is a paper describing two sets of AGN host simulations that combine for over 50,000 simulated galaxies (yes, that Simmons is me);
- Gabor et al. (2009) is another data analysis paper that contains AGN host simulations; and
- Pierce et al. (2010) is a dedicated simulations paper with a smaller sample than Simmons & Urry, but which also extends the analysis to host galaxy colors.
All of this analysis was undertaken with Hubble data, much like the images of simulated AGN that have been incorporated into Galaxy Zoo. These are small effects that only impact a fraction of classifications, but the simulations are crucial because they both let us know the limits of our classification methods and, just as importantly, enable us to quantify precisely how confident we are that the classifications are accurate. The parametric methods are very accurate, but it is absolutely essential that we find out just how the presence of an AGN affects classification in this setting, which is of course quite different.
It’s always exciting for a scientist to say “I don’t know the answer, but I know how we can find out.” And in this case, that means extending the simulations that we have done to the case of visual classification. There are a few ways to simulate AGN, but the key process is to create a situation where the analysis takes place on a known quantity so that you can compare what you know to what the analysis finds. In this case (which is similar in method to the first set of simulations in Simmons & Urry), that means:
- Start with a set of galaxies for which we know the initial “answer,” i.e., the morphology;
- Add a simulated signal from a central SMBH, using a wide range of luminosity ratios between galaxy and SMBH, and a range of AGN colors;
- Repeat the classification process in exactly the same way as for the initial set of galaxies, to see if the answers change.
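To make the three steps above a little more concrete, here is a minimal Python sketch of the second one, adding a central point source to a galaxy image at several galaxy-to-AGN luminosity ratios. Everything in it is an assumption made for illustration: the function names, the toy exponential-disk "galaxy," and especially the Gaussian stand-in for the point-spread function (real simulations would use the actual Hubble PSF and also vary the AGN color). It is not the software used to make the Galaxy Zoo images.

```python
import numpy as np

def gaussian_psf(size, sigma):
    """Normalized 2-D Gaussian (odd size), a stand-in for a real telescope PSF."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    psf = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return psf / psf.sum()

def toy_galaxy(size, scale):
    """Hypothetical exponential-disk image standing in for a real galaxy."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    return np.exp(-np.hypot(x, y) / scale)

def add_simulated_agn(galaxy, ratio, psf):
    """Return a copy of the image with a central point source added.

    The point source carries a total flux of ratio * (galaxy flux),
    spread over the pixels according to the PSF.
    """
    out = galaxy.astype(float).copy()
    cy, cx = galaxy.shape[0] // 2, galaxy.shape[1] // 2
    h = psf.shape[0] // 2
    out[cy - h:cy + h + 1, cx - h:cx + h + 1] += ratio * galaxy.sum() * psf
    return out

# Step 1: a galaxy whose "answer" we already know (here, a toy disk).
galaxy = toy_galaxy(65, 8.0)
psf = gaussian_psf(11, 1.5)

# Step 2: add AGN spanning very faint to very bright (illustrative values).
ratios = [0.01, 0.1, 1.0, 10.0]
sims = {r: add_simulated_agn(galaxy, r, psf) for r in ratios}

# Step 3 would be to classify each image in sims exactly as the original was.
```

The wide spread of ratios is deliberate: the faintest cases should be nearly invisible and the brightest should dominate the image.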
Now, it may be that the answers change in some subtle way, as they do when the morphological analysis is done by a computer. If that’s the case, then the analysis quantifies that effect so that we can understand it and account for it. Or, it may be that you see right through it — and if so, that’s great! If you look at a galaxy with an AGN and say, “of course I can tell that that galaxy has an AGN in it, and I can still classify the galaxy in the same way,” fantastic. It potentially means you all are doing better science than a traditional parametric analysis in yet another way. Either way, the answer is very useful.
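One simple way to quantify such an effect, sketched here under my own assumptions, is to count how often the classification of a galaxy changes once an AGN is added, separately for each luminosity ratio. The function name and the records below are hypothetical placeholders, not real Galaxy Zoo results.

```python
from collections import defaultdict

def changed_fraction(records):
    """Fraction of galaxies whose label changed, grouped by luminosity ratio.

    records: iterable of (ratio, original_label, label_with_agn) tuples.
    """
    totals = defaultdict(int)
    changed = defaultdict(int)
    for ratio, before, after in records:
        totals[ratio] += 1
        if before != after:
            changed[ratio] += 1
    return {r: changed[r] / totals[r] for r in sorted(totals)}

# Made-up placeholder records, for illustration only.
records = [
    (0.1, "spiral", "spiral"),
    (0.1, "elliptical", "elliptical"),
    (10.0, "spiral", "elliptical"),
    (10.0, "elliptical", "elliptical"),
]
fractions = changed_fraction(records)
```

If the fractions stay near zero at every ratio, the classifications are robust to the AGN; if they climb with brightness, the results show where the correction is needed.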
I know this post is already quite long, but I think it’s important to make one other point about the simulations. When you’re simulating something like this to understand the effects of a new feature on analysis you think you understand very well, it’s very important to try to push the limits of that analysis. In this case, that means simulating AGN that are both so faint that there’s pretty much no way you could possibly see them in their host galaxies, and so bright that they will be blindingly obvious to anyone paying attention.
And you all are most definitely paying attention. In reading comments on other blog posts, I saw that some Zooites were displeased with the way the very brightest simulated AGN looked strange, and even artificial. And I know there were some communication issues regarding the release of the simulations; for that I apologize. Actually, though, knowing which objects you found odd is a part of the science, too. Simulations like this can be used not only to understand the science of determining galaxy morphology, but also to understand the science of separating the AGN itself for later analysis; as I said earlier, both are needed to understand the co-evolution of black holes and galaxies. So questions like, “when does the AGN get lost in the galaxy?” and “when does the AGN totally overtake the galaxy?” are vital. It is also definitely the case that we were pushing the limits of not just the classification, but also of the software that is used to make the simulated AGN. That’s why the very brightest of them look a bit… weird. I do think the science is still possible even if it’s clear at first glance that it’s a simulation, and I really appreciate your patience with both me and the software on that issue.
By the way, I have a feeling you all are going to turn out to be considerably better at classifying galaxies with AGN in them than the computer is, but of course that’s just my hypothesis, so it’s important to actually go through the process of classifying simulated galaxies. That way, when someone comes up to us and says, “But how do you know all these citizen scientists are really that accurate? AGN can have a subtle effect on the fitted morphologies of galaxies, after all,” we can say, “But we do know they’re that accurate, and here’s how we know.”
Thanks all for reading — if you got this far, that probably deserves an award in itself — and please feel free to ask any questions you might have. If you have a concern that you feel I didn’t address, please let me know that as well. I would very much appreciate your input!