The Science Behind Classifying Simulated AGN

Hello Zooites,

This is a dual-purpose post: first, to introduce myself. My name is Brooke Simmons. I’m a graduate student in the final year of my PhD at Yale, and my scientific focus is on examining the co-evolution of supermassive black holes (SMBHs) and their host galaxies. My specialization is in the morphology of galaxies hosting active SMBHs, so naturally I’ve been very intrigued by the Galaxy Zoo project ever since it started. And I’m very impressed by the work that you all do. So when Chris and Carie asked me about simulating some AGN as part of the Galaxy Zoo project, I jumped at the chance. I have pretty extensive experience simulating AGN and host galaxies — more on that in a moment — and I was very excited to have the opportunity to extend that kind of science into the realm of Galaxy Zoo.

I heard later that there were some issues regarding the simulations, and that’s the other purpose of this post. I’d like to try to explain why I think the simulations are important to the science being done in the Zooniverse, and to clarify some of the details. I’m quite new here, and I realize that the Zooite population reflects many levels of experience, from newcomers to the field to those who are experienced at following up on objects of interest and searching the scientific literature. So I hope, in giving this science background, that those of you who have heard it before will bear with me, and those of you who haven’t (or, well, anyone really) will feel free to ask any questions that might come up.

In the field of galaxy evolution, it’s now clear there is some sort of mechanism that affects both the evolution of galaxies and the growth of their central black holes together, but we don’t really understand what it is (or what they are) — yet. In terms of scale, it’s rather incredible that they are connected at all. We may call them supermassive black holes, but they’re generally a small percentage of the total galaxy mass, and they’re absolutely tiny when compared to the size of the galaxy. I like to describe it in terms of a football match: the packed, somewhat chaotic crowd in the stands shouldn’t know or care what the ant right in the middle of the playing field is doing. Nor should the ant particularly be aware of how the cheering crowd is shifting and reacting. Yet it is well established that the crowd (the stars in the galaxy) and the ant (the central black hole) somehow know about each other.

How does this work? What forces (or combination of phenomena) act to influence both the single, massive point at the center of a galaxy and the billions of stars around it? Is it a one-sided influence, or is it a feedback mechanism that ends up causing them both to evolve in sync? The co-evolution of galaxies and black holes is one of the fundamental topics of galaxy evolution, and many questions remain unanswered. In order to try to answer these questions, we observe both central black holes and the galaxies that host them, at a variety of redshifts/lookback times, so that we can see how these two things evolve.

However, for all but very local galaxies, it’s very difficult to see a signal from a galaxy’s central SMBH amid all the stellar light from the galaxy. So, we turn to that subset of SMBHs that are actively accreting matter; the infalling material heats up and releases enormous amounts of energy as it falls into the gravitational potential of the black hole. These active galactic nuclei (AGN) are much easier to see, and out to very high redshift. They radiate across the whole of the electromagnetic spectrum, from radio to gamma rays. At optical wavelengths they are sometimes buried in dust and gas, which obscures their light and can make them look just like so-called “inactive” galaxies. But in other cases, the AGN are unobscured or only partially obscured, and then they are extremely bright, so much so that they can far outshine the rest of the galaxy.
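
For a sense of scale, astronomers usually express brightness ratios in magnitudes, where a smaller magnitude means a brighter object. The standard photometric conversions below (general relations, not something specific to these simulations) give the magnitude difference between a nucleus and its host for a given AGN-to-host luminosity ratio measured in the same band, and the total magnitude of the combined system:

```latex
\Delta m \equiv m_{\mathrm{host}} - m_{\mathrm{AGN}} = 2.5\,\log_{10}\!\left(\frac{L_{\mathrm{AGN}}}{L_{\mathrm{host}}}\right),
\qquad
m_{\mathrm{total}} = m_{\mathrm{host}} - 2.5\,\log_{10}\!\left(1 + \frac{L_{\mathrm{AGN}}}{L_{\mathrm{host}}}\right)
```

So an AGN ten times as luminous as its host is 2.5 magnitudes brighter than the host, and makes the whole system about 2.6 magnitudes brighter than the host alone.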

So, looking at the central SMBHs of inactive galaxies is impossible for very distant galaxies, because the host galaxy swamps the dim signatures of the black hole. But looking at the hosts of active black holes (AGN) can be difficult too, because the AGN signal can swamp the host galaxy. It’s not impossible to disentangle the two in order to examine the host galaxy separately from the AGN, but it adds a level of complexity to the process. Morphological fitting programs executed by a computer do a reasonable job, but actually — as you all know — the human brain is excellent at this kind of pattern recognition. You all can clearly tell the difference between a host galaxy and its central AGN, to the point where many of you have been following up your classified objects and identifying spectral features of AGN. That is so impressive!

In fact, what you collectively do is new and different and in many ways a significant improvement over “parametric” methods that use automated computer codes to fit galaxy morphology models to images. Those have their uses, too, of course, but you all pick up nuances that parametric methods simply miss. And part of what that means is that the data we have on how the presence of an AGN (bright or faint) affects morphological classification may or may not apply to your work. Within the automatic fitting programs, there are subtle effects that can occur. For example, a small galaxy bulge may look the same to a parametric fitting routine as a central AGN, with the consequence that it may think it has found one when in fact it’s the other. Or, when both a small bulge and a central AGN are present, a computer code to fit the morphology might be more uncertain about how much luminosity goes with each component. I know all this because it has now been studied for automated/parametric morphology fitting codes:

  • Sánchez et al. (2004) is mainly a data analysis paper on AGN host galaxies, but contains a subsection on simulations;
  • Simmons & Urry (2008) is a paper describing two sets of AGN host simulations that combine for over 50,000 simulated galaxies (yes, that Simmons is me);
  • Gabor et al. (2009) is another data analysis paper that contains AGN host simulations; and
  • Pierce et al. (2010) is a dedicated simulations paper with a smaller sample than Simmons & Urry, but which also extends the analysis to host galaxy colors.

All of this analysis was undertaken with Hubble data, much like the images of simulated AGN that have been incorporated into Galaxy Zoo. These are small effects that only impact a fraction of classifications, but the simulations are crucial because they let us know the limits of our classification methods and, just as importantly, enable us to quantify precisely how confident we are that the classifications are accurate. The parametric methods are very accurate, but it is absolutely essential that we find out just how the presence of an AGN affects visual classification, which is of course quite a different setting.
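
A toy calculation shows why a compact bulge and a central point source are so easy to confuse. It assumes, purely for illustration, a Gaussian point-spread function (PSF) and a bulge approximated as a small Gaussian; real bulges follow Sérsic profiles and real Hubble PSFs are not Gaussian. Blurring one Gaussian with another adds their variances, so a bulge much smaller than the PSF comes out barely wider than a true point source. The function names are hypothetical.

```python
import math

def blurred_width(sigma_psf, sigma_bulge):
    """Gaussian sigma of a Gaussian bulge after blurring by a Gaussian PSF.

    Convolving two Gaussians gives another Gaussian whose variance is the
    sum of the two input variances.
    """
    return math.sqrt(sigma_psf ** 2 + sigma_bulge ** 2)

def excess_width(sigma_psf, sigma_bulge):
    """Fractional amount by which the blurred bulge is wider than a point source."""
    return blurred_width(sigma_psf, sigma_bulge) / sigma_psf - 1.0

# Hypothetical bulge sizes, in units of the PSF sigma.
for sigma_bulge in (0.1, 0.2, 0.5, 1.0):
    pct = 100 * excess_width(1.0, sigma_bulge)
    print(f"bulge sigma = {sigma_bulge:.1f} x PSF -> {pct:.1f}% wider than a point source")
```

Under these assumptions, a bulge one-fifth the width of the PSF ends up only about 2% broader than a point source, a difference that noise or a slightly imperfect PSF model could easily mask.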

It’s always exciting for a scientist to say “I don’t know the answer, but I know how we can find out.” And in this case, that means extending the simulations that we have done to the case of visual classification. There are a few ways to simulate AGN, but the key process is to create a situation where the analysis takes place on a known quantity so that you can compare what you know to what the analysis finds. In this case (which is similar in method to the first set of simulations in Simmons & Urry), that means:

  • Start with a set of galaxies for which we know the initial “answer,” i.e., the morphology;
  • Add a simulated signal from a central SMBH, using a wide range of luminosity ratios between galaxy and SMBH, and a range of AGN colors;
  • Repeat the classification process in exactly the same way as for the initial set of galaxies, to see if the answers change.
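
The second step, adding a central point source at a chosen AGN-to-host luminosity ratio, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the software actually used for the Galaxy Zoo simulations: the image is a plain list of pixel rows in a single band, the PSF is a hypothetical Gaussian (real simulations would use the telescope’s measured PSF), and the point source sits at the image center. Varying the AGN color would amount to running the same function on each band with a different ratio.

```python
import math

def gaussian_psf(ny, nx, cy, cx, sigma):
    """Hypothetical Gaussian PSF on an ny-by-nx grid, normalised to sum to 1."""
    psf = [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
            for x in range(nx)]
           for y in range(ny)]
    total = sum(map(sum, psf))
    return [[value / total for value in row] for row in psf]

def add_point_source(image, ratio, sigma, center=None):
    """Return a copy of `image` with a simulated AGN point source added.

    `ratio` is the AGN-to-host flux ratio: the point source carries
    ratio * (total host flux), spread over the pixels by the PSF.
    """
    ny, nx = len(image), len(image[0])
    cy, cx = center if center is not None else ((ny - 1) / 2, (nx - 1) / 2)
    host_flux = sum(map(sum, image))
    psf = gaussian_psf(ny, nx, cy, cx, sigma)
    agn_flux = ratio * host_flux
    return [[image[y][x] + agn_flux * psf[y][x] for x in range(nx)]
            for y in range(ny)]

# Toy 21x21 "galaxy": a smooth exponential disc (illustrative only).
galaxy = [[math.exp(-math.hypot(y - 10, x - 10) / 4.0) for x in range(21)]
          for y in range(21)]
for ratio in (0.1, 1.0, 10.0):
    simulated = add_point_source(galaxy, ratio, sigma=1.5)
    # Total flux grows by a factor of (1 + ratio); `galaxy` itself is not modified.
    print(ratio, sum(map(sum, simulated)) / sum(map(sum, galaxy)))
```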

Now, it may be that the answers change in some subtle way, as they do when the morphological analysis is done by a computer. If that’s the case, then the analysis quantifies that effect so that we can understand it and account for it. Or, it may be that you see right through it — and if so, that’s great! If you look at a galaxy with an AGN and say, “of course I can tell that that galaxy has an AGN in it, and I can still classify the galaxy in the same way,” fantastic. It potentially means you all are doing better science than a traditional parametric analysis in yet another way. Either way, the answer is very useful.
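
One simple way to measure whether the answers change is to compare each simulated object’s classification with the baseline classification of the AGN-free galaxy it was made from, and to tabulate the agreement rate at each luminosity ratio. The sketch below assumes a hypothetical data layout (a dictionary of baseline labels and a list of simulated records) and invented example labels; it is not the project’s actual analysis pipeline.

```python
from collections import defaultdict

def agreement_by_ratio(baseline, simulated):
    """Fraction of simulated AGN+host objects whose classification matches
    the baseline classification of their AGN-free parent galaxy, grouped by
    AGN-to-host flux ratio.

    baseline:  {galaxy_id: label} for the original galaxies
    simulated: iterable of (galaxy_id, ratio, label) for the simulated objects
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for galaxy_id, ratio, label in simulated:
        totals[ratio] += 1
        if label == baseline[galaxy_id]:
            matches[ratio] += 1
    return {r: matches[r] / totals[r] for r in sorted(totals)}

# Invented example data, for illustration only.
baseline = {"gal1": "spiral", "gal2": "elliptical"}
simulated = [
    ("gal1", 0.1, "spiral"), ("gal1", 10.0, "artifact"),
    ("gal2", 0.1, "elliptical"), ("gal2", 10.0, "elliptical"),
]
print(agreement_by_ratio(baseline, simulated))  # {0.1: 1.0, 10.0: 0.5}
```

An agreement rate that falls at high ratios would be the signature of the AGN changing the classifications; a flat rate would mean you see right through it.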

I know this post is already quite long, but I think it’s important to make one other point about the simulations. When you’re simulating something like this to understand the effects of a new feature on an analysis you think you understand very well, it’s very important to push the limits of that analysis. In this case, that means simulating some AGN that are so faint there’s pretty much no way you could see them in their host galaxies, and others that are so bright they will be blindingly obvious to anyone paying attention.

And you all are most definitely paying attention. In reading comments on other blog posts, I saw that some Zooites were displeased that the very brightest simulated AGN looked strange, and even artificial. I also know there were some communication issues regarding the release of the simulations; for that I apologize. Actually, though, knowing which objects you found odd is a part of the science, too. Simulations like this can be used not only to understand the science of determining galaxy morphology, but also to understand the science of separating the AGN itself for later analysis (as I said earlier, both are needed to understand the co-evolution of black holes and galaxies). So questions like “when does the AGN get lost in the galaxy?” and “when does the AGN totally overtake the galaxy?” are vital. It is also definitely the case that we were pushing the limits not just of the classification, but also of the software used to make the simulated AGN. That’s why the very brightest of them look a bit… weird. I do think the science is still possible even if it’s clear at first glance that an object is a simulation, and I really appreciate your patience with both me and the software on that issue.

By the way, I have a feeling you all are going to turn out to be considerably better at classifying galaxies with AGN in them than the computer is, but of course that’s just my hypothesis; it’s important to actually go through the process of classifying simulated galaxies. That way, when someone comes up to us and says, “But how do you know all these citizen scientists are really that accurate? AGN can have a subtle effect on the fitted morphologies of galaxies, after all,” we can say, “We do know they’re that accurate, and here’s how we know.”

Thanks all for reading — if you got this far, that probably deserves an award in itself — and please feel free to ask any questions you might have. If you have a concern that you feel I didn’t address, please let me know that as well. I would very much appreciate your input!

About The Zooniverse

Online citizen science projects. The Zooniverse is doing real science online.

7 responses to “The Science Behind Classifying Simulated AGN”

  1. join says :

    “Two examples of the central point source subtraction for SDSS gr images of a morphologically disturbed galaxy (top row) and an elliptical galaxy (bottom row).”

    Source: Schawinski.AGN.2009.pdf

  2. echo-lily-mai says :

    Thanks for writing a wonderful blog explaining everything perfectly!

  3. udin says :


  4. c_cld says :

    Hi Brooke,
    Your research goals, as a follow-up to your previous publications, are great.
    If I understand correctly, you are looking to differentiate by eye the morphology of approximately 391 AGN in the COSMOS field (Gabor paper) by running simulations on around a hundred of them.
    You seem confident of finding new parametric ways to dig out more AGN from deep imaging surveys using the statistics you’ll gather from Zooites’ clicks.
    Let me say that I am a little skeptical about the findings from your simulated panel: when I had to classify some of them, my response was mainly ‘artefact’, without trying to determine the surroundings (elliptical, spiral and afterwards possible merger). Your color/magnitude simulations tended to sharpen the impression of a fake (blue core over red or vice versa): I wonder if, before releasing the simulated images, you’ve produced the diagrams we see in the Pierce paper (Gini index, Sérsic profile and so on).
    Lastly, I didn’t see any replies to my Jan 17th response to Chris’s blog post ‘More on our fake experiment’, where I mentioned the arXiv:1007.1453v1 study with Chandra: I think there is more to gain from overlapping surveys at different wavelengths.


  5. waveney says :

    I have been doing a lot of work on Irregulars (galaxies that are neither spiral, elliptical, nor interacting/merging) (PhD interview tomorrow; Carie has some of my older working papers).

    So far, none has been found to have an AGN (in the first 5,500 studied). In the coming months I will widen the search to a larger data set, which is nearing completion. Have you ever seen an AGN in an irregular galaxy? I have been wondering whether they are irregular because they don’t have an SMBH, and whether the presence of an SMBH likewise makes spirals and ellipticals regular.

    Have you ever seen any counter examples?

  6. Brooke says :


    I think I have seen a few examples, and I will try to find them and send them to you. I think that’s a very interesting project. I hope the interview went well!


    This set of simulated AGN comes from a different Hubble field than the COSMOS observations, but the results of the simulations are generally applicable to all data taken from the same camera on Hubble.

    However, I do want to clarify that, within the simulations, we’re not using galaxies we already know have AGN in them. In fact, it’s just the opposite. The simulations begin with a galaxy from the GEMS field that we know is just a normal galaxy without a detected AGN. Then, we create 15 simulated AGN+host galaxies out of that one normal galaxy by adding simulated AGN with a range of luminosities and colors. I describe this further in the post “Simulated AGN: An Example” and show a sample galaxy and all the AGN+hosts that were made from it. Because the original galaxy doesn’t have an AGN in it, the way you’ve classified it before establishes a baseline morphology that we can compare to the classifications of the simulated objects with AGN in them.
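
    As a purely hypothetical illustration of how 15 variants can come from one galaxy, the sketch below crosses five AGN-to-host flux ratios with three AGN colors. The actual luminosities and colors used are not given in this thread, so the grid values and the ID format are placeholders. Keeping the parent galaxy’s ID on every variant is what allows each simulated object to be compared against the baseline classification of the original galaxy.

```python
from itertools import product

# Placeholder grid values; the real ratios and colours are not specified here.
FLUX_RATIOS = (0.05, 0.2, 1.0, 5.0, 20.0)  # hypothetical AGN-to-host flux ratios
AGN_COLORS = ("blue", "matched", "red")     # hypothetical AGN colour relative to host

def variants(galaxy_id):
    """List the simulated AGN+host objects derived from one AGN-free galaxy."""
    return [
        {"parent": galaxy_id, "ratio": ratio, "color": color,
         "sim_id": f"{galaxy_id}_r{i}_c{j}"}
        for (i, ratio), (j, color) in product(enumerate(FLUX_RATIOS),
                                              enumerate(AGN_COLORS))
    ]

sims = variants("gems_0001")  # hypothetical galaxy ID
print(len(sims))  # 15
```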

    I think the color question is interesting, and you’re right that one of the purposes of the simulations is to see whether the classifications change when the color of the central AGN differs from that of the host rather than matching it. It is definitely possible for a reddened AGN to be hosted in a blue galaxy, and vice versa, so while in some cases the contrast may be very apparent, the same contrast could occur in an observed (i.e., not simulated) object.

    To my knowledge, we have not run parametric fitting routines on these simulated AGN, so diagrams similar to those in the papers I mentioned aren’t possible yet. We may do that eventually, but it may end up being unnecessary: the effects of AGN on parametric fitting have already been studied in those papers, and none of them analyzes the effects of AGN on the visual classification Zooites specialize in.

    As to the other comment on Chris’ post, that is an interesting paper and it makes an important point about understanding the selection effects in a sample. You’re right that AGN are a very small fraction of all galaxies, but they’re an important fraction because they provide leverage for us to study the central black holes of very distant objects. And I quite agree that multi-wavelength studies are beneficial. In the case of AGN, they’re crucial, because searching for AGN at multiple wavelengths is the best way to find as close to a complete sample as possible.

    Thank you for your comments!
