Archive | December 2013

Announcing Galaxy Zoo’s machine-learning competition (with prize money!)

Since Galaxy Zoo began in 2007, our scientific results have relied on the classifications of our volunteers. These have always been checked (in small numbers) against expert classifications, and several papers have explored how the Galaxy Zoo data compares to results from computers. Galaxy Zoo has compared well with both expert and automated classifications, and that’s helped underscore the science that your observations have made possible.

While doing real science with the Zooniverse has always been our primary goal, we’re also looking to the future; upcoming telescopes like the SKA, LSST, and just-launched Gaia will produce billions of new images and detected objects. That will simply be too much data for citizen scientists to handle in full – even if literally everyone on the planet is involved.

This is where Galaxy Zoo will come in yet again. Our goal, which is shared by many groups of astronomers, is to improve the accuracy of the galaxy classifications that can be performed by computers. We’ve done some of this already (Banerji et al. 2010, Huertas-Company et al. 2011), but it’s still not good enough for much of the science we want to do. If we can make these algorithms better, future datasets for citizen science can be selected in advance; we can automatically process the bulk of the images, but still have citizen scientists play a key role in classifying the more unusual objects. Citizen scientist results will also provide important calibration for the algorithms, and volunteers will continue to look for weird and wonderful discoveries like the Voorwerp.

With that goal in mind, we’re pleased to announce the launch of a data science competition for Galaxy Zoo. We’ve partnered with Kaggle, an online platform for predictive modeling that has a massive amount of experience in similar projects. Also working with us is Winton Capital: they’ve generously agreed to provide prize money for the winners of this competition. The first prize is $10,000 USD — we hope this will help incentivize some really great solutions!

Here’s how the competition works. On the Kaggle website, competitors will be given a large set of JPG galaxy images (taken from Galaxy Zoo 2), as well as a big text file with a few dozen variables for each image. These data are a modified version of the classifications that citizen scientists generated in GZ2 (and published in Willett et al. 2013). The goal for competitors is to come up with an algorithm that will predict what those classifications should be based only on the picture. These algorithms are submitted to Kaggle and tested against a second, private set of GZ2 images and classifications. The highest scores on the new set will win the prize money.

Galaxy Zoo's machine learning challenge. Hosted by Kaggle and sponsored by Winton Capital.

We’re really excited about this competition. For Winton, it will help identify promising candidates skilled at predictive analysis whom they might be interested in hiring. For Galaxy Zoo, we’ll use the results for two major things: efficient selection of sources for upcoming citizen science projects, and analysis of how the algorithms relate to the physical properties of galaxies.

The competition is open to anyone in the world, and will run for three months, ending on March 21, 2014. Participants will need significant programming experience, and a math/astronomy background would probably help since the project relies on image analysis and machine learning. If you’re interested, check out the project at https://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge.

See if you can beat deep space net!

Galaxy Zoo Challenge Leaderboard, as of 22 Dec 2013.

Radio Galaxy Zoo: How were the images made?

Today’s post is from Enno Middelberg, RGZ science team member and astronomer at Ruhr-University Bochum, Germany and expert in interferometry. Enno has kindly agreed to share some details of this complex and highly useful technique for improving the resolution of images.

Radio waves from cosmic objects have been observed since the 1930s, starting with Karl Jansky and Grote Reber. In the beginning, astronomers used single telescopes, some of which looked more or less like TV antennas (and some looked just weird, for example Karl Jansky’s self-made telescope). Whatever the telescope looked like, astronomers understood very well that the resolution of their instruments would never be quite as good as at optical wavelengths. The fundamental reason for this has to do with diffraction theory and Fourier transforms, but the outcome is rather simple: the smallest separation on the sky a telescope can “resolve”, meaning that it can actually tell that there are two things and not one slightly extended thing, is given by the fraction λ/D (an angle in radians). Here, λ represents the length of the waves observed (some centimetres in radio astronomy), and D represents the diameter of the telescope (some tens of metres). One can easily calculate that this fraction is of order 0.001-0.004 for a radio telescope, but for an optical telescope the number is much smaller, of order 0.00000005 or so. This means that optical telescopes could separate things on the sky which were much closer together than the first radio telescopes could.
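The λ/D estimate is easy to check with a few lines of Python. This is only a minimal sketch: the wavelengths and diameters used (6 cm on a 25 m dish, 500 nm on a 10 m mirror) are illustrative assumptions, not the specifications of any particular telescope.

```python
import math

ARCSEC_PER_RADIAN = 180.0 / math.pi * 3600.0


def resolution_rad(wavelength_m, diameter_m):
    """Approximate smallest resolvable angle in radians: theta ~ lambda / D."""
    return wavelength_m / diameter_m


# Assumed example values, chosen only to illustrate the orders of magnitude.
radio = resolution_rad(0.06, 25.0)       # 6 cm waves, 25 m dish
optical = resolution_rad(500e-9, 10.0)   # 500 nm light, 10 m mirror

print(f"radio:   {radio:.4f} rad = {radio * ARCSEC_PER_RADIAN:.0f} arcsec")
print(f"optical: {optical:.1e} rad = {optical * ARCSEC_PER_RADIAN:.3f} arcsec")
print(f"the optical telescope resolves angles ~{radio / optical:.0f} times smaller")
```

With these assumed numbers the single radio dish comes out tens of thousands of times blurrier than the optical telescope, which is the gap that interferometry sets out to close.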

Astronomers had tried to improve on this early on, using something called interferometry. The wavelengths could not be changed (otherwise they wouldn’t be radio telescopes any more, right?), and telescopes could only be made as big as 100m (otherwise they would be too heavy and too expensive). So astronomers took two of the telescopes they had and combined their signals into one. Such a contraption with two telescopes is called an interferometer, and its resolving power is no longer given by the diameter of the dishes, but by their separation. So simply moving the two telescopes further away from one another would increase the resolution – what a fantastic idea! In the 1960s, this technique was much advanced by British astronomer Sir Martin Ryle in Cambridge, and he was awarded the Nobel Prize in Physics for his work in 1974.

In the following decades, Martin Ryle’s innovation was improved upon by astronomers all over the world, creating radio interferometers of various sizes and forms. Radio telescopes sprouted like mushrooms. Ever more powerful telescopes were built: the Very Large Array, the Australia Telescope Compact Array, the Effelsberg and Green Bank giant single dishes, and many more. Most recently, technical advances have made it possible to build completely digital radio telescopes, such as LOFAR. Even though these instruments consist of many more than two radio telescopes, the measurements are always made between any two of them: the Very Large Array, for example, has 27 telescopes, which yields 351 two-telescope interferometers. Using many more such interferometers improves the image quality and, of course, the sensitivity of the final images.
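The figure of 351 is just the number of distinct pairs that can be drawn from 27 telescopes, N(N-1)/2. A short sketch that counts them both by formula and by explicit enumeration (the antenna numbering is made up for illustration):

```python
from itertools import combinations


def n_baselines(n_antennas):
    """Number of distinct two-telescope pairs in an array of n_antennas."""
    return n_antennas * (n_antennas - 1) // 2


# Enumerate every pair explicitly; antennas are simply labelled 1..27 here.
vla_pairs = list(combinations(range(1, 28), 2))

print(len(vla_pairs), n_baselines(27))   # both give the VLA pair count
print(n_baselines(6))                    # a 6-antenna array such as the ATCA
```

Because the count grows roughly as N squared, adding a few antennas to an array adds many new two-telescope interferometers at once.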

Radio image with faint emission visible, at the cost of losing details in the bright spots.

Infrared image with radio contours overlaid

Radio images are most commonly reproduced as contour images. This makes them easier to analyse and interpret when printed, and contours are better when very bright and very faint portions of an image have to be shown at the same time. If such information was represented in a grey-scale image, the differences in brightness would not be decipherable. Radio astronomers love contour plots. My wife calls them “fried eggs” and always asks me if the kids can colour them in…
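To see why contours cope well with a large range in brightness, here is a rough sketch of one common way to choose contour levels: start a few times above the image noise and multiply by a constant factor. The starting level (3 times the noise), the factor of 2 and the example noise and peak values are all assumptions for illustration; they are not the levels used in the ATLAS images.

```python
def contour_levels(rms, peak, first_sigma=3.0, factor=2.0):
    """Geometric contour levels from first_sigma * rms up to the image peak."""
    levels = []
    level = first_sigma * rms
    while level <= peak:
        levels.append(level)
        level *= factor
    return levels


# Hypothetical image: noise of 1 unit, brightest pixel 1000 units.
levels = contour_levels(rms=1.0, peak=1000.0)
print(levels)
print(f"{len(levels)} contours span a brightness range of {levels[-1] / levels[0]:.0f}:1")
```

A handful of lines covers a range of several hundred to one, whereas a linear grey scale would leave the faint emission indistinguishable from the background.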

The radio images you’re seeing here are the results of the Australia Telescope Compact Array Large Area Survey (astronomers love acronyms!), or ATLAS for short. Between 2006 and 2009 we collected data on two small regions in the southern sky to create the basis for an investigation of the way that galaxies evolve. We have used these data to create the radio images you’re seeing when you classify sources. The infrared images were made with the Spitzer telescope, to compare the radio to infrared emission. Radio and infrared waves are not necessarily emitted by the same material and can therefore be displaced from one another in a galaxy. That’s why we need your help to determine which radio blobs belong to which infrared blob!

Radio Galaxy Zoo: a close-up look at one example galaxy

We hope everyone’s been excited about the first few days of Radio Galaxy Zoo; the science and development teams certainly have been. As part of involving you, the volunteers, with the project, I wanted to take the opportunity to examine and discuss just one of the RGZ images in detail. It’s a good way to highlight what we already know about these objects, and the science that your classifications help make possible.

For an example, I’ve chosen the trusty tutorial image, which almost everyone will have seen on their first time using RGZ. We’ll be focusing on the largest components in the center (and skipping over the little one in the bottom left for now).

The tutorial image for Radio Galaxy Zoo

The data in this image comes from two separate telescopes. Let’s look at them individually.

The red and white emission in the background is the infrared image; this comes from Spitzer, an orbiting space telescope from NASA launched in 2003 (and still operating today). The data here used its IRAC camera at its shortest wavelength, which is 3.6 micrometers. As you can see, the image is filled with sources; the round, smallest objects are either stars or galaxies not big enough to be resolved by the telescope. Larger sources, where you can see an extended shape, are usually either big galaxies or star/galaxy overlaps that lie very close together in the sky. 

Overlaid on top of that is the data from the radio telescope; this shows up in the faint blue and white colors, as well as the contour lines that encircle the brightest radio components. The telescope used is the Australia Telescope Compact Array (ATCA) in rural New South Wales, Australia. This data was taken as part of the ATLAS survey, which mapped two deep fields of the sky (named ELAIS S1 and CDF-S) in the radio at a wavelength of 20 cm.

So, what do we know about the central sources? From their shape, this looks like what we would call a classic “double lobe” source. There are two radio blobs of similar size, shape, and brightness; almost exactly halfway between them is a bright infrared source. Given its position, it’s a very good candidate as a host galaxy, poised to emit the opposite-facing jets seen in the radio.

This object doesn’t have much of a mention in the published astronomical literature so far. Its formal name in the NASA database is SWIRE4 J003720.35-440735.5 — the name tells us that it was detected as part of the SWIRE survey using Spitzer, and the long string of numbers can be broken up to give us its position on the sky. This is a Southern Hemisphere object, lying in the constellation Phoenix (if anyone’s curious).
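As a sketch of how the string of numbers breaks up, the snippet below assumes the name follows the usual convention for such designations: hours, minutes and seconds of right ascension, then a sign and degrees, arcminutes and arcseconds of declination. The function name `parse_j_name` is made up for this example.

```python
import re


def parse_j_name(name):
    """Convert a 'JHHMMSS.ss+DDMMSS.s' style designation to (RA, Dec) in degrees."""
    match = re.search(
        r"J(\d{2})(\d{2})(\d{2}(?:\.\d+)?)([+-])(\d{2})(\d{2})(\d{2}(?:\.\d+)?)",
        name,
    )
    if match is None:
        raise ValueError(f"no coordinate string found in {name!r}")
    hours, minutes, seconds, sign, deg, arcmin, arcsec = match.groups()
    # One hour of right ascension corresponds to 15 degrees on the sky.
    ra_deg = 15.0 * (int(hours) + int(minutes) / 60.0 + float(seconds) / 3600.0)
    dec_abs = int(deg) + int(arcmin) / 60.0 + float(arcsec) / 3600.0
    dec_deg = -dec_abs if sign == "-" else dec_abs
    return ra_deg, dec_deg


ra_deg, dec_deg = parse_j_name("SWIRE4 J003720.35-440735.5")
print(f"RA = {ra_deg:.4f} deg, Dec = {dec_deg:.4f} deg")
```

The negative declination of about -44 degrees is what places the object firmly in the southern sky.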

The only analysis of this galaxy so far appeared in a paper published by RGZ science team member Enno Middelberg and his collaborators in 2008. They made the first detections of the radio emission from the object, and matched the radio emission to the central infrared source by using an automatic algorithm plus individual verification by the authors. They classified it as a likely AGN based on the shape of the radio lobes, inferring that this meant a jet. It’s also one of the brighter galaxies that they detected in the survey, as you can see below – brighter galaxies are to the right of the arrow. That might mean that it’s a particularly powerful galaxy, but we don’t know that for sure (for reasons I’ll get back to in a bit).

The brightnesses (measured in radio) of galaxies in ATLAS-SWIRE. From Middelberg et al. (2008).

So what we know is somewhat limited – this object has only ever been detected in the radio and near-infrared, and each of those has only two data points. The galaxy is detected at both 3 and 4 micrometers in the infrared, but the camera didn’t detect it using any of its longer-wavelength channels. This makes it difficult to characterize the emission from the host galaxy; we need more measurements at additional wavelengths to determine whether the light we see (outside the radio) is from stars, from dust, or from what we call “non-thermal processes”, driven by black holes and supernovae.

One of the biggest barriers to knowledge, though, is that the galaxy doesn’t currently have a measured distance. Distances are so, so important in astronomy – we spend a massive amount of time trying to accurately figure out how far away things are from the Earth. Knowing the distance tells us what the true brightness of the galaxy is (whether it’s a faint object nearby or a very bright one far away), what the true physical size of the radio jets is, and at what age of the Universe it likely formed; a huge amount of science depends critically on this.

Usually distances to galaxies are obtained by taking a spectrum of each one with a telescope and then measuring the Doppler shift (redshift) of the lines we detect, caused by the expanding Universe. The obstacle is that spectra are more difficult and more expensive to obtain than images; we can’t do all-sky surveys in the same way we can with just images. This is one reason why these cross-identifications are important; if you can help firmly identify the host galaxy, we can effectively plan future observations on the sources that need them.
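A rough sketch of that calculation: the redshift is the fractional shift of a spectral line, and for nearby galaxies the distance follows approximately from Hubble’s law. The observed wavelength and the Hubble constant of 70 km/s/Mpc are assumed values for illustration, and the simple formula only holds at low redshift.

```python
C_KM_S = 299792.458  # speed of light in km/s


def redshift(observed_nm, rest_nm):
    """Fractional shift of a spectral line: z = (observed - rest) / rest."""
    return (observed_nm - rest_nm) / rest_nm


def hubble_distance_mpc(z, h0_km_s_mpc=70.0):
    """Low-redshift approximation d ~ c * z / H0, in megaparsecs."""
    return C_KM_S * z / h0_km_s_mpc


# Hypothetical measurement: the H-alpha line (rest wavelength 656.28 nm)
# observed at 689.10 nm in a galaxy's spectrum.
z = redshift(689.10, 656.28)
distance_mpc = hubble_distance_mpc(z)
print(f"z = {z:.3f}, distance ~ {distance_mpc:.0f} Mpc")
```

For distant galaxies a full cosmological calculation is needed instead of this linear approximation.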

Welcome to Radio Galaxy Zoo!

Today’s post is from Ivy Wong, who is delighted to announce our newest Galaxy Zoo project.

Welcome to the extraordinary world of radio astronomy. Observe the Universe through radio goggles and discover the jets that are spewing from the cores of galaxies!

Supermassive black holes lie deep in the cores of many galaxies. And though we cannot directly see these black holes, we do occasionally see the huge jets originating from the cores of some galaxies. However, most of these jets can only be seen in the radio.

Centaurus A in the radio sky

The figure on the left compares the extent of the radio jets from Centaurus A (the nearest radio galaxy to us) to the full moon using the same scale on the sky. Also, the small white dots in this image are not stars but individual background radio sources. The antennas in the foreground are 4 of the 6 antennas that make up the Australia Telescope Compact Array where the radio image was taken.

How do galaxies form these supermassive black holes? And how does having a supermassive black hole affect the evolution of its host galaxy as well as its neighbouring galaxies? Why don’t we see jets in every galaxy with a supermassive black hole? Though much progress has been made in recent years, there are still many open questions such as the above that we can shed light on by amassing a large sample.

To probe the co-evolution of galaxies and their central supermassive black holes, help us map the radio sky by matching the radio jets and filaments to the galaxies (via the infrared images) from whence they came.

Example image with radio jets and infrared galaxies

Can you see the infrared galaxy between the radio jets?

This is a matching & recognition problem that humans are still best at, especially in cases where there are radio jets or multiple sources. And it’s an important task, one that will only become more important as the next generation of radio surveys and instruments come online and start producing enormous amounts of data. So if you’re willing to help, please try out the new Radio Galaxy Zoo and help find some growing black holes — and thank you!

We’re Observing at the Very Large Array!

I’m really excited to be able to post that galaxies selected with the help of Galaxy Zoo classifications are being observed at the VLA (Very Large Array) in New Mexico, possibly right now.

It’s the Very Large Array!

The funny thing about observing at the VLA is that you do all of the work for the actual observations in advance.

The VLA runs in queue mode – as an observer you have to submit very (very) detailed information about what you want the telescope to do during your session (called a “scheduling block”) and a set of constraints about when it’s OK to run that (for example you tell them when the galaxy is actually up in the sky above the telescope!). Then the telescope operators pick from the available pool of scheduling blocks at any time to make best use of the array.

This means that after you submit the scheduling blocks you just have to sit and wait until you start getting notifications from the VLA that your galaxies have been observed. The observing semester for the B-array configuration started on 4th October (with a pause for the US shutdown) and runs until 13th January 2014. I’m happy to report that we started getting notifications in late November that the first of our 2 hour scheduling blocks had been observed. At the time of writing four of our galaxies have each been observed at least once (we need six repeat visits to each one to get the depth of data we’d like) for a total of 16 hours of VLA time. I’ve been getting notifications every couple of days, which means that as I write this the VLA could be observing one of our galaxies!

Since making these very detailed observation files is the observing procedure at the VLA, it takes the length of time you’d expect given that…

So, in September, around a crazy travel schedule and with a lot of help from our collaborator Kelley Hess at Cape Town, I spent a lot of time scheduling VLA observations of some very interesting, very gas-rich and very strongly barred galaxies we identified in the Galaxy Zoo 2 sample (the part which overlaps with the ALFALFA survey, which measures the total HI gas in each galaxy).

We have been granted time to observe up to 7 of these fascinating objects (depending on scheduling constraints at the VLA) which I think may reveal some really interesting physics about how bars drive gas around in the discs of galaxies.

You might notice from the picture (and the name) that the VLA is not a “normal telescope”. It’s what astronomers call a radio interferometer. Signals are collected from 27 separate antennas and combined in a computer. This means that as well as observing sources for flux calibration (so we can link how bright our target appears through the telescope to physical units), we also have to observe a “phase calibrator” roughly every 20 minutes, so that we know how to correctly add the signals from each of the antennas together (to add them “in phase”).

So a single scheduling block lasting 2 hours for one of our sources comprises:

1. Information to tell the VLA where to slew initially and what instrumentation to use (how to “tune” it to the frequency we know the HI in the galaxy will emit at).

2. A short observation of a known bright source for flux calibration.

Then there’s a loop of

a. Phase calibration

b. Source observation

c. Phase calibration

d. Source observation

and so on – ending with a phase calibration (on Kelley’s advice we’ll do 5 source observations and 6 phase calibrations). We have a total of 6 of these blocks for each galaxy, which makes 12 hours of telescope time, resulting in about 10 hours of collecting 21cm photons per galaxy.

We have to check which times all these sources are visible to the VLA, and set durations for each part which give enough slew time and on source time wherever the sources are on the sky. And this all has to add up exactly to 2 hours to fit the scheduling block.
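The bookkeeping for one block can be sketched as below. The structure (setup, flux calibrator, then 5 source scans bracketed by 6 phase calibrations) follows the description above, but every duration is a made-up value chosen so the block sums to 2 hours; the real durations depend on slew times and where each source is on the sky.

```python
def build_block(n_source_scans, setup_min, flux_cal_min, phase_cal_min, source_min):
    """Return the ordered (step, minutes) list for one scheduling block."""
    steps = [("setup and slew", setup_min), ("flux calibrator", flux_cal_min)]
    for _ in range(n_source_scans):
        steps.append(("phase calibrator", phase_cal_min))
        steps.append(("source", source_min))
    steps.append(("phase calibrator", phase_cal_min))  # always end on a phase cal
    return steps


# Hypothetical durations in minutes, not the values actually submitted.
block = build_block(n_source_scans=5, setup_min=5.0, flux_cal_min=6.0,
                    phase_cal_min=1.5, source_min=20.0)

total_min = sum(minutes for _, minutes in block)
on_source_min = sum(minutes for step, minutes in block if step == "source")
n_phase_cals = sum(1 for step, _ in block if step == "phase calibrator")
hours_on_source_per_galaxy = 6 * on_source_min / 60.0  # 6 blocks per galaxy

print(f"block length: {total_min} min, phase calibrations: {n_phase_cals}")
print(f"on source: {on_source_min} min per block, "
      f"{hours_on_source_per_galaxy} h per galaxy")
```

With these assumed numbers each block spends 100 of its 120 minutes on the galaxy, which matches the roughly 10 hours of 21cm photons per galaxy from 12 hours of telescope time.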

The benefit of this though is a telescope which acts like it’s much larger than you could ever physically build. We’re trying to detect emission from atomic hydrogen in these galaxies which emits at 21cm. So we need a really large telescope to get a sharp picture.

And just to end, because they’re lovely, here are the four galaxies the VLA has observed so far in the Sloan Digital Sky Survey visible light images.

B10

B7

B1

B2

Thanks again for your help finding these rare and interesting galaxies. They’re rare, because they’re so gas rich and strongly barred – we have previously posted about how we showed strong bars are rare in galaxies with lots of atomic hydrogen. Hopefully we’ll have some exciting results to share once we’ve analysed these data.

(PS. That takes a lot of time too – it’ll be almost 1TB of data to process in total!).