Zooniverse and the Rise of the Machines: 7th September 2018
Dr Grant Miller, University of Oxford
Zooniverse began 11 years ago with Galaxy Zoo. A PhD project required 1 million galaxies in images from the Sloan Digital Sky Survey to be classified according to shape. Chris Lintott suggested putting the images online and getting the public to do the classifying, because pattern recognition is easy for humans but hard for computers. The trial run was an astonishing success. Other researchers had similar large datasets that could be treated in the same way, such as searches for exoplanets in graphs of star variation, or camera images from penguin colonies: 1 image per minute from 100 cameras for a year. The combination of cheap imaging technology and cheap data storage means that nearly all research projects now accumulate large datasets of images, and have the problem that someone has to look at them all and extract the data. Zooniverse currently has 89 live projects in fields including astronomy, physics, biology and history, and more than 150 projects have been completed. There are more than 2 million volunteers in 237 countries, and what drives most of them is the desire to contribute to science. The process has generated fora, networks and discussion groups, and has encouraged many people to become interested in science.
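To give a sense of scale, the penguin-camera figures quoted above can be worked through as a rough calculation. This is only a back-of-envelope sketch: the camera count and image rate come from the talk, while the inspection speed and the length of a working year are illustrative assumptions, not figures from the project.

```python
# Back-of-envelope estimate of the penguin-camera dataset size.
CAMERAS = 100                      # from the talk
IMAGES_PER_MINUTE = 1              # from the talk, per camera
MINUTES_PER_YEAR = 60 * 24 * 365   # non-leap year

total_images = CAMERAS * IMAGES_PER_MINUTE * MINUTES_PER_YEAR

# Assumptions (hypothetical): one person inspects an image every 5 seconds,
# working 8 hours a day for 250 days a year.
SECONDS_PER_IMAGE = 5
WORKING_SECONDS_PER_YEAR = 60 * 60 * 8 * 250

person_years = total_images * SECONDS_PER_IMAGE / WORKING_SECONDS_PER_YEAR

print(f"{total_images:,} images per year")
print(f"about {person_years:.1f} person-years to inspect them once")
```

Under these assumptions a single researcher would need decades to look at one year of images even once, which is why sharing the work among many volunteers is attractive.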
Future projects will include upcoming large astronomical surveys, such as supernova hunts using images from the Large Synoptic Survey Telescope in Chile (when complete). This telescope will survey the entire sky in around 2 days, looking for transient events and changes, and will image about 100 million objects per night. This presents difficulties similar to those of the original Galaxy Zoo. Another project will be the Planetary Response Network, which will help the emergency services respond to natural disasters in areas with poor mapping. Before-and-after images of the affected areas will be used to help coordinate relief efforts, but this requires a rapid response, and there are not enough volunteers at present. How can these difficulties be overcome?
At present, Zooniverse simply asks multiple people the same question and takes the average answer, but this is not the smartest method. A faster way could be to accept the answer as soon as the first few responses agree, and to keep asking only if they don’t. Respondents can also be weighted by how well they did at previous classifications, since different people are better at different types of identification. Zooniverse is also looking at including machine learning in the process. Training computers in pattern recognition requires large datasets with known answers, and Zooniverse provides them. An example is the “Plastic Tide” project, in which computers were trained to recognise pieces of plastic in drone images of beaches. Another example of machines competing with humans is chess programs such as Deep Blue, which beat Garry Kasparov in 1997. Kasparov also took on “The World” in a 1999 match in which a panel of experts suggested moves and online participants voted on which to play. Kasparov won, but it took him 62 moves. The Supernova Hunters project is now trying to implement machine learning: trials have shown that humans can find one subset of supernovae and computers a different subset, but when the two are combined they identify all the potential candidates.
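The two ideas described above, stopping early when the first few answers agree and weighting volunteers by past performance, can be sketched as follows. This is a minimal illustration, not Zooniverse's actual code: the function name `aggregate`, the thresholds and the weight values are all hypothetical, and "agreement" is assumed here to mean that one answer holds a large share of the weighted votes.

```python
from collections import defaultdict


def aggregate(responses, weights=None, min_votes=3, threshold=0.8, max_votes=10):
    """Combine volunteer answers for one image.

    responses: iterable of (volunteer_id, label) pairs in arrival order.
    weights:   optional dict volunteer_id -> weight (default 1.0), e.g.
               derived from accuracy on images with known answers.
    Returns (winning_label, number_of_responses_used).
    """
    weights = weights or {}
    totals = defaultdict(float)  # label -> accumulated weight
    used = 0
    for volunteer, label in responses:
        totals[label] += weights.get(volunteer, 1.0)
        used += 1
        leader = max(totals, key=totals.get)
        share = totals[leader] / sum(totals.values())
        # Stop asking as soon as enough people have answered and they agree.
        if used >= min_votes and share >= threshold:
            return leader, used
        # Give up asking after max_votes and take the leading answer.
        if used >= max_votes:
            break
    if not totals:
        return None, 0
    return max(totals, key=totals.get), used


# Three volunteers agree, so the fourth response is never needed.
easy = [("a", "spiral"), ("b", "spiral"), ("c", "spiral"), ("d", "elliptical")]
print(aggregate(easy))

# Disagreement early on means more responses are collected.
hard = [("a", "spiral"), ("b", "elliptical"), ("c", "spiral"),
        ("d", "spiral"), ("e", "spiral")]
print(aggregate(hard))

# A volunteer with a strong track record can outweigh two others.
print(aggregate(easy[:1] + [("b", "elliptical"), ("c", "spiral")],
                weights={"b": 3.0}))
```

Easy images are retired after only three responses, which frees volunteer effort for the ambiguous ones.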
Other developments include a new mobile app, for which the nature of the task has been changed. An image is shown with a yes/no question, rather than a choice of options. This approach can get through a lot of data very quickly, especially if the computer does the analysis and then asks the user “do you agree?” However, one important difficulty with computer analysis is the so-called “zorilla problem”: machines are bad at spotting unusual things. The zorilla is an African animal that is rarely seen: in millions of images of animals in the Serengeti, only three zorillas have been detected. Computers cannot be trained to spot such things, because examples are rare and might never have been seen before. The object called “Hanny’s Voorwerp”, found in the original Galaxy Zoo, is an example. Humans will always be needed to notice and detect such objects.
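The “do you agree?” workflow might be sketched as below. This is an illustrative assumption about how such a scheme could be organised, not a description of the app itself: the function `triage`, the image identifiers and the labels are all hypothetical. The key point is that anything the volunteer rejects, or has not yet answered, is routed back to people for a full classification, which is where a zorilla would be caught.

```python
def triage(machine_labels, volunteer_agrees):
    """Split images into machine labels confirmed by a volunteer and
    images that still need a full human classification.

    machine_labels:   dict image_id -> label proposed by the computer.
    volunteer_agrees: dict image_id -> True/False answer to "do you agree?".
    """
    accepted = {}
    needs_review = []
    for image_id, label in machine_labels.items():
        if volunteer_agrees.get(image_id) is True:
            accepted[image_id] = label
        else:
            # Rejected or not yet answered: send to human classifiers.
            needs_review.append(image_id)
    return accepted, needs_review


proposals = {"img1": "wildebeest", "img2": "zebra", "img3": "zebra"}
answers = {"img1": True, "img2": False}  # img3 has not been shown yet

accepted, needs_review = triage(proposals, answers)
print(accepted)
print(needs_review)
```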
Notes and summary by Chris Hooker.