- In one day, a group of 200 hackers, coders, and students saved 8,404 NASA and DOE webpages onto the Internet Archive and downloaded 25 gigabytes from 101 public datasets.
- Access to this valuable information is essential if we want to continue to make scientific progress, especially in the field of climate research.
Baggers and Taggers
With climate-related documents and other environmental information quickly disappearing from government websites, hackers, students, and scientists have taken it upon themselves to salvage what remains. Groups in more than 20 cities have begun collecting this data and saving it outside of government servers. This weekend, a group of 200 hackers, coders, and students from the University of California, Berkeley decided to go even further.
Organized by groups like DataRefuge and the Environmental Data and Governance Initiative, Saturday’s UC Berkeley hackathon didn’t just collect data from NASA’s Earth sciences programs and the Department of Energy. Participants also started building robust systems to monitor future changes to these sites and to keep track of what has already been removed.
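To make the monitoring idea concrete, here is a minimal sketch in Python of how two crawls of the same site might be compared. It is an illustration under stated assumptions, not the hackathon's actual tooling: the function names `fingerprint` and `diff_snapshots`, and the idea of storing one hash per URL, are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(page_text: str) -> str:
    """Hash a page's text after collapsing whitespace (hypothetical helper)."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two crawls, each mapping URL -> fingerprint.

    Returns the URLs that were removed, added, or changed between crawls.
    """
    return {
        "removed": sorted(url for url in previous if url not in current),
        "added": sorted(url for url in current if url not in previous),
        "changed": sorted(
            url
            for url in previous
            if url in current and previous[url] != current[url]
        ),
    }
```

Collapsing whitespace before hashing is one possible design choice: it keeps trivial formatting changes from being reported as edits, at the cost of missing changes that are purely whitespace.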
The volunteers’ task was two-fold. One half of the group, the “taggers,” pointed web crawlers at easily copied government webpages and sent their text to the Internet Archive as digital copies. The other half, the “baggers,” worked on the more data-intensive websites, using custom scripts to scrape complicated datasets from these federal sites. As with most worthwhile tasks, collecting data from these pages wasn’t easy. “All these systems were written piecemeal over the course of 30 years. There’s no coherent philosophy to providing data on these websites,” Daniel Roesler, CTO at UtilityAPI and a volunteer guide for the UC Berkeley baggers, explained to Wired.
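One part of a bagger's job, recording exactly what was downloaded so that copies can be verified later, can be sketched in a few lines of Python. This is an illustrative example only: it assumes the scraped files already sit in a local folder and that a BagIt-style checksum manifest (one `checksum  data/path` line per file) is an acceptable record. The article does not say which format the volunteers used, and the names `sha256_of` and `build_manifest` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> str:
    """Build BagIt-style manifest text for every file under data_dir.

    Assumption: the files will be stored under a `data/` payload folder,
    so each path in the manifest is written relative to that folder.
    """
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(data_dir).as_posix()
        lines.append(f"{sha256_of(path)}  data/{relative}")
    return "\n".join(lines) + "\n"
```

Anyone who later receives the files can recompute the checksums and compare them against the manifest to confirm that nothing was altered or lost in transit.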
By the end of Saturday, the effort had loaded 8,404 NASA and DOE webpages onto the Internet Archive and downloaded 25 gigabytes from 101 public datasets. The organizers know that more work remains, so they plan to build tools to continually track and monitor similar websites. “Climate change data is just the tip of the iceberg,” said Eric Kansa from the non-profit group Open Context. “There are a huge number of other datasets being threatened with cultural, historical, sociological information.”
Right now, scientists in numerous fields are working hard to achieve breakthroughs that will completely transform our world. They’re trying to figure out how to put people on Mars, build supercomputers, produce clean energy, and so much more. Key to making progress on any of these fronts is access to information and the ability to communicate with fellow researchers. By archiving this important data, individuals like those at the UC Berkeley hackathon are helping the scientific community stay on course at a time when the obstacles it faces may seem insurmountable.