The EarthScope-operated data systems of the NSF GAGE and SAGE Facilities are migrating to cloud services. To learn more about this effort and find resources, visit earthscope.org/data/cloud.
One of the enticing prospects driving our migration to cloud-native data archives is the removal of barriers. Certain tasks simply are not feasible for extremely large datasets in a traditional data center, such as applying a machine learning algorithm to the entire archive: a download of that size takes so long that shipping physical storage media can be faster. But with cloud storage, data access on this scale is possible, and the horsepower to process all that data is also available.
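To put the download bottleneck in perspective, here is a quick back-of-envelope calculation. The 1 Gbps link speed is an illustrative assumption, not a measured figure for any particular data center:

```python
# Rough estimate: time to transfer 1 petabyte over a sustained 1 Gbps link.
# Both numbers are illustrative assumptions, not measured values.
PETABYTE_BITS = 1e15 * 8   # 1 PB expressed in bits
LINK_BPS = 1e9             # assumed sustained throughput: 1 Gbps

seconds = PETABYTE_BITS / LINK_BPS
days = seconds / 86_400    # 86,400 seconds per day

print(f"~{days:.0f} days of continuous transfer")  # ~93 days
```

Roughly three months of uninterrupted transfer at that rate, which is why moving the computation to the data, rather than the data to the computation, changes what is feasible.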
A pilot effort with one large research group has now produced the first fruits of this labor. That team used machine learning to generate a new earthquake catalog based on over a petabyte of seismic waveform data. The project provided a critical opportunity for us to learn more about how to facilitate this sort of processing and build out some of the required data infrastructure. Not only does that bring us much closer to regularly supporting similarly ambitious research, but the research group is also eager to share their catalog and what they learned about working in the cloud to build it.
Check out our conversation below with Yiyu Ni, a PhD student at the University of Washington, about his work on this project. You can also learn more from a preprint posted by members of the group.
If you’re inspired to take on a similar effort, start a conversation by contacting us at data-help@earthscope.org.