
This year’s NSF GAGE/SAGE Community Science Workshop included our first short course on GeoLab—our JupyterHub environment for cloud-native analysis and training—as well as a discussion session on the status of the cloud migration of our data archives and multiple presentations on community projects. If you weren’t able to join us in Bloomington a few weeks ago, here’s a quick overview of the cloud-related highlights.
A four-hour Cloud 101 short course introduced participants to foundational concepts in cloud computing, like the advantages of analysis-ready, cloud-optimized (ARCO) datasets, and demonstrated example workflows running in GeoLab that access data via the EarthScope API. Attendees learned how they can interact with geophysical data in a scalable computing environment that supports both education and research.
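To give a flavor of what such a workflow can look like, here is a minimal sketch in the spirit of the course material. It uses ObsPy's generic FDSN client rather than the specific EarthScope API calls demonstrated in GeoLab, and the network, station, and time window are arbitrary examples:

```python
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

# Connect to the IRIS/EarthScope FDSN web services via ObsPy's built-in shortcut.
client = Client("IRIS")

# Pull one hour of vertical broadband data from an arbitrary example station.
t0 = UTCDateTime("2024-01-01T00:00:00")
st = client.get_waveforms(network="IU", station="ANMO", location="00",
                          channel="BHZ", starttime=t0, endtime=t0 + 3600)

print(st)
st.plot()  # quick look at the waveform, e.g. in a GeoLab notebook
```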
Presentations
Several poster presentations highlighted the expanding ecosystem of cloud-based tools supporting geophysical research. In addition to posters by community members on their work, EarthScope staff presented several on our own efforts.
One focused on the migration of international data exchange pipelines to the new AWS cloud infrastructure, emphasizing the role of shared data standards and federated schemas in enabling secure, scalable, and normalized access to global geophysical datasets. Another presented the latest progress on GeoLab, EarthScope’s new JupyterHub platform for data-proximate computing. GeoLab is already in use for training workshops, and work is ongoing to support scalable research through direct access to cloud-hosted datasets, integrated APIs, and domain-specific software environments. A third poster presented updates to the geodetic component of ShakeAlert, highlighting a new cloud-native processing pipeline and real-time monitoring dashboards. Enhancements included reduced data latency and improved reliability through pre-aggregated metrics and streamlined visualization tools.
In the plenary sessions, Einat Lev of Columbia University presented on the VICTOR (Volcanology Infrastructure for Computational Tools and Resources) project, which operates shared cloud platforms for the research community. In addition to providing access to collaborative computing resources in the form of notebook hubs, the project maintains a hazard model library, simplifying common workflows and even enabling model-comparison benchmarking.
Ebru Bozdağ of the Colorado School of Mines gave a talk about equipping the next generation of seismologists with computing skills through the SCOPED (Seismic Computational Platform for Empowering Discovery) and SPAC-MAN (Slabs, Plumes, and Convection in Mantle) projects. A number of SCOPED training events, lectures, and tutorials have helped participants learn how to use cloud and HPC resources and have equipped them with software tools. SPAC-MAN mantle visualizations have been explored by students using virtual reality devices and by general audiences through planetarium events.
In one more example, Weiqiang Zhu of UC Berkeley presented on machine learning models applied to California seismic data, including large DAS datasets. Cloud computing enabled this work to run at scale, processing hundreds of terabytes of data to build new earthquake catalogs and computing over a billion cross-correlation pairs for ambient noise interferometry.
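As a rough, self-contained illustration of the core operation behind that interferometry work (a toy sketch on synthetic noise, not the presented cloud-scale pipeline), cross-correlating a single station pair might look like this:

```python
import numpy as np
from scipy.signal import correlate

# Two synthetic "ambient noise" records standing in for one station pair.
rng = np.random.default_rng(42)
fs = 20.0                                   # sampling rate in Hz
a = rng.standard_normal(int(3600 * fs))     # one hour of noise at station A
b = rng.standard_normal(int(3600 * fs))     # one hour of noise at station B

# Cross-correlate the pair; real workflows repeat this for enormous numbers of
# pairs and stack results over long time spans to pull out coherent signals.
xcorr = correlate(a, b, mode="full", method="fft")
lags = np.arange(-(b.size - 1), a.size) / fs  # lag times in seconds
```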
Discussions
Two breakout discussion sessions reinforced the community’s shared interest in making cloud adoption more inclusive and impactful.
The “Transition to the Cloud” session covered the capabilities rolling out from the Common Cloud Platform for the NSF SAGE and GAGE data archives, community computing resources and training plans under the Cloud On-Ramp initiative, and a case study of petabyte-scale machine learning processing in the cloud. (We’ll share a deeper look at that case study soon.)
A separate session collected input on the infrastructure and programming needed to support education, workforce development, and communication goals for the future National Geophysical Facility, with strong interest expressed in GeoLab and cloud-based tutorials.
Consistent themes
Three themes we continue to hear about at events like this are scalability, accessibility, and community-driven development.
Cloud-native tools and infrastructure are enabling researchers to efficiently analyze large, complex datasets, whether building earthquake catalogs, managing global data exchanges, or supporting real-time monitoring systems. Training workshops, platforms like GeoLab, and streamlined access to cloud-hosted datasets are helping more users adopt cloud workflows. And from instructional use to advanced research applications, cloud tools are being developed and shaped by community feedback and research needs.
We’re designing our cloud services to support the diverse and evolving workflows of our community for research and education, so all this input is incredibly helpful. If you have any more input to share or questions you want answered, please send them in via our form!