The EarthScope-operated data systems of the NSF GAGE and SAGE Facilities are migrating to cloud services. To learn more about this effort and find resources, visit earthscope.org/data/cloud
The July 8-12 Massive Parallel Analysis System for Seismologists (MsPASS) short course was the first to run in our GeoLab JupyterHub environment, and we’re pleased to report that it went well. The course is a milestone for the new cloud-based platform we are building and showcases one of its benefits.
The MsPASS course introduced participants to the only current, generic software system capable of handling large volumes of earthquake seismology data on large HPC and cloud systems. Designed to accommodate variable data processing needs based on specific scientific questions, MsPASS is akin to an enhanced version of ObsPy. The course provided a comprehensive overview of the system’s capabilities, equipping participants with the tools to leverage this powerful package for their research.
One of the key benefits of using the GeoLab environment was apparent immediately: each student was able to log in to the Hub and find their environment already configured and ready for the course work. Because we had created a pre-configured environment tailored to the course content beforehand, there was no need to troubleshoot individual students’ setups once the course started. Everyone logged in with the same environment, the same software, and the same data, without having to worry about how their particular computer was configured.
Behind the curtain
The GeoLab JupyterHub environment performed well throughout the course. There were no technical issues, and the environment scaled as expected to handle the increased traffic. From our backend dashboard, we watched the Hub automatically scale the number of available nodes to match the number of students logged in throughout the week. The number of nodes increased when more students logged in and decreased when students logged off, which means we didn’t have to allocate resources that we weren’t using.
Still, every good pilot test shows you some areas you can improve. In this case, we found a little too much friction in the process of registering as a new user, being assigned access to the short course image in GeoLab, and correctly selecting that image when logging in to work with the course materials. We’ll be working to make that process simpler and add tutorial instructions that eliminate the potential for confusion.
We’ve also used this initial experience to create a blueprint for the setup of future short courses, from collecting information from instructors to deploying the software images and ensuring participants have all the resources they need.
And while the computational power scaled quite well, we did observe some minor slowdowns when the full class was simultaneously churning through some especially data-intensive parts of the workflow. This is just the sort of issue that only reveals itself when a large number of participants stress test the system together.
A glimpse into the future
The success of the MsPASS short course shows the potential of GeoLab to provide equitable access to advanced training and resources. Our aim is to ensure that researchers, regardless of their location or resources, can benefit from cloud computing capabilities. GeoLab will be a valuable tool for supporting more short courses in the future, streamlining setup so that courses can focus on content without losing time to getting every student up and running.
Development of GeoLab is occurring in parallel with the ongoing development of our cloud data systems. When our data archives are cloud-optimized, GeoLab will provide a ready platform for taking advantage of more convenient, more powerful data processing techniques.
As this development progresses, we will be sharing more opportunities to explore the potential of the GeoLab JupyterHub environment for your research and educational endeavors.