
2024 Technical Short Course: Using MsPASS for Seismology Data Processing

Date(s): July 8-12, 2024
Location: Virtual
Deadline: May 1, 2024

MsPASS is the only current, generic software system capable of handling large volumes of earthquake seismology data on large HPC and cloud systems. The package was designed for research computing, where the type of data processing varies and depends on the science questions being addressed. This course introduces students to the system starting from the perspective of the package as "ObsPy on steroids." MsPASS also provides a mechanism for reproducible science by stressing the use of Jupyter notebooks and a novel data history tracking mechanism. In the first session, students will process a small dataset on a desktop or laptop to understand how data flows through the system and is managed by the MongoDB database. The second session will use an HPC cluster at TACC to demonstrate how the system handles huge data volumes with parallel schedulers. The final session will review important technical concepts in the system that students will find helpful when developing their own research workflows.

Time: 2 hours each, in 3 sessions
Primary Audience: Advanced seismology graduate students, postdocs, and early career scientists.
Secondary Audience: Any seismologist interested in learning more about state-of-the-art big data topics.

Learning Objectives

  • Understand containers and how they are used in MsPASS
  • Be able to construct a basic workflow in Python to do waveform processing with a dataset.
  • Know how to download station and source metadata from NSF SAGE/GAGE web services and build a local database of the downloaded data.
  • Understand how schedulers are used to process large data volumes in parallel on HPC or cloud systems.
  • Learn the fundamentals of the MongoDB query language and know how to construct basic query and update operators.

Participant Commitment

Class time plus 2 to 3 hours of homework between sessions.

Prerequisites

  • Have at least an introductory course in seismology. 
  • Have knowledge of Python basics, Jupyter notebooks, and Unix shell scripts.
  • Basic knowledge of signal processing is recommended but not required.

Computer and Data

A good laptop or desktop with at least a dual-core CPU is needed, and a minimum of 8 GB of memory is advisable. An internet connection that can sustain a stable Zoom session for the length of each session should be adequate. Work is done mainly on the local machine or by communicating with TACC in a manner no different from typical web browsing.

MsPASS runs on all standard platforms (Mac, Windows, or Linux) using the container platform Docker. HPC systems generally use a compatible program called Apptainer (formerly Singularity).

Brief Agenda

The tentative agenda below is subject to change.

Session 1
(2 hours)
Lecture: What is MsPASS?
Tutorial: MongoDB fundamentals.
Hands-on work: Setting up MsPASS and running Jupyter notebooks.
Session 2
(2 hours)
Lectures: Parallel computing and what a scheduler needs to process large data volumes.
Session 3
(2 hours)
Tutorials: Managing data with MongoDB and handling three-component data in MsPASS.

Assessment

Students will be given assignments to complete after the first two sessions. Please be prepared to do a final project after the last session using MsPASS as an element of your current research.

Instructors

  • Gary Pavlis, Indiana University
  • Ian Wang, University of Texas
  • Robert Weekly, EarthScope