The data repositories of the National Science Foundation’s SAGE and GAGE facilities are managed by the EarthScope Consortium. A portion of these repositories is available as open data sets sponsored by AWS in a dedicated “sponsored data” bucket.
All data sets contain recordings, or derivatives thereof, from networks of geophysical sensors. A description of each data set is below.
AWS bucket and region
AWS Bucket: s3://earthscope-geophysical-data
AWS Region: us-east-2
While the bucket can be accessed from any location, it is most efficient to access it from the AWS us-east-2 region.
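For programmatic access from Python, the bucket supports anonymous (unsigned) requests, as the CLI example later in this document also shows. A minimal sketch using Boto3, pinning the client to us-east-2 with the standard botocore UNSIGNED configuration:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) client pinned to the bucket's home region
s3 = boto3.client('s3', region_name='us-east-2',
                  config=Config(signature_version=UNSIGNED))

# List the top-level prefixes in the sponsored bucket
response = s3.list_objects_v2(Bucket='earthscope-geophysical-data', Delimiter='/')
for prefix in response.get('CommonPrefixes', []):
    print(prefix['Prefix'])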
miniSEED repository
Data from selected networks managed by NSF’s SAGE facility are available in the sponsored bucket. Generally, these networks deliver continuous seismological data to the facility; some stations also include other geophysical sensor types, along with station and sensor state-of-health information. The data can contain gaps.
Data format and identification
The data are in miniSEED format, the standard for data exchange in the seismological community defined by the International Federation of Digital Seismograph Networks (FDSN). Data are either in miniSEED version 2, defined in the SEED specification, or in miniSEED version 3.
Data are identified using a combination of codes defined by the FDSN in the SEED specification and extended by the FDSN Source Identifier specification. These codes are briefly described below:
- Network code – Uniquely identifies the owner and network operator responsible for the data
- Station code – Uniquely identifies a station within a network
- Location code – Uniquely identifies a group of channels within a station
- Channel code – A sequence of codes that identify the band, source, and subsource for a specific channel
Combined, these codes globally and uniquely identify channels of data. The vast majority of data channels are time series recorded by geophysical sensors.
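As an illustration of how these codes combine, the sketch below splits a traditional dotted SEED-style identifier into its components and assembles the corresponding FDSN Source Identifier; the identifier values are hypothetical examples:

# A SEED-style identifier: Network.Station.Location.Channel (hypothetical values)
nslc = 'TA.A04A..BHZ'
network, station, location, channel = nslc.split('.')

# The channel code is a sequence of band, source, and subsource codes
band, source, subsource = channel

# The equivalent FDSN Source Identifier combines all of the codes
sid = f'FDSN:{network}_{station}_{location}_{band}_{source}_{subsource}'
print(sid)  # FDSN:TA_A04A__B_H_Z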
Data organization
Data are organized into objects that contain all of the channels for a given station for a given day. Day boundaries, and the time base for the data, are always in UTC.
The organization is as follows:
s3://earthscope-geophysical-data/miniseed/NETWORK/YEAR/DAYOFYEAR/STATION.NETWORK.YEAR.DAYOFYEAR[#INT]
where
- NETWORK and STATION are data identifier codes
- YEAR is always a 4-digit year
- DAYOFYEAR is the day of the year from 001-366
- For older data, the objects may contain a suffix: a # character followed by an integer, e.g. #2
For example: s3://earthscope-geophysical-data/miniseed/TA/2004/365/A04A.TA.2004.365#2
During a transition phase there may be objects with the same key (path) with or without the #INT suffix; in these cases, the object without the suffix should be preferred.
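As a sketch of how an object key can be derived from these codes and the suffix rule applied, the following lists the objects for one station-day with Boto3 and prefers a key without a # suffix; the network, station, and date values are illustrative, and anonymous access is assumed as above:

from datetime import date

import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# Illustrative identifiers and day (2004-12-30 is day 365 of that leap year)
network, station = 'TA', 'A04A'
day = date(2004, 12, 30)
doy = f'{day.timetuple().tm_yday:03d}'

# Build the station-day key following the documented layout
prefix = f'miniseed/{network}/{day.year}/{doy}/{station}.{network}.{day.year}.{doy}'

# List matching objects and prefer a key without a '#' suffix
response = s3.list_objects_v2(Bucket='earthscope-geophysical-data', Prefix=prefix)
keys = [obj['Key'] for obj in response.get('Contents', [])]
unsuffixed = [k for k in keys if '#' not in k]
best = unsuffixed[0] if unsuffixed else (keys[0] if keys else None)
print(best)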
Metadata
Summary or detailed metadata are available from the facility metadata web service at:
https://service.earthscope.org/fdsnws/station/1
Full-granularity metadata are commonly accessed in StationXML format, with summaries available in GeoCSV and text formats.
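ObsPy's FDSN web service client can query this service directly. A minimal sketch, assuming the service base URL above and the illustrative network and station used elsewhere in this document:

from obspy.clients.fdsn import Client

# Point ObsPy's FDSN client at the EarthScope service base URL
client = Client('https://service.earthscope.org')

# Request channel-level metadata for an illustrative network and station
inventory = client.get_stations(network='TA', station='A04A', level='channel')
print(inventory)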
Data license and citation
When using these data, please cite both the network owners, identified by network code, and the NSF SAGE facility operated by EarthScope.
Network operators and owners specify the license for their data, and an appropriate citation, in their registration information which is available at: https://fdsn.org/networks/. When no license has been declared by the owner, the facility distributes the data with a license of CC-BY-4.0.
The facility operation can be cited using the instructions for citing SAGE here: https://www.earthscope.org/how-to-cite/
Getting started
AWS CLI
> aws s3 ls --no-sign-request s3://earthscope-geophysical-data/
PRE miniseed/
2025-05-28 16:29:35 1024 README
>
Python with Boto3 and ObsPy
Boto3 is the AWS SDK for Python, supporting access to S3, and ObsPy is a common Python framework for working with seismological data.
import io

import boto3
from botocore import UNSIGNED
from botocore.config import Config
from obspy import read

# Initialize an anonymous S3 client (the bucket allows unsigned requests)
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# Define bucket and key
bucket_name = 'earthscope-geophysical-data'
object_key = 'miniseed/TA/2004/365/A04A.TA.2004.365#2'

# Download the object into memory
response = s3.get_object(Bucket=bucket_name, Key=object_key)
data_stream = io.BytesIO(response['Body'].read())

# Parse the miniSEED data with ObsPy
st = read(data_stream)

# Print the ObsPy Stream
print(st)
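Because the data can contain gaps, a short follow-up using standard ObsPy Stream methods can inspect and close them; the fill value and output file name below are illustrative:

# Report any gaps or overlaps between the traces in the stream
print(st.get_gaps())

# Merge traces of the same channel, filling any gaps with zeros
st.merge(method=0, fill_value=0)

# Write the merged stream to a local miniSEED file
st.write('A04A.TA.2004.365.mseed', format='MSEED')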