The data repositories of the National Science Foundation’s SAGE and GAGE facilities are managed by the EarthScope Consortium. A portion of these repositories is available as open data sets sponsored by AWS in a dedicated “sponsored data” bucket.
All data sets contain recordings, or derivatives thereof, from networks of geophysical sensors. A description of each data set is below.
AWS bucket and region
AWS Bucket: s3://earthscope-geophysical-data
AWS Region: us-east-2
While the bucket can be accessed from any location, it is most efficient to access it from the AWS us-east-2 region.
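For programmatic access from Python, the bucket supports anonymous (unsigned) requests, as the CLI example later in this document also shows. A minimal sketch using Boto3, pinning the client to us-east-2 with the standard botocore UNSIGNED configuration:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) client pinned to the bucket's home region
s3 = boto3.client('s3', region_name='us-east-2',
                  config=Config(signature_version=UNSIGNED))

# List the top-level prefixes in the sponsored bucket
response = s3.list_objects_v2(Bucket='earthscope-geophysical-data', Delimiter='/')
for prefix in response.get('CommonPrefixes', []):
    print(prefix['Prefix'])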
miniSEED repository
Data from selected networks managed by NSF’s SAGE facility are available in the sponsored bucket. Generally, these networks deliver continuous seismological data to the facility; some stations also include other geophysical sensor types, along with station and sensor state-of-health information. The data can contain gaps.
Data format and identification
The data are in miniSEED format, the standard for data exchange in the seismological community defined by the International Federation of Digital Seismograph Networks (FDSN). Data are either in miniSEED version 2, defined in the SEED specification, or in miniSEED version 3.
Data are identified using a combination of codes defined by the FDSN in the SEED specification and extended by the FDSN Source Identifier specification. These codes are briefly described below:
- Network code – Uniquely identifies the owner and network operator responsible for the data
- Station code – Uniquely identifies a station within a network
- Location code – Uniquely identifies a group of channels within a station
- Channel code – A sequence of codes that identify the band, source, and subsource for a specific channel
Combined, these codes globally and uniquely identify channels of data. The vast majority of data channels are time series recorded by geophysical sensors.
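As an illustration of how these codes combine, the sketch below splits a traditional dotted SEED-style identifier into its components and assembles the corresponding FDSN Source Identifier; the identifier values are hypothetical examples:

# A SEED-style identifier: Network.Station.Location.Channel (hypothetical values)
nslc = 'TA.A04A..BHZ'
network, station, location, channel = nslc.split('.')

# The channel code is a sequence of band, source, and subsource codes
band, source, subsource = channel

# The equivalent FDSN Source Identifier combines all of the codes
sid = f'FDSN:{network}_{station}_{location}_{band}_{source}_{subsource}'
print(sid)  # FDSN:TA_A04A__B_H_Z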
Data organization
Data are organized into objects that contain all of the channels for a given station for a given day. Day boundaries, and the time base for the data, are always in UTC.
The organization is as follows:
s3://earthscope-geophysical-data/miniseed/NETWORK/YEAR/DAYOFYEAR/STATION.NETWORK.YEAR.DAYOFYEAR[#INT]
where
- NETWORK and STATION are data identifier codes
- YEAR is always a 4-digit year
- DAYOFYEAR is the day of the year from 001-366
- For older data, the objects may contain a suffix: a # character followed by an integer, e.g. #2
For example: s3://earthscope-geophysical-data/miniseed/TA/2004/365/A04A.TA.2004.365#2
During a transition phase there may be objects with the same key (path) with or without the #INT suffix; in these cases, the object without the suffix should be preferred.
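As a sketch of how an object key can be derived from these codes and the suffix rule applied, the following lists the objects for one station-day with Boto3 and prefers a key without a # suffix; the network, station, and date values are illustrative, and anonymous access is assumed as above:

from datetime import date

import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# Illustrative identifiers and day (2004-12-30 is day 365 of that leap year)
network, station = 'TA', 'A04A'
day = date(2004, 12, 30)
doy = f'{day.timetuple().tm_yday:03d}'

# Build the station-day key following the documented layout
prefix = f'miniseed/{network}/{day.year}/{doy}/{station}.{network}.{day.year}.{doy}'

# List matching objects and prefer a key without a '#' suffix
response = s3.list_objects_v2(Bucket='earthscope-geophysical-data', Prefix=prefix)
keys = [obj['Key'] for obj in response.get('Contents', [])]
unsuffixed = [k for k in keys if '#' not in k]
best = unsuffixed[0] if unsuffixed else (keys[0] if keys else None)
print(best)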
Metadata
Summary or detailed metadata are available from the facility metadata web service at:
https://service.earthscope.org/fdsnws/station/1
Full-granularity metadata are commonly accessed in StationXML format, with summaries available in GeoCSV and text formats.
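ObsPy's FDSN web service client can query this service directly. A minimal sketch, assuming the service base URL above and the illustrative network and station used elsewhere in this document:

from obspy.clients.fdsn import Client

# Point ObsPy's FDSN client at the EarthScope service base URL
client = Client('https://service.earthscope.org')

# Request channel-level metadata for an illustrative network and station
inventory = client.get_stations(network='TA', station='A04A', level='channel')
print(inventory)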
Data license and citation
When using these data, please cite both the network owners, identified by network code, and the NSF SAGE facility operated by EarthScope.
Network operators and owners specify the license for their data, and an appropriate citation, in their registration information which is available at: https://fdsn.org/networks/. When no license has been declared by the owner, the facility distributes the data with a license of CC-BY-4.0.
The facility operation can be cited using the instructions for citing SAGE here: https://www.earthscope.org/how-to-cite/
Getting started
AWS CLI
> aws s3 ls --no-sign-request s3://earthscope-geophysical-data/
PRE miniseed/
2025-05-28 16:29:35 1024 README
>
Python with Boto3 and ObsPy
Boto3 is the AWS SDK for Python, supporting access to S3, and ObsPy is a common Python framework for working with seismological data.
import io

import boto3
from botocore import UNSIGNED
from botocore.config import Config
from obspy import read

# Initialize an anonymous S3 client (the bucket allows unsigned requests)
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# Define bucket and key
bucket_name = 'earthscope-geophysical-data'
object_key = 'miniseed/TA/2004/365/A04A.TA.2004.365#2'

# Download the object into memory
response = s3.get_object(Bucket=bucket_name, Key=object_key)
data_stream = io.BytesIO(response['Body'].read())

# Parse the miniSEED data with ObsPy
st = read(data_stream)

# Print the ObsPy Stream
print(st)
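Because the data can contain gaps, a short follow-up using standard ObsPy Stream methods can inspect and close them; the fill value and output file name below are illustrative:

# Report any gaps or overlaps between the traces in the stream
print(st.get_gaps())

# Merge traces of the same channel, filling any gaps with zeros
st.merge(method=0, fill_value=0)

# Write the merged stream to a local miniSEED file
st.write('A04A.TA.2004.365.mseed', format='MSEED')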