2.2. Specifying Waveform Archives

This tutorial provides instructions on how to direct QuakeMigrate to a local waveform archive and how to specify its structure. QuakeMigrate can handle any regularly structured waveform archive. Additional requirements can be handled on request - contact us at quakemigrate.developers@gmail.com or submit an Issue on our GitHub.

2.2.1. The Archive class

The Archive class provides methods for querying a waveform archive on a local system. It is capable of handling any regular archive structure, as well as any data file format that is compatible with ObsPy. Waveform data and an overview of the data availability (see Rejection criteria) are returned by a query to the archive.

It requires two pieces of information on instantiation:

archive_path: the path to the seismic data archive
stations: a DataFrame containing station information. There is one required (case-sensitive) column header - “Name”.

All other parameters can either be provided as arguments on instantiation, or set once the Archive has been instantiated (see the section on specifying the archive structure below for an example).

Here we create a new instance of Archive.

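from quakemigrate.io import Archive, read_stations


# station_file and data_in are assumed to be defined already: the path to the
# station file and the path to the waveform archive, respectively.

# --- Read in station file ---
stations = read_stations(station_file)

# --- Create new Archive ---
archive = Archive(archive_path=data_in, stations=stations)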

2.2.2. Specifying the archive structure

Once the Archive object has been instantiated, it is necessary to specify the structure of the archive. There are some standard formats, which can be accessed through the path_structure() method, including SeisComp3 and the standard structure used by SeisUK. These map to a formattable string used when querying the waveform archive:

archive.path_structure(archive_format="SeisComp3")

It is also possible to override with a custom archive structure:

archive.format = "{year}/{jday:03d}/{station}_{year}_{jday:03d}_{channels}.*"

The full list of keyword arguments that are passed into this formattable string when the archive is queried is:

year: UTCDateTime.year for the time period of the query
month: UTCDateTime.month for the time period of the query
day: UTCDateTime.day for the time period of the query
jday: UTCDateTime.julday for the time period of the query
station: the station name (replaced with "*" if reading all)
dtime: UTCDateTime for the time period of the query

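The inclusion of dtime provides a great deal of flexibility: any attribute of the UTCDateTime can be referenced directly within the format string (for example, {dtime.hour:02d}). Most of the other arguments simply provide shorthand for commonly used attributes.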
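As a rough illustration of how such a formattable string expands, the sketch below fills in the keyword arguments listed above using plain Python string formatting. It is not QuakeMigrate's internal code: the expand() helper and the example pattern are hypothetical, and a standard-library datetime stands in for UTCDateTime (so the Julian day is computed with timetuple().tm_yday rather than UTCDateTime.julday).

```python
from datetime import datetime


def expand(fmt, dtime, station="*"):
    """Fill an archive format string with the documented keyword arguments.

    Hypothetical helper for illustration only; `dtime` is a datetime
    standing in for an obspy UTCDateTime.
    """
    return fmt.format(
        year=dtime.year,
        month=dtime.month,
        day=dtime.day,
        jday=dtime.timetuple().tm_yday,  # day of year (UTCDateTime.julday)
        station=station,
        dtime=dtime,
    )


# Example pattern (an assumption): one directory per year and Julian day,
# with the hour taken directly from dtime.
fmt = "{year}/{jday:03d}/{station}_{dtime.hour:02d}.*"
query_time = datetime(2014, 8, 24, 3, 0, 0)

single_station = expand(fmt, query_time, station="SKR01")
all_stations = expand(fmt, query_time)  # station defaults to "*"

print(single_station)  # 2014/236/SKR01_03.*
print(all_stations)    # 2014/236/*_03.*
```

The resulting strings are glob patterns, which is why the station name can be replaced with "*" when reading all stations.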

2.2.3. Resampling waveforms

It is not uncommon for a data archive to contain stations with differing sampling rates. QuakeMigrate, however, performs the core migration and stacking routine at a single, unified sampling rate. To accommodate this, we have bundled methods that automatically resample the waveform data to the specified sampling rate as it is read in. These routines aim to alter the waveform values as little as possible by retaining as much of the original data as they can. Downsampling from 100 Hz to 50 Hz, for example, is accomplished by decimating the waveforms by a factor of two, i.e. skipping every other sample. If the input waveform sampling rate is not an integer multiple of the unified sampling rate, there is (limited) scope to first linearly interpolate the waveform data up to a sampling rate that is an integer multiple of the unified sampling rate, then decimate down.

Resampling can be toggled on with archive.resample = True. A single factor by which to linearly interpolate (upsample) the data when resampling can be set with, for example, archive.upfactor = 2. We hope to de-restrict this in the future to allow for automatic identification of a suitable upfactor (within reason).
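The arithmetic behind this scheme can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not QuakeMigrate's implementation: the function names are hypothetical, sampling rates are assumed to be integers in Hz, and no anti-alias filtering is applied before decimation.

```python
def upsample_linear(samples, upfactor):
    """Linearly interpolate so the sampling rate increases by `upfactor`."""
    out = []
    for a, b in zip(samples[:-1], samples[1:]):
        for k in range(upfactor):
            out.append(a + (b - a) * k / upfactor)
    out.append(samples[-1])
    return out


def decimate(samples, factor):
    """Keep every `factor`-th sample (no filtering in this sketch)."""
    return samples[::factor]


def resample(samples, rate, target_rate, upfactor=None):
    """Resample to `target_rate`, upsampling first only if required."""
    if rate % target_rate != 0:
        # The input rate is not an integer multiple of the target rate, so
        # interpolate up to a rate that is, if the upfactor allows it.
        if upfactor is None or (rate * upfactor) % target_rate != 0:
            raise ValueError("no suitable upfactor for these sampling rates")
        samples = upsample_linear(samples, upfactor)
        rate *= upfactor
    return decimate(samples, rate // target_rate)


# 100 Hz -> 50 Hz: decimate by two, keeping original sample values.
halved = resample([0, 1, 2, 3, 4, 5], rate=100, target_rate=50)

# 75 Hz -> 50 Hz: interpolate to 150 Hz (upfactor 2), then decimate by three.
mixed = resample([0.0, 3.0, 6.0, 9.0], rate=75, target_rate=50, upfactor=2)

print(halved)  # [0, 2, 4]
print(mixed)   # [0.0, 4.5, 9.0]
```

In the first case every retained value is an original sample; in the second, some retained values are interpolated, which is why upsampling is only used when plain decimation is not possible.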

2.2.4. Instrument response

It is not necessary to remove the instrument response for the core migration and stacking routine, as the default STA/LTA onset function implicitly handles this. However, if the user wishes to make use of the local magnitude calculation module, they must provide an inventory of instrument response functions. The quakemigrate.io.read_response_inv() function is a light wrapper for the ObsPy read_inventory() function; see the ObsPy documentation for details of compatible formats.

In addition to the inventory of instrument response functions, the user can also set the water level, a pre-filter, and choose to remove the full response.

2.2.5. Rejection criteria

We currently impose fairly strict criteria on the data to be used in QuakeMigrate, which are detailed below.

2.2.5.1. Gap tolerance

It is possible to allow QuakeMigrate to use gappy data. We do not recommend using this without first assessing the waveform data and understanding the common causes of data gaps. This is currently set by toggling the allow_gaps parameter of the quakemigrate.signal.onsets.STALTAOnset object to True.

The same setting also governs data that are missing at the start or end of a timestep.

2.2.5.2. Flatlines

Some archives will choose to fill any gaps in their waveform data with flatline values. If, for a given timestep, the data all have the same value, they are rejected.
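A flatline check of this kind can be sketched as follows. This is a minimal illustration rather than the check QuakeMigrate performs internally; the is_flatline() name is hypothetical and the data are assumed to be a plain sequence of sample values for one timestep.

```python
def is_flatline(samples):
    """Return True if every sample in the timestep has the same value."""
    return len(samples) > 0 and all(s == samples[0] for s in samples)


filled_gap = [0, 0, 0, 0, 0]     # e.g. a gap filled with a constant value
real_signal = [0, 3, -2, 5, 1]   # any variation at all passes the check

print(is_flatline(filled_gap))   # True
print(is_flatline(real_signal))  # False
```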

2.2.5.3. Overlaps

If there is overlapping waveform data for a particular station component, it is not used.
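To make the gap and overlap criteria concrete, the sketch below classifies the waveform segments available for one station component within a timestep. It is an illustration under assumptions, not QuakeMigrate's code: the function names are hypothetical, each segment is represented by the times (in seconds) of its first and last samples, and `delta` is the sample interval.

```python
def check_continuity(segments, start, end, delta):
    """Return (has_gaps, has_overlaps) for segments covering [start, end].

    `segments` is a list of (first_sample_time, last_sample_time) tuples.
    """
    if not segments:
        return True, False
    segments = sorted(segments)
    tol = delta / 2
    # Data missing at the start or end of the timestep counts as a gap.
    last_end = max(seg_end for _, seg_end in segments)
    has_gaps = segments[0][0] > start + tol or last_end < end - tol
    has_overlaps = False
    for (_, prev_end), (next_start, _) in zip(segments[:-1], segments[1:]):
        step = next_start - prev_end  # equals delta for contiguous segments
        if step > delta + tol:
            has_gaps = True
        elif step < delta - tol:
            has_overlaps = True
    return has_gaps, has_overlaps


def is_usable(segments, start, end, delta, allow_gaps=False):
    """Reject overlapping data always; reject gappy data unless allowed."""
    has_gaps, has_overlaps = check_continuity(segments, start, end, delta)
    return not has_overlaps and (allow_gaps or not has_gaps)


# One-second sample interval, timestep spanning 0 s to 9 s.
contiguous = [(0, 4), (5, 9)]
gappy = [(0, 3), (6, 9)]
overlapping = [(0, 5), (4, 9)]

print(is_usable(contiguous, 0, 9, 1.0))              # True
print(is_usable(gappy, 0, 9, 1.0))                   # False
print(is_usable(gappy, 0, 9, 1.0, allow_gaps=True))  # True
print(is_usable(overlapping, 0, 9, 1.0))             # False
```

The allow_gaps argument here mirrors the behaviour described for the allow_gaps parameter in the gap tolerance section; overlapping data are rejected regardless.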