2.2. Specifying Waveform Archives
This tutorial provides instructions on how to direct QuakeMigrate to a local waveform archive and how to specify its structure. QuakeMigrate can handle any regularly structured waveform archive. Additional requirements can be handled on request - contact us at quakemigrate.developers@gmail.com or submit an Issue on our GitHub.
2.2.1. The Archive class
The Archive class provides methods for querying a waveform archive on a local system. It is capable of handling any regular archive structure, as well as any data file format that is compatible with ObsPy. Waveform data and an overview of the data availability (see Rejection criteria) are returned by a query to the archive.
It requires two pieces of information on instantiation:
archive_path: the path to seismic data archive
stations: a DataFrame containing station information. There is one required (case-sensitive) column header - “Name”.
All other parameters can either be provided as arguments on instantiation, or set once the Archive has been instantiated (see the section on specifying the archive structure below for an example).
Here we create a new instance of Archive.
from quakemigrate.io import Archive, read_stations
# --- Read in station file ---
stations = read_stations(station_file)
# --- Create new Archive and set path structure ---
archive = Archive(archive_path=data_in, stations=stations)
2.2.2. Specifying the archive structure
Once the Archive object has been instantiated, it is necessary to specify the structure of the archive. There are some standard formats, which can be accessed through the path_structure() method, including SeisComp3 and the standard structure used by SeisUK. These map to a formattable string used when querying the waveform archive:
archive.path_structure(archive_format="SeisComp3")
It is also possible to override with a custom archive structure:
archive.format = "{year}/{jday:03d}/{station}_{year}_{jday:03d}_{channels}.*"
The full list of keyword arguments that are passed into this formattable string when the archive is queried is:
year: UTCDateTime.year for the time period of the query
month: UTCDateTime.month for the time period of the query
day: UTCDateTime.day for the time period of the query
jday: UTCDateTime.julday for the time period of the query
station: the station name (replaced with "*" if reading all)
dtime: UTCDateTime for the time period of the query
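Because dtime is the full UTCDateTime object for the query, its attributes can be referenced directly in the format string (for example {dtime.year}). Most of the other arguments simply provide shorthand for these attributes.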
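As a rough illustration of how these keyword arguments fill the format string, the sketch below expands two custom formats with Python's str.format. It is not QuakeMigrate's own code: expand_archive_format is a hypothetical helper, the station name STA1 is made up, and the standard-library datetime stands in for ObsPy's UTCDateTime (which exposes the day of year as .julday).

```python
from datetime import datetime


def expand_archive_format(fmt, dtime, station="*"):
    """Fill an archive format string with the keyword arguments listed above.

    Hypothetical helper for illustration only; `dtime` is a standard-library
    datetime standing in for an ObsPy UTCDateTime.
    """
    return fmt.format(
        year=dtime.year,
        month=dtime.month,
        day=dtime.day,
        jday=dtime.timetuple().tm_yday,  # day of year (UTCDateTime.julday)
        station=station,
        dtime=dtime,
    )


query_time = datetime(2014, 8, 24, 9, 30)

# Shorthand arguments: year and zero-padded day of year.
by_julday = expand_archive_format("{year}/{jday:03d}/{station}_*", query_time, "STA1")

# Attribute access on dtime reaches fields with no shorthand, such as the hour.
by_dtime = expand_archive_format(
    "{dtime.year}/{dtime.month:02d}/{station}_{dtime.hour:02d}*", query_time
)

print(by_julday)  # 2014/236/STA1_*
print(by_dtime)   # 2014/08/*_09*
```

The second format shows why dtime is useful: any attribute of the time object can be used, so archive layouts split by hour or minute need no extra keyword arguments.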
2.2.3. Resampling waveforms
It is not uncommon for a data archive to contain stations with differing sampling rates. QuakeMigrate, however, performs the core migration and stacking routine at a single, unified sampling rate. As such, we have bundled methods for accomplishing this automatically, resampling the waveform data to the specified sampling rate as it is read in. These routines aim to minimally alter the values of the waveforms by retaining as much of the original data as possible. Downsampling from 100 Hz to 50 Hz, for example, is accomplished by decimating the waveforms by a factor of two, that is, by keeping only every other sample. If the unified sampling rate is not an integer divisor of the input waveform sampling rate, there is (limited) scope to first linearly interpolate the waveform data to a sampling rate that is an integer multiple of the unified sampling rate, then decimate down.
Resampling can be toggled on with archive.resample = True. A single factor by which to linearly interpolate the data before decimating can be set with, for example, archive.upfactor = 2. We hope to relax this restriction in the future to allow for automatic identification of a suitable upfactor (within reason).
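The two cases described in this section (plain decimation, and linear interpolation followed by decimation) can be sketched in a few lines of pure Python. This is only an illustration of the idea under simple assumptions (integer sampling rates, no filtering), not QuakeMigrate's implementation; the function names decimate, upsample_linear and resample are hypothetical.

```python
def decimate(samples, factor):
    """Keep every `factor`-th sample, starting with the first."""
    return samples[::factor]


def upsample_linear(samples, upfactor):
    """Insert `upfactor - 1` linearly interpolated values between samples."""
    if len(samples) < 2:
        return list(samples)
    upsampled = []
    for left, right in zip(samples, samples[1:]):
        for k in range(upfactor):
            upsampled.append(left + (right - left) * k / upfactor)
    upsampled.append(samples[-1])
    return upsampled


def resample(samples, input_rate, target_rate, upfactor=None):
    """Bring `samples` from `input_rate` to `target_rate` (both integers, in Hz)."""
    if input_rate % target_rate != 0:
        # The target rate does not divide the input rate: interpolate first.
        if upfactor is None or (input_rate * upfactor) % target_rate != 0:
            raise ValueError("no suitable upfactor for these sampling rates")
        samples = upsample_linear(samples, upfactor)
        input_rate *= upfactor
    return decimate(samples, input_rate // target_rate)


# 100 Hz -> 50 Hz: decimate by a factor of two.
halved = resample([0, 1, 2, 3, 4, 5], 100, 50)  # [0, 2, 4]

# 75 Hz -> 50 Hz: interpolate by two (150 Hz), then decimate by three.
via_upfactor = resample([0.0, 3.0, 6.0], 75, 50, upfactor=2)  # [0.0, 4.5]
```

In the 75 Hz example the second output value (4.5) is an interpolated one, which shows why the interpolation route alters the data more than plain decimation does.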
2.2.4. Instrument response
It is not necessary to remove the instrument response for the core migration and stacking routine, because the default STA/LTA onset function implicitly handles this. However, users who wish to make use of the local magnitude calculation module must provide an inventory of instrument response functions. The quakemigrate.io.read_response_inv() function is a light wrapper for the ObsPy read_inventory() function. See the ObsPy documentation for details of compatible formats.
In addition to the inventory of instrument response functions, the user can also set the water level, a pre-filter, and choose to remove the full response.
2.2.5. Rejection criteria
We currently impose fairly strict criteria on the data to be used in QuakeMigrate, which are detailed below.
2.2.5.1. Gap tolerance
It is possible to allow QuakeMigrate to use gappy data. We do not recommend enabling this without first assessing the waveform data and understanding the common causes of data gaps. It is currently enabled by setting the allow_gaps parameter of the quakemigrate.signal.onsets.STALTAOnset object to True.
This also applies to data missing at the start/end of a timestep.
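To make the criterion concrete, the sketch below flags a station component as gappy if its data start late, end early, or contain a break within the timestep. It is an assumption-laden illustration rather than QuakeMigrate's code: has_gaps is a hypothetical helper, segments are plain (start, end) pairs in seconds, and max_step is an assumed tolerance on the spacing between consecutive segments.

```python
def has_gaps(segments, window_start, window_end, max_step):
    """Return True if the segments do not continuously cover the window.

    segments: list of (start, end) times in seconds for one station component.
    max_step: largest allowed time between the end of one segment and the
              start of the next (an assumed tolerance).
    """
    if not segments:
        return True
    ordered = sorted(segments)
    # Data missing at the start or end of the timestep also count as a gap.
    if ordered[0][0] > window_start or ordered[-1][1] < window_end:
        return True
    return any(nxt[0] - prev[1] > max_step for prev, nxt in zip(ordered, ordered[1:]))


continuous = has_gaps([(0, 30), (30.5, 60)], 0, 60, max_step=1.0)
broken = has_gaps([(0, 30), (35, 60)], 0, 60, max_step=1.0)
late_start = has_gaps([(5, 60)], 0, 60, max_step=1.0)
```

Here continuous is False, while broken and late_start are both True.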
2.2.5.2. Flatlines
Some archives will choose to fill any gaps in their waveform data with flatline values. If, for a given timestep, the data all have the same value, they are rejected.
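A flatline test of this kind amounts to checking whether every sample in the timestep equals the first one. The snippet below is a minimal sketch of that check on a plain list of samples; is_flatline is a hypothetical name, not a QuakeMigrate function.

```python
def is_flatline(samples):
    """Return True if there is data and every sample has the same value."""
    return len(samples) > 0 and all(value == samples[0] for value in samples)


gap_filled = is_flatline([0, 0, 0, 0])  # constant fill value
real_signal = is_flatline([0, 3, -2, 1])  # varying signal
```

Here gap_filled is True and real_signal is False.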
2.2.5.3. Overlaps
If there is overlapping waveform data for a particular station component, it is not used.
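One simple way to detect such overlaps, assuming each station component is described by a list of (start, end) segment times in seconds, is to sort the segments and check whether any segment starts before the latest end time seen so far. This is a hypothetical sketch (has_overlaps is not part of QuakeMigrate), shown only to illustrate the criterion.

```python
def has_overlaps(segments):
    """Return True if any two (start, end) segments overlap in time."""
    latest_end = None
    for start, end in sorted(segments):
        if latest_end is not None and start < latest_end:
            return True
        latest_end = end if latest_end is None else max(latest_end, end)
    return False


contiguous = has_overlaps([(0, 30), (30, 60)])  # segments meet but do not overlap
overlapping = has_overlaps([(0, 30), (25, 60)])  # 5 s of duplicated data
```

Here contiguous is False and overlapping is True.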