3.3. quakemigrate.io¶
The quakemigrate.io module handles the various input/output operations
performed by QuakeMigrate. This includes:
- Reading waveform data - The submodule data.py can handle any waveform data archives with regular directory structures.
- Writing results - The submodule quakeio.py provides a suite of functions to output QuakeMigrate results in the QuakeMigrate format.
- Parsing QuakeMigrate results into the ObsPy Catalog structure.
- Providing various parsers that generate input files for other pieces of software. Feel free to contribute more!
| copyright: | 2020, QuakeMigrate developers. |
|---|---|
| license: | GNU General Public License, Version 3 (https://www.gnu.org/licenses/gpl-3.0.html) |
3.3.1. quakemigrate.io.amplitudes¶
Module to handle input/output of .amps files.
-
quakemigrate.io.amplitudes.write_amplitudes(run, amplitudes, event)[source]¶ Write amplitude results to a new .amps file. This includes amplitude measurements, and the magnitude estimates derived from them (with station correction terms applied, if provided).
Parameters: - run (Run object) – Light class encapsulating i/o path information for a given run.
- amplitudes (pandas.DataFrame object) – P- and S-wave amplitude measurements for each component of each station in the station file, and individual local magnitude estimates derived from them. Columns = [“epi_dist”, “z_dist”, “P_amp”, “P_freq”, “P_time”, “S_amp”, “S_freq”, “S_time”, “Noise_amp”, “is_picked”, “ML”, “ML_Err”]. Index = Trace ID (see obspy.Trace object property ‘id’).
- event (Event object) – Light class encapsulating signal, onset, and location information for a given event.
3.3.2. quakemigrate.io.availability¶
Module to handle input/output of StationAvailability.csv files.
-
quakemigrate.io.availability.read_availability(run, starttime, endtime)[source]¶ Read in station availability data to a pandas.DataFrame from csv files split by Julian day.
Parameters: - run (Run object) – Light class encapsulating i/o path information for a given run.
- starttime (obspy.UTCDateTime object) – Timestamp from which to read the station availability.
- endtime (obspy.UTCDateTime object) – Timestamp up to which to read the station availability.
Returns: availability – Details the availability of each station for each timestep of detect.
Return type: pandas.DataFrame object
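Files "split by Julian day" are keyed by year and day-of-year, so a read between two timestamps has to visit every day in that span. The sketch below enumerates those days with the standard library only. The helper name `julian_days_between` and the tuple layout are illustrative assumptions, not part of QuakeMigrate, and the real file lookup may differ.

```python
from datetime import datetime, timedelta


def julian_days_between(starttime, endtime):
    """Return (year, day-of-year) pairs for every day from starttime to endtime."""
    # Hypothetical helper: shown only to illustrate the per-day split.
    day = starttime.date()
    last = endtime.date()
    days = []
    while day <= last:
        days.append((day.year, day.timetuple().tm_yday))
        day += timedelta(days=1)
    return days


# A request that crosses a year boundary touches four daily files.
days = julian_days_between(datetime(2019, 12, 30, 22), datetime(2020, 1, 2, 3))
print(days)
```

In practice the timestamps would be obspy.UTCDateTime objects rather than datetime objects; the day-counting logic is the same.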
-
quakemigrate.io.availability.write_availability(run, availability)[source]¶ Write out csv files (split by Julian day) containing station availability data.
Parameters: - run (Run object) – Light class encapsulating i/o path information for a given run.
- availability (pandas.DataFrame object) – Details the availability of each station for each timestep of detect.
3.3.3. quakemigrate.io.core¶
Module to handle input/output for QuakeMigrate.
-
class quakemigrate.io.core.Run(path, name, subname='', stage=None, loglevel='info')[source]¶
Bases: object
Light class to encapsulate i/o path information for a given run.
Parameters: - stage (str) – Specifies run stage of QuakeMigrate (“detect”, “trigger”, or “locate”).
- path (str) – Points to the top level directory containing all input files, under which the specific run directory will be created.
- name (str) – Name of the current QuakeMigrate run.
- subname (str, optional) – Optional name of a sub-run - useful when testing different trigger parameters, for example.
-
path¶ Points to the top level directory containing all input files, under which the specific run directory will be created.
Type: pathlib.Path object
-
name¶ Name of the current QuakeMigrate run.
Type: str
-
run_path¶ Points to the run directory into which files will be written.
Type: pathlib.Path object
-
subname¶ Optional name of a sub-run - useful when testing different trigger parameters, for example.
Type: str
-
stage¶ Track which stage of QuakeMigrate is being run.
Type: {“detect”, “trigger”, “locate”}, optional
-
loglevel¶ Set the logging level. (Default “info”)
Type: {“info”, “debug”}, optional
-
logger(log)[source] Configures the logging feature.
Parameters: log (bool) – Toggle for logging. If True, will output to stdout and generate a log file.
-
name Get the run name as a formatted string.
-
quakemigrate.io.core.read_lut(lut_file)[source]¶ Read the contents of a pickle file and restore state of the lookup table object.
Parameters: lut_file (str) – Path to pickle file to load.
Returns: lut – Lookup table populated with grid specification and traveltimes.
Return type: LUT object
-
quakemigrate.io.core.read_response_inv(response_file, sac_pz_format=False)[source]¶ Reads response information from file, returning it as an obspy.Inventory object.
Parameters: - response_file (str) – Path to response file. Please see the obspy.read_inventory() documentation for a full list of supported file formats. This includes a dataless.seed volume, a concatenated series of RESP files or a stationXML file.
- sac_pz_format (bool, optional) – Toggle to indicate that response information is being provided in SAC Pole-Zero files. NOTE: not yet supported.
Returns: response_inv – ObsPy response inventory.
Return type: obspy.Inventory object
Raises: - NotImplementedError – If the user selects sac_pz_format.
- TypeError – If the user provides a response file that is not readable by ObsPy.
-
quakemigrate.io.core.read_stations(station_file, **kwargs)[source]¶ Reads station information from file.
Parameters: - station_file (str) – Path to station file. File format (header line is REQUIRED, case sensitive, any order): Latitude, Longitude, Elevation (units of metres), Name
- kwargs (dict) – Passthrough for pandas.read_csv kwargs.
Returns: stn_data – Columns: “Latitude”, “Longitude”, “Elevation”, “Name”
Return type: pandas.DataFrame object
Raises: StationFileHeaderException – Raised if the input file is missing required entries in the header.
-
quakemigrate.io.core.read_vmodel(vmodel_file, **kwargs)[source]¶ Reads velocity model information from file.
Parameters: - vmodel_file (str) – Path to velocity model file. File format: (header line is REQUIRED, case sensitive, any order): Depth (units of metres), Vp, Vs (units of metres per second)
- kwargs (dict) – Passthrough for pandas.read_csv kwargs.
Returns: vmodel_data – Columns: “Depth”, “Vp”, “Vs”
Return type: pandas.DataFrame object
Raises: VelocityModelFileHeaderException – Raised if the input file is missing required entries in the header.
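Both read_stations and read_vmodel require a header line whose column names are case sensitive but may appear in any order. The sketch below shows what such a header check could look like using only the standard library. The helper `missing_columns` and the example rows are illustrative assumptions; the real functions read the file with pandas.read_csv and raise StationFileHeaderException or VelocityModelFileHeaderException instead of returning a list.

```python
import csv
import io

REQUIRED_STATION_COLUMNS = {"Latitude", "Longitude", "Elevation", "Name"}
REQUIRED_VMODEL_COLUMNS = {"Depth", "Vp", "Vs"}


def missing_columns(text, required):
    """Return the required column names absent from the CSV header line."""
    # Hypothetical helper: compares names exactly, so case matters.
    header = next(csv.reader(io.StringIO(text)))
    return sorted(required - set(header))


# Made-up example files: column order is free, column names are not.
good = "Name,Latitude,Longitude,Elevation\nSTA1,64.5,-17.2,1200.0\n"
bad = "name,Latitude,Longitude,Elevation\nSTA1,64.5,-17.2,1200.0\n"

print(missing_columns(good, REQUIRED_STATION_COLUMNS))
print(missing_columns(bad, REQUIRED_STATION_COLUMNS))
```

The second file fails the check only because "name" is lower case, which is the situation the "case sensitive" note warns about.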
3.3.4. quakemigrate.io.cut_waveforms¶
Module to handle input/output of cut waveforms.
-
quakemigrate.io.cut_waveforms.write_cut_waveforms(run, event, file_format, pre_cut=0.0, post_cut=0.0)[source]¶ Output raw cut waveform data as a waveform file – defaults to mSEED.
Parameters: - run (Run object) – Light class encapsulating i/o path information for a given run.
- event (Event object) – Light class encapsulating signal, onset, and location information for a given event.
- file_format (str, optional) – File format to write waveform data to. Options are all file formats supported by obspy, including: “MSEED” (default), “SAC”, “SEGY”, “GSE2”
- pre_cut (float, optional) – Specify how long before the event origin time to cut the waveform data from.
- post_cut (float, optional) – Specify how long after the event origin time to cut the waveform data to.
3.3.5. quakemigrate.io.data¶
Module for processing waveform files stored in a data archive.
-
class quakemigrate.io.data.Archive(archive_path, stations, archive_format=None, **kwargs)[source]¶
Bases: object
The Archive class handles the reading of archived waveform data.
It is capable of handling any regular archive structure. Requests to read waveform data are served up as a quakemigrate.io.data.WaveformData object. Data will be checked for availability within the requested time period, and optionally resampled to meet a unified sampling rate. The raw data read from the archive will also be retained.
If a response inventory is provided for the archive, it will be stored with the waveform data for response removal, if needed.
Parameters: - archive_path (str) – Location of seismic data archive: e.g.: ./DATA_ARCHIVE.
- stations (pandas.DataFrame object) – Station information. Columns [“Latitude”, “Longitude”, “Elevation”, “Name”]
- archive_format (str, optional) – Sets path type for different archive formats.
- kwargs (**dict) – See Archive Attributes for details.
-
archive_path¶ Location of seismic data archive: e.g.: ./DATA_ARCHIVE.
Type: pathlib.Path object
-
stations¶ Series object containing station names.
Type: pandas.Series object
-
format¶ File naming format of data archive.
Type: str
-
read_all_stations¶ If True, read all stations in archive for that time period. Else, only read specified stations.
Type: bool, optional
-
resample¶ If true, perform resampling of data which cannot be decimated directly to the desired sampling rate.
Type: bool, optional
-
response_inv¶ ObsPy response inventory for this waveform archive, containing response information for each channel of each station of each network.
Type: obspy.Inventory object, optional
-
upfactor¶ Factor by which to upsample the data to enable it to be decimated to the desired sampling rate, e.g. 40Hz -> 50Hz requires upfactor = 5.
Type: int, optional
-
path_structure(path_type="YEAR/JD/STATION")[source]¶ Set the file naming format of the data archive.
-
read_waveform_data(starttime, endtime, sampling_rate)[source]¶ Read in all waveform data between two times, decimate / resample if required to reach desired sampling rate. Return all raw data as an obspy Stream object and processed data for specified stations as an array for use by QuakeScan to calculate onset functions for migration.
-
path_structure(archive_format='YEAR/JD/STATION', channels='*')[source] Define the path structure of the data archive.
Parameters: - archive_format (str, optional) – Sets path type for different archive formats.
- channels (str, optional) – Channel codes to include. E.g. channels=”[B,H]H*”. (Default “*”)
Raises: ArchivePathStructureError – If the archive_format specified by the user is not a valid option.
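The default archive_format string, "YEAR/JD/STATION", suggests a directory tree keyed by year, Julian day and station. The sketch below builds one such directory path under that reading. The exact layout (four-digit year, zero-padded three-digit day-of-year, station name as the final directory) and the helper `day_directory` are assumptions made for illustration; the file-name patterns QuakeMigrate actually uses are not given here.

```python
from datetime import date
from pathlib import PurePosixPath


def day_directory(archive_path, day, station):
    """Build <archive>/<year>/<julian day>/<station> for one day of data."""
    # Assumed layout: 4-digit year, then 3-digit zero-padded day-of-year.
    jday = day.timetuple().tm_yday
    return PurePosixPath(archive_path) / f"{day.year}" / f"{jday:03d}" / station


# "STA1" is a made-up station name.
print(day_directory("./DATA_ARCHIVE", date(2020, 2, 1), "STA1"))
```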
-
read_waveform_data(starttime, endtime, sampling_rate, pre_pad=0.0, post_pad=0.0)[source] Read in the waveform data for all stations in the archive between two times and return station availability of the stations specified in the station file during this period. Decimate / resample (optional) this data if required to reach desired sampling rate.
Output both processed data for stations in station file and all raw data in an obspy Stream object.
By default, data with mismatched sampling rates will only be decimated. If the user specifies resample = True and an integer upfactor (upfactor = int), data can also be upsampled and then, if necessary, decimated to achieve the desired sampling rate.
For example, for raw input data sampled at a mix of 40, 50 and 100 Hz, to achieve a unified sampling rate of 50 Hz, the user would have to specify an upfactor of 5; 40 Hz x 5 = 200 Hz, which can then be decimated to 50 Hz.
NOTE: data will be detrended and a cosine taper applied before decimation, in order to avoid edge effects when applying the lowpass filter. Otherwise, data for migration will be added to data.signal with no processing applied.
Supports all formats currently supported by ObsPy, including: “MSEED” (default), “SAC”, “SEGY”, “GSE2” .
Parameters: - starttime (obspy.UTCDateTime object, optional) – Timestamp from which to read waveform data.
- endtime (obspy.UTCDateTime object, optional) – Timestamp up to which to read waveform data.
- sampling_rate (int) – Desired sampling rate for data to be added to signal. This will be achieved by resampling the raw waveform data. By default, only decimation will be applied, but data can also be upsampled if specified by the user when creating the Archive object.
- pre_pad (float, optional) – Additional pre pad of data to cut based on user-defined pre_cut parameter. Defaults to none: pre_pad calculated in QuakeScan will be used (included in starttime).
- post_pad (float, optional) – Additional post pad of data to cut based on user-defined post_cut parameter. Defaults to none: post_pad calculated in QuakeScan will be used (included in endtime).
Returns: data – Object containing the archived data that satisfies the query.
Return type: WaveformData object
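The upfactor arithmetic in the 40, 50 and 100 Hz example can be checked with a few lines of integer maths. The sketch below works out, for each raw sampling rate, whether plain decimation is enough or whether upsampling by upfactor is needed first. The function `resampling_plan` is a hypothetical helper assuming integer sampling rates; it does no signal processing and is not QuakeMigrate's implementation.

```python
def resampling_plan(raw_rate, target_rate, upfactor=None):
    """Return (upsample_factor, decimation_factor), or None if unreachable."""
    # Decimation alone works when the raw rate is a multiple of the target.
    if raw_rate % target_rate == 0:
        return 1, raw_rate // target_rate
    # Otherwise upsample first, then decimate the upsampled rate.
    if upfactor is not None and (raw_rate * upfactor) % target_rate == 0:
        return upfactor, (raw_rate * upfactor) // target_rate
    return None


for rate in (40, 50, 100):
    print(rate, resampling_plan(rate, 50, upfactor=5))
```

For the 40 Hz data this gives an upsample factor of 5 followed by decimation by 4 (40 Hz x 5 = 200 Hz, 200 Hz / 4 = 50 Hz), matching the example above. Without an upfactor the 40 Hz data cannot reach 50 Hz and the helper returns None.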
-
class quakemigrate.io.data.WaveformData(starttime, endtime, sampling_rate, stations=None, response_inv=None, read_all_stations=False, pre_pad=0.0, post_pad=0.0)[source]¶
Bases: object
The WaveformData class encapsulates the waveform data returned by an Archive query.
This includes the waveform data which has been pre-processed to a unified sampling rate, and checked for gaps, ready for use to calculate onset functions.
Parameters: - starttime (obspy.UTCDateTime object) – Timestamp of first sample of waveform data.
- endtime (obspy.UTCDateTime object) – Timestamp of last sample of waveform data.
- sampling_rate (int) – Desired sampling rate of signal data.
- stations (pandas.Series object, optional) – Series object containing station names.
- read_all_stations (bool, optional) – If True, raw_waveforms contain all stations in archive for that time period. Else, only selected stations will be included.
- response_inv (obspy.Inventory object, optional) – ObsPy response inventory for this waveform archive, containing response information for each channel of each station of each network.
- pre_pad (float, optional) – Additional pre pad of data cut based on user-defined pre_cut parameter.
- post_pad (float, optional) – Additional post pad of data cut based on user-defined post_cut parameter.
-
starttime¶ Timestamp of first sample of waveform data.
Type: obspy.UTCDateTime object
-
endtime¶ Timestamp of last sample of waveform data.
Type: obspy.UTCDateTime object
-
sampling_rate¶ Sampling rate of signal data.
Type: int
-
stations¶ Series object containing station names.
Type: pandas.Series object
-
read_all_stations¶ If True, raw_waveforms contain all stations in archive for that time period. Else, only selected stations will be included.
Type: bool
-
raw_waveforms¶ Raw seismic data found and read in from the archive within the specified time period. This may be for all stations in the archive, or only those specified by the user. See read_all_stations.
Type: obspy.Stream object
-
pre_pad¶ Additional pre pad of data cut based on user-defined pre_cut parameter.
Type: float
-
post_pad¶ Additional post pad of data cut based on user-defined post_cut parameter.
Type: float
-
signal¶ 3-component seismic data at the desired sampling rate; only for desired stations, which have continuous data on all 3 components throughout the desired time period and where (if necessary) the data could be successfully resampled to the desired sampling rate.
Type: numpy.ndarray, shape(3, nstations, nsamples)
-
availability¶ Array containing 0s (no data) or 1s (data), corresponding to whether data for each station met the requirements outlined in signal
Type: np.ndarray of ints, shape(nstations)
-
filtered_signal¶ Filtered data originally from signal.
Type: numpy.ndarray, shape(3, nstations, nsamples)
-
add_stream(stream, resample, upfactor)[source]¶ Function to add data supplied in the form of an obspy.Stream object.
-
get_wa_waveform(trace, **response_removal_params)[source]¶ Calculate the Wood-Anderson corrected waveform for an obspy.Trace object.
-
times()[source]¶ Utility function to generate the corresponding timestamps for the waveform and coalescence data.
Raises: NotImplementedError – If the user attempts to use the get_real_waveform() method.
-
add_stream(stream, resample, upfactor)[source] Add signal data supplied in an obspy.Stream object. Perform resampling if necessary (decimation and/or upsampling), and determine availability of selected stations.
Parameters: - stream (obspy.Stream object) – Contains list of obspy.Trace objects containing the waveform data to add.
- resample (bool, optional) – If true, perform resampling of data which cannot be decimated directly to the desired sampling rate.
- upfactor (int, optional) – Factor by which to upsample the data to enable it to be decimated to the desired sampling rate, e.g. 40Hz -> 50Hz requires upfactor = 5.
-
get_wa_waveform(tr, water_level, pre_filt, remove_full_response=False, velocity=False)[source] Calculate simulated Wood Anderson displacement waveform for a Trace.
Parameters: - tr (obspy.Trace object) – Trace containing the waveform to be corrected to a Wood-Anderson response
- water_level (float) – Water-level to be used in the instrument correction.
- pre_filt (tuple of floats, or None) – Filter corners describing filter to be applied to the trace before deconvolution. E.g. (0.05, 0.06, 30, 35) (in Hz)
- remove_full_response (bool, optional) – Remove all response stages, inc. FIR (st.remove_response()), not just poles-and-zero response stage. Default: False.
- velocity (bool, optional) – Output velocity waveform, instead of displacement. Default: False.
Returns: tr – Trace corrected to Wood-Anderson response.
Return type: obspy.Trace object
Raises: - AttributeError – If no response inventory has been supplied.
- ResponseNotFoundError – If the response information for a trace can’t be found in the supplied response inventory.
- ResponseRemovalError – If the deconvolution of the instrument response and simulation of the Wood-Anderson response is unsuccessful.
- NotImplementedError – If the user selects velocity=True.
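The four pre_filt corners, e.g. (0.05, 0.06, 30, 35) Hz, are passed through to ObsPy's response removal. There they describe a frequency-domain cosine taper that is one between the second and third corners and falls to zero outside the first and fourth. The sketch below evaluates a taper of that shape at a single frequency. It is a standalone illustration under that assumption, with the hypothetical name `cosine_taper_gain`; ObsPy's own implementation should be consulted for exact edge behaviour.

```python
import math


def cosine_taper_gain(freq, pre_filt):
    """Gain in [0, 1] of a four-corner cosine taper at frequency freq (Hz)."""
    f1, f2, f3, f4 = pre_filt
    if freq <= f1 or freq >= f4:
        return 0.0  # fully rejected outside the outer corners
    if freq < f2:
        # rising half-cosine between f1 and f2
        return 0.5 * (1.0 - math.cos(math.pi * (freq - f1) / (f2 - f1)))
    if freq <= f3:
        return 1.0  # flat passband between the inner corners
    # falling half-cosine between f3 and f4
    return 0.5 * (1.0 + math.cos(math.pi * (freq - f3) / (f4 - f3)))


pre_filt = (0.05, 0.06, 30, 35)
for freq in (0.01, 1.0, 32.5, 40.0):
    print(freq, cosine_taper_gain(freq, pre_filt))
```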
-
sample_size¶ Get the size of a sample (units: s).
-
times(**kwargs)[source] Utility function to generate timestamps between data.starttime and data.endtime, with a sample size of data.sample_size
Returns: times – Timestamps for the timeseries data.
Return type: numpy.ndarray, shape(nsamples)
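The relationship between starttime, endtime, the sampling rate and the generated timestamps can be sketched as follows. This assumes that sample_size is the reciprocal of sampling_rate and that endtime is the last sample (inclusive), as the attribute descriptions suggest. The helper `sample_times` is hypothetical and uses datetime objects in place of obspy.UTCDateTime.

```python
from datetime import datetime, timedelta


def sample_times(starttime, endtime, sampling_rate):
    """Return evenly spaced timestamps from starttime to endtime inclusive."""
    sample_size = 1.0 / sampling_rate  # assumed: seconds per sample
    duration = (endtime - starttime).total_seconds()
    nsamples = round(duration * sampling_rate) + 1  # +1 to include endtime
    return [starttime + timedelta(seconds=i * sample_size) for i in range(nsamples)]


start = datetime(2020, 1, 1)
times = sample_times(start, start + timedelta(seconds=1), sampling_rate=50)
print(len(times), times[0], times[1], times[-1])
```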
3.3.6. quakemigrate.io.scanmseed¶
Module to handle input/output of .scanmseed files.
-
class quakemigrate.io.scanmseed.ScanmSEED(run, continuous_write, sampling_rate)[source]¶
Bases: object
Light class to encapsulate the data output by the detect stage of QuakeMigrate. This data is stored in an obspy.Stream object with the channels: [“COA”, “COA_N”, “X”, “Y”, “Z”].
Parameters: - run (Run object) – Light class encapsulating i/o path information for a given run.
- continuous_write (bool) – Option to continuously write the .scanmseed file output by detect() at the end of every time step. Default behaviour is to write in day chunks where possible.
- sampling_rate (int) – Desired sampling rate of input data; sampling rate at which to compute the coalescence function. Default: 50 Hz.
-
stream¶ Output of detect() stored in obspy.Stream object. The values have been multiplied by a factor to make use of more efficient compression. Channels: [“COA”, “COA_N”, “X”, “Y”, “Z”]
Type: obspy.Stream object
-
written¶ Tracker for whether the data appended has been written recently.
Type: bool
-
append(times, max_coa, max_coa_n, coord, map4d=None)[source]¶ Append the output of QuakeScan._compute() to the coalescence stream.
-
empty(starttime, timestep, i, msg)[source]¶ Create a set of empty arrays for a given timestep and append to the coalescence stream.
-
append(starttime, max_coa, max_coa_n, coord, ucf)[source] Append latest timestep of detect() output to obspy.Stream object. Multiply channels [“COA”, “COA_N”, “X”, “Y”, “Z”] by factors of [“1e5”, “1e5”, “1e6”, “1e6”, “1e3”] respectively, round and convert to int32 as this dramatically reduces memory usage, and allows the coastream data to be saved in mSEED format with STEIM2 compression. The multiplication factor is removed when the data is read back in.
Parameters: - starttime (obspy.UTCDateTime object) – Timestamp of first sample of coalescence data.
- max_coa (numpy.ndarray of floats, shape(nsamples)) – Coalescence value through time.
- max_coa_n (numpy.ndarray of floats, shape(nsamples)) – Normalised coalescence value through time.
- coord (numpy.ndarray of floats, shape(nsamples)) – Location of maximum coalescence through time in input projection space.
- ucf (float) – A conversion factor based on the lookup table grid projection. Used to ensure the same level of precision (millimetre) is retained during compression, irrespective of the units of the grid projection.
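The multiply, round and convert-to-int32 step described for append can be illustrated with a small round trip. The sketch below uses the per-channel factors quoted above and plain Python integers with an explicit int32 range check. It deliberately leaves out the ucf conversion factor, whose exact role is not spelled out here, and the names `encode` and `decode` are hypothetical rather than QuakeMigrate functions.

```python
# Scale factors quoted in the append() description, keyed by channel.
FACTORS = {"COA": 1e5, "COA_N": 1e5, "X": 1e6, "Y": 1e6, "Z": 1e3}
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def encode(channel, values):
    """Scale and round values so they can be stored as 32-bit integers."""
    scaled = [round(value * FACTORS[channel]) for value in values]
    if any(not INT32_MIN <= item <= INT32_MAX for item in scaled):
        raise OverflowError(f"{channel} value does not fit in int32 after scaling")
    return scaled


def decode(channel, ints):
    """Undo the scaling applied by encode()."""
    return [item / FACTORS[channel] for item in ints]


stored = encode("X", [-17.123456789])
print(stored, decode("X", stored))
```

The round trip keeps six decimal places for X and Y, five for COA and COA_N, and three for Z, which is the precision traded for integer storage and STEIM2 compression.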
-
empty(starttime, timestep, i, msg, ucf)[source] Create an empty set of arrays to write to .scanmseed; used where there is no data available to run _compute().
Parameters: - starttime (obspy.UTCDateTime object) – Timestamp of first sample in the given timestep.
- timestep (float) – Length (in seconds) of timestep used in detect().
- i (int) – The ith timestep of the continuous compute.
- msg (str) – Message to output to log giving details as to why this timestep is empty.
- ucf (float) – A conversion factor based on the lookup table grid projection. Used to ensure the same level of precision (millimetre) is retained during compression, irrespective of the units of the grid projection.
-
write(write_start=None, write_end=None)[source] Write a new .scanmseed file from an obspy.Stream object containing the data output from detect(). Note: values have been multiplied by a power of ten, rounded and converted to an int32 array so the data can be saved as mSEED with STEIM2 compression. This multiplication factor is removed when the data is read back in with read_scanmseed().
Parameters: - write_start (obspy.UTCDateTime object, optional) – Timestamp from which to write the coalescence stream to file.
- write_end (obspy.UTCDateTime object, optional) – Timestamp up to which to write the coalescence stream to file.
-
quakemigrate.io.scanmseed.read_scanmseed(run, starttime, endtime, pad, ucf)[source]¶ Read .scanmseed files between two time stamps. Files are labelled by year and Julian day.
Parameters: - run (Run object) – Light class encapsulating i/o path information for a given run.
- starttime (obspy.UTCDateTime object) – Timestamp from which to read the coalescence stream.
- endtime (obspy.UTCDateTime object) – Timestamp up to which to read the coalescence stream.
- pad (float) – Read in “pad” seconds of additional data on either end.
- ucf (float) – A conversion factor based on the lookup table grid projection. Used to ensure the same level of precision (millimetre) is retained during compression, irrespective of the units of the grid projection.
Returns: - data (pandas.DataFrame object) – Data output by detect() – decimated scan. Columns: [“DT”, “COA”, “COA_N”, “X”, “Y”, “Z”] - X/Y/Z as lon/lat/m
- stats (obspy.trace.Stats object) – Container for additional header information for coalescence trace. Contains keys: network, station, channel, starttime, endtime, sampling_rate, delta, npts, calib, _format, mseed
3.3.7. quakemigrate.io.triggered_events¶
Module to handle input/output of TriggeredEvents.csv files.
-
quakemigrate.io.triggered_events.read_triggered_events(run, **kwargs)[source]¶ Read triggered events from .csv file.
Parameters: - run (Run object) – Light class encapsulating i/o path information for a given run.
- starttime (obspy.UTCDateTime object, optional) – Timestamp from which to include events in the locate scan.
- endtime (obspy.UTCDateTime object, optional) – Timestamp up to which to include events in the locate scan.
- trigger_file (str, optional) – File containing triggered events to be located.
Returns: events – Triggered events information. Columns: [“EventID”, “CoaTime”, “TRIG_COA”, “COA_X”, “COA_Y”, “COA_Z”, “COA”, “COA_NORM”].
Return type: pandas.DataFrame object
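The optional starttime and endtime arguments restrict which triggered events are passed on to locate. The sketch below shows that kind of time-window selection on a CSV with the documented columns, using only the standard library. The rows, the ISO timestamp format for CoaTime, the inclusive bounds and the helper `events_between` are all assumptions made for illustration; the real function returns a pandas.DataFrame.

```python
import csv
import io
from datetime import datetime

COLUMNS = ["EventID", "CoaTime", "TRIG_COA", "COA_X", "COA_Y", "COA_Z",
           "COA", "COA_NORM"]

# Made-up rows for illustration only; real files come from the trigger stage.
EXAMPLE_CSV = "\n".join([
    ",".join(COLUMNS),
    "event_a,2020-01-01T00:10:00,5.2,-17.10,64.50,3000.0,5.2,1.8",
    "event_b,2020-01-01T06:30:00,6.1,-17.20,64.60,4500.0,6.1,2.0",
    "event_c,2020-01-02T01:00:00,4.9,-17.15,64.55,2500.0,4.9,1.7",
])


def events_between(text, starttime, endtime):
    """Return the rows whose CoaTime lies within [starttime, endtime]."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        row for row in rows
        if starttime <= datetime.fromisoformat(row["CoaTime"]) <= endtime
    ]


selected = events_between(
    EXAMPLE_CSV, datetime(2020, 1, 1, 6), datetime(2020, 1, 2, 12)
)
print([row["EventID"] for row in selected])
```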
-
quakemigrate.io.triggered_events.write_triggered_events(run, events, starttime)[source]¶ Write triggered events to a .csv file.
Parameters: - run (Run object) – Light class encapsulating i/o path information for a given run.
- events (pandas.DataFrame object) – Triggered events information. Columns: [“EventID”, “CoaTime”, “TRIG_COA”, “COA_X”, “COA_Y”, “COA_Z”, “COA”, “COA_NORM”].
- starttime (obspy.UTCDateTime object) – Timestamp from which events have been triggered.