zhuwq0 committed on
Commit
0eb79a8
0 Parent(s):
Dockerfile ADDED
@@ -0,0 +1,25 @@
1
+ FROM tensorflow/tensorflow
2
+
3
+ # Create the environment:
4
+ # COPY env.yml /app
5
+ # RUN conda env create --name cs329s --file=env.yml
6
+ # Make RUN commands use the new environment:
7
+ # SHELL ["conda", "run", "-n", "cs329s", "/bin/bash", "-c"]
8
+
9
+ RUN pip install tqdm obspy pandas
10
+ RUN pip install uvicorn fastapi
11
+
12
+ WORKDIR /opt
13
+
14
+ # Copy files
15
+ COPY phasenet /opt/phasenet
16
+ COPY model /opt/model
17
+
18
+ # Expose API port
19
+ EXPOSE 8000
20
+
21
+ ENV PYTHONUNBUFFERED=1
22
+
23
+ # Start API server
24
+ #ENTRYPOINT ["conda", "run", "--no-capture-output", "-n", "cs329s", "uvicorn", "--app-dir", "phasenet", "app:app", "--reload", "--port", "8000", "--host", "0.0.0.0"]
25
+ ENTRYPOINT ["uvicorn", "--app-dir", "phasenet", "app:app", "--reload", "--port", "7860", "--host", "0.0.0.0"]
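A minimal sketch (editor's illustration, not part of the commit) of calling the API this image serves. It assumes the container is run with the port published (e.g. `docker run -p 7860:7860 ...`) and follows the `Data` schema defined in `phasenet/app.py` below (`id`, `timestamp`, `vec`, `dt`); the station id and random waveform are placeholders.

```python
# Sketch: POST a waveform window to the PhaseNet FastAPI /predict endpoint.
# Assumes the container from this Dockerfile is reachable on localhost:7860.
import numpy as np
import requests

vec = np.random.randn(3000, 3).tolist()  # placeholder: 3000 samples x 3 components

payload = {
    "id": ["CI.BOM..HH"],                      # one entry per station
    "timestamp": ["2020-10-01T00:00:00.000"],  # window start time
    "vec": [vec],                              # shape: (n_stations, 3000, 3)
    "dt": 0.01,                                # 100 Hz sampling
}

resp = requests.post("http://localhost:7860/predict", json=payload)
print(resp.json())  # list of picks: station_id, phase_time, phase_score, phase_type
```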
LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2021 Weiqiang Zhu
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
docs/README.md ADDED
@@ -0,0 +1,144 @@
1
+ # PhaseNet: A Deep-Neural-Network-Based Seismic Arrival Time Picking Method
2
+
3
+ [![](https://github.com/AI4EPS/PhaseNet/workflows/documentation/badge.svg)](https://ai4eps.github.io/PhaseNet)
4
+
5
+ ## 1. Install [miniconda](https://docs.conda.io/en/latest/miniconda.html) and requirements
6
+ - Download PhaseNet repository
7
+ ```bash
8
+ git clone https://github.com/wayneweiqiang/PhaseNet.git
9
+ cd PhaseNet
10
+ ```
11
+ - Install to default environment
12
+ ```bash
13
+ conda env update -f=env.yml -n base
14
+ ```
15
+ - Install to "phasenet" virtual environment
16
+ ```bash
17
+ conda env create -f env.yml
18
+ conda activate phasenet
19
+ ```
20
+
21
+ ## 2. Pre-trained model
22
+ Located in directory: **model/190703-214543**
23
+
24
+ ## 3. Related papers
25
+ - Zhu, Weiqiang, and Gregory C. Beroza. "PhaseNet: A Deep-Neural-Network-Based Seismic Arrival Time Picking Method." arXiv preprint arXiv:1803.03211 (2018).
26
+ - Liu, Min, et al. "Rapid characterization of the July 2019 Ridgecrest, California, earthquake sequence from raw seismic data using machine‐learning phase picker." Geophysical Research Letters 47.4 (2020): e2019GL086189.
27
+ - Park, Yongsoo, et al. "Machine‐learning‐based analysis of the Guy‐Greenbrier, Arkansas earthquakes: A tale of two sequences." Geophysical Research Letters 47.6 (2020): e2020GL087032.
28
+ - Chai, Chengping, et al. "Using a deep neural network and transfer learning to bridge scales for seismic phase picking." Geophysical Research Letters 47.16 (2020): e2020GL088651.
29
+ - Tan, Yen Joe, et al. "Machine‐Learning‐Based High‐Resolution Earthquake Catalog Reveals How Complex Fault Structures Were Activated during the 2016–2017 Central Italy Sequence." The Seismic Record 1.1 (2021): 11-19.
30
+
31
+ ## 4. Batch prediction
32
+ See examples in the [notebook](https://github.com/wayneweiqiang/PhaseNet/blob/master/docs/example_batch_prediction.ipynb): [example_batch_prediction.ipynb](example_batch_prediction.ipynb)
33
+
34
+
35
+ PhaseNet currently supports four data formats: mseed, sac, hdf5, and numpy. The test data can be downloaded here:
36
+ ```
37
+ wget https://github.com/wayneweiqiang/PhaseNet/releases/download/test_data/test_data.zip
38
+ unzip test_data.zip
39
+ ```
40
+
41
+ - For mseed format:
42
+ ```
43
+ python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/mseed.csv --data_dir=test_data/mseed --format=mseed --amplitude --response_xml=test_data/stations.xml --batch_size=1 --sampling_rate=100 --plot_figure
44
+ ```
45
+ ```
46
+ python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/mseed2.csv --data_dir=test_data/mseed --format=mseed --amplitude --response_xml=test_data/stations.xml --batch_size=1 --sampling_rate=100 --plot_figure
47
+ ```
48
+
49
+ - For sac format:
50
+ ```
51
+ python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/sac.csv --data_dir=test_data/sac --format=sac --batch_size=1 --plot_figure
52
+ ```
53
+
54
+ - For numpy format (see the input-preparation sketch after this list):
55
+ ```
56
+ python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/npz.csv --data_dir=test_data/npz --format=numpy --plot_figure
57
+ ```
58
+
59
+ - For hdf5 format:
60
+ ```
61
+ python phasenet/predict.py --model=model/190703-214543 --hdf5_file=test_data/data.h5 --hdf5_group=data --format=hdf5 --plot_figure
62
+ ```
63
+
64
+ - For a seismic array (used by [QuakeFlow](https://github.com/wayneweiqiang/QuakeFlow)):
65
+ ```
66
+ python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/mseed_array.csv --data_dir=test_data/mseed_array --stations=test_data/stations.json --format=mseed_array --amplitude
67
+ ```
68
+
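The numpy reader in `phasenet/data_reader.py` expects each `.npz` file to contain a `data` array of shape (nt, 3) (or (nt, 1, 3)), optionally with `t0` and `station_id`, and the csv passed to `--data_list` to have an `fname` column. A minimal sketch of preparing such inputs (the file names here are hypothetical):

```python
# Sketch: build one numpy-format input file and its data list for --format=numpy.
import numpy as np
import pandas as pd

nt = 3000  # 30 s at the default 100 Hz sampling rate
waveform = np.random.randn(nt, 3).astype("float32")  # placeholder 3-component data

np.savez(
    "test_data/npz/XX.STA..BH.000001.npz",
    data=waveform,
    t0="2020-10-01T00:00:00.000",  # optional window start time
    station_id="XX.STA..BH",       # optional station id
)

pd.DataFrame({"fname": ["XX.STA..BH.000001.npz"]}).to_csv("test_data/npz.csv", index=False)
```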
69
+ Notes:
70
+
71
+ 1. "--batch_size=1" is used because mseed or sac files usually are not the same length. If you want a larger batch size for faster prediction, cut the data to the same length first (see the sketch after these notes).
72
+
73
+ 2. Remove the "--plot_figure" argument for large datasets, because plotting can be very slow.
74
+
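A minimal sketch (editor's illustration) of cutting miniSEED files to a common window with ObsPy so that a larger `--batch_size` can be used, as mentioned in note 1; the time window and glob pattern are placeholders:

```python
# Sketch: trim all miniSEED files to the same one-hour window before prediction.
import glob
import obspy

starttime = obspy.UTCDateTime("2020-10-01T00:00:00")
endtime = starttime + 3600  # one hour

for fname in glob.glob("test_data/mseed/*.mseed"):  # adjust to your file names
    st = obspy.read(fname)
    st = st.trim(starttime, endtime, pad=True, fill_value=0)
    st.write(fname, format="MSEED")
```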
75
+ Optional arguments:
76
+ ```
77
+ usage: predict.py [-h] [--batch_size BATCH_SIZE] [--model_dir MODEL_DIR]
78
+ [--data_dir DATA_DIR] [--data_list DATA_LIST]
79
+ [--hdf5_file HDF5_FILE] [--hdf5_group HDF5_GROUP]
80
+ [--result_dir RESULT_DIR] [--result_fname RESULT_FNAME]
81
+ [--min_p_prob MIN_P_PROB] [--min_s_prob MIN_S_PROB]
82
+ [--mpd MPD] [--amplitude] [--format FORMAT]
83
+ [--s3_url S3_URL] [--stations STATIONS] [--plot_figure]
84
+ [--save_prob]
85
+
86
+ optional arguments:
87
+ -h, --help show this help message and exit
88
+ --batch_size BATCH_SIZE
89
+ batch size
90
+ --model_dir MODEL_DIR
91
+ Checkpoint directory (default: None)
92
+ --data_dir DATA_DIR Input file directory
93
+ --data_list DATA_LIST
94
+ Input csv file
95
+ --hdf5_file HDF5_FILE
96
+ Input hdf5 file
97
+ --hdf5_group HDF5_GROUP
98
+ data group name in hdf5 file
99
+ --result_dir RESULT_DIR
100
+ Output directory
101
+ --result_fname RESULT_FNAME
102
+ Output file
103
+ --min_p_prob MIN_P_PROB
104
+ Probability threshold for P pick
105
+ --min_s_prob MIN_S_PROB
106
+ Probability threshold for S pick
107
+ --mpd MPD Minimum peak distance
108
+ --amplitude if return amplitude value
109
+ --format FORMAT input format
110
+ --stations STATIONS seismic station info
111
+ --plot_figure If plot figure for test
112
+ --save_prob If save result for test
113
+ ```
114
+
115
+ - The output picks are saved to "results/picks.csv" by default
116
+
117
+ |file_name |begin_time |station_id|phase_index|phase_time |phase_score|phase_amp |phase_type|
118
+ |-----------------|-----------------------|----------|-----------|-----------------------|-----------|----------------------|----------|
119
+ |2020-10-01T00:00*|2020-10-01T00:00:00.003|CI.BOM..HH|14734 |2020-10-01T00:02:27.343|0.708 |2.4998866231208325e-14|P |
120
+ |2020-10-01T00:00*|2020-10-01T00:00:00.003|CI.BOM..HH|15487 |2020-10-01T00:02:34.873|0.416 |2.4998866231208325e-14|S |
121
+ |2020-10-01T00:00*|2020-10-01T00:00:00.003|CI.COA..HH|319 |2020-10-01T00:00:03.193|0.762 |3.708662269972206e-14 |P |
122
+
123
+ Notes:
124
+ 1. The *phase_index* gives the position of the pick (in samples) within the original sequence, so *phase_time* = *begin_time* + *phase_index* / *sampling_rate*. The default *sampling_rate* is 100 Hz (see the sketch below).
125
+
126
+
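A minimal sketch of loading the picks and recomputing *phase_time* from *begin_time* and *phase_index*, assuming the default output path and 100 Hz data:

```python
# Sketch: verify phase_time = begin_time + phase_index / sampling_rate.
import pandas as pd

picks = pd.read_csv("results/picks.csv")  # pass sep="\t" if your version writes tab-separated output
sampling_rate = 100.0

row = picks.iloc[0]
phase_time = pd.Timestamp(row["begin_time"]) + pd.Timedelta(seconds=row["phase_index"] / sampling_rate)
print(row["station_id"], row["phase_type"], phase_time, row["phase_score"])
```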
127
+ ## 5. QuakeFlow example
128
+ A complete earthquake detection workflow can be found in the [QuakeFlow](https://wayneweiqiang.github.io/QuakeFlow/) project.
129
+
130
+ ## 6. Interactive example
131
+ See details in the [notebook](https://github.com/wayneweiqiang/PhaseNet/blob/master/docs/example_gradio.ipynb): [example_interactive.ipynb](example_gradio.ipynb)
132
+
133
+ ## 7. Training
134
+ - Download a small sample dataset:
135
+ ```bash
136
+ wget https://github.com/wayneweiqiang/PhaseNet/releases/download/test_data/test_data.zip
137
+ unzip test_data.zip
138
+ ```
139
+ - Start training from the pre-trained model
140
+ ```
141
+ python phasenet/train.py --model_dir=model/190703-214543/ --train_dir=test_data/npz --train_list=test_data/npz.csv --plot_figure --epochs=10 --batch_size=10
142
+ ```
143
+ - Check results in the **log** folder
144
+
docs/data.mseed ADDED
Binary file (73.7 kB).
 
docs/example_batch_prediction.ipynb ADDED
@@ -0,0 +1,211 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# Batch Prediction\n",
8
+ "\n",
9
+ "## 1. Download demo data\n",
10
+ "\n",
11
+ "```\n",
12
+ "cd PhaseNet\n",
13
+ "wget https://github.com/wayneweiqiang/PhaseNet/releases/download/test_data/test_data.zip\n",
14
+ "unzip test_data.zip\n",
15
+ "```\n",
16
+ "\n",
17
+ "## 2. Run batch prediction \n",
18
+ "\n",
19
+ "PhaseNet currently supports four data formats: mseed, sac, hdf5, and numpy. \n",
20
+ "\n",
21
+ "- For mseed format:\n",
22
+ "```\n",
23
+ "python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/mseed.csv --data_dir=test_data/mseed --format=mseed --plot_figure\n",
24
+ "```\n",
25
+ "\n",
26
+ "- For sac format:\n",
27
+ "```\n",
28
+ "python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/sac.csv --data_dir=test_data/sac --format=sac --plot_figure\n",
29
+ "```\n",
30
+ "\n",
31
+ "- For numpy format:\n",
32
+ "```\n",
33
+ "python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/npz.csv --data_dir=test_data/npz --format=numpy --plot_figure\n",
34
+ "```\n",
35
+ "\n",
36
+ "- For hdf5 format:\n",
37
+ "```\n",
38
+ "python phasenet/predict.py --model=model/190703-214543 --hdf5_file=test_data/data.h5 --hdf5_group=data --format=hdf5 --plot_figure\n",
39
+ "```\n",
40
+ "\n",
41
+ "- For a seismic array (used by [QuakeFlow](https://github.com/wayneweiqiang/QuakeFlow)):\n",
42
+ "```\n",
43
+ "python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/mseed_array.csv --data_dir=test_data/mseed_array --stations=test_data/stations.json --format=mseed_array --amplitude\n",
44
+ "```\n",
45
+ "```\n",
46
+ "python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/mseed2.csv --data_dir=test_data/mseed --stations=test_data/stations.json --format=mseed_array --amplitude\n",
47
+ "```\n",
48
+ "\n",
49
+ "Notes: \n",
50
+ "1. Remove the \"--plot_figure\" argument for large datasets, because plotting can be very slow.\n",
51
+ "\n",
52
+ "Optional arguments:\n",
53
+ "```\n",
54
+ "usage: predict.py [-h] [--batch_size BATCH_SIZE] [--model_dir MODEL_DIR]\n",
55
+ " [--data_dir DATA_DIR] [--data_list DATA_LIST]\n",
56
+ " [--hdf5_file HDF5_FILE] [--hdf5_group HDF5_GROUP]\n",
57
+ " [--result_dir RESULT_DIR] [--result_fname RESULT_FNAME]\n",
58
+ " [--min_p_prob MIN_P_PROB] [--min_s_prob MIN_S_PROB]\n",
59
+ " [--mpd MPD] [--amplitude] [--format FORMAT]\n",
60
+ " [--s3_url S3_URL] [--stations STATIONS] [--plot_figure]\n",
61
+ " [--save_prob]\n",
62
+ "\n",
63
+ "optional arguments:\n",
64
+ " -h, --help show this help message and exit\n",
65
+ " --batch_size BATCH_SIZE\n",
66
+ " batch size\n",
67
+ " --model_dir MODEL_DIR\n",
68
+ " Checkpoint directory (default: None)\n",
69
+ " --data_dir DATA_DIR Input file directory\n",
70
+ " --data_list DATA_LIST\n",
71
+ " Input csv file\n",
72
+ " --hdf5_file HDF5_FILE\n",
73
+ " Input hdf5 file\n",
74
+ " --hdf5_group HDF5_GROUP\n",
75
+ " data group name in hdf5 file\n",
76
+ " --result_dir RESULT_DIR\n",
77
+ " Output directory\n",
78
+ " --result_fname RESULT_FNAME\n",
79
+ " Output file\n",
80
+ " --min_p_prob MIN_P_PROB\n",
81
+ " Probability threshold for P pick\n",
82
+ " --min_s_prob MIN_S_PROB\n",
83
+ " Probability threshold for S pick\n",
84
+ " --mpd MPD Minimum peak distance\n",
85
+ " --amplitude if return amplitude value\n",
86
+ " --format FORMAT input format\n",
87
+ " --stations STATIONS seismic station info\n",
88
+ " --plot_figure If plot figure for test\n",
89
+ " --save_prob If save result for test\n",
90
+ "```\n",
91
+ "\n",
92
+ "## 3. Output picks\n",
93
+ "- The output picks are saved to \"results/picks.csv\" on default\n",
94
+ "\n",
95
+ "|file_name |begin_time |station_id|phase_index|phase_time |phase_score|phase_amp |phase_type|\n",
96
+ "|-----------------|-----------------------|----------|-----------|-----------------------|-----------|----------------------|----------|\n",
97
+ "|2020-10-01T00:00*|2020-10-01T00:00:00.003|CI.BOM..HH|14734 |2020-10-01T00:02:27.343|0.708 |2.4998866231208325e-14|P |\n",
98
+ "|2020-10-01T00:00*|2020-10-01T00:00:00.003|CI.BOM..HH|15487 |2020-10-01T00:02:34.873|0.416 |2.4998866231208325e-14|S |\n",
99
+ "|2020-10-01T00:00*|2020-10-01T00:00:00.003|CI.COA..HH|319 |2020-10-01T00:00:03.193|0.762 |3.708662269972206e-14 |P |\n",
100
+ "\n",
101
+ "Notes:\n",
102
+ "1. The *phase_index* means which data point is the pick in the original sequence. So *phase_time* = *begin_time* + *phase_index* / *sampling rate*. The default *sampling_rate* is 100Hz \n"
103
+ ]
104
+ },
105
+ {
106
+ "cell_type": "markdown",
107
+ "metadata": {},
108
+ "source": [
109
+ "## 3. Read P/S picks\n",
110
+ "\n",
111
+ "PhaseNet currently outputs two format: **CSV** and **JSON**"
112
+ ]
113
+ },
114
+ {
115
+ "cell_type": "code",
116
+ "execution_count": 1,
117
+ "metadata": {},
118
+ "outputs": [],
119
+ "source": [
120
+ "import pandas as pd\n",
121
+ "import json\n",
122
+ "import os\n",
123
+ "PROJECT_ROOT = os.path.realpath(os.path.join(os.path.abspath(''), \"..\"))"
124
+ ]
125
+ },
126
+ {
127
+ "cell_type": "code",
128
+ "execution_count": 2,
129
+ "metadata": {},
130
+ "outputs": [
131
+ {
132
+ "name": "stdout",
133
+ "output_type": "stream",
134
+ "text": [
135
+ "fname NC.MCV..EH.0361339.npz\n",
136
+ "t0 1970-01-01T00:00:00.000\n",
137
+ "p_idx [5999, 9015]\n",
138
+ "p_prob [0.987, 0.981]\n",
139
+ "s_idx [6181, 9205]\n",
140
+ "s_prob [0.553, 0.873]\n",
141
+ "Name: 1, dtype: object\n",
142
+ "fname NN.LHV..EH.0384064.npz\n",
143
+ "t0 1970-01-01T00:00:00.000\n",
144
+ "p_idx []\n",
145
+ "p_prob []\n",
146
+ "s_idx []\n",
147
+ "s_prob []\n",
148
+ "Name: 0, dtype: object\n"
149
+ ]
150
+ }
151
+ ],
152
+ "source": [
153
+ "picks_csv = pd.read_csv(os.path.join(PROJECT_ROOT, \"results/picks.csv\"), sep=\"\\t\")\n",
154
+ "picks_csv.loc[:, 'p_idx'] = picks_csv[\"p_idx\"].apply(lambda x: x.strip(\"[]\").split(\",\"))\n",
155
+ "picks_csv.loc[:, 'p_prob'] = picks_csv[\"p_prob\"].apply(lambda x: x.strip(\"[]\").split(\",\"))\n",
156
+ "picks_csv.loc[:, 's_idx'] = picks_csv[\"s_idx\"].apply(lambda x: x.strip(\"[]\").split(\",\"))\n",
157
+ "picks_csv.loc[:, 's_prob'] = picks_csv[\"s_prob\"].apply(lambda x: x.strip(\"[]\").split(\",\"))\n",
158
+ "print(picks_csv.iloc[1])\n",
159
+ "print(picks_csv.iloc[0])"
160
+ ]
161
+ },
162
+ {
163
+ "cell_type": "code",
164
+ "execution_count": 3,
165
+ "metadata": {},
166
+ "outputs": [
167
+ {
168
+ "name": "stdout",
169
+ "output_type": "stream",
170
+ "text": [
171
+ "{'id': 'NC.MCV..EH.0361339.npz', 'timestamp': '1970-01-01T00:01:30.150', 'prob': 0.9811667799949646, 'type': 'p'}\n",
172
+ "{'id': 'NC.MCV..EH.0361339.npz', 'timestamp': '1970-01-01T00:00:59.990', 'prob': 0.9872905611991882, 'type': 'p'}\n"
173
+ ]
174
+ }
175
+ ],
176
+ "source": [
177
+ "with open(os.path.join(PROJECT_ROOT, \"results/picks.json\")) as fp:\n",
178
+ " picks_json = json.load(fp) \n",
179
+ "print(picks_json[1])\n",
180
+ "print(picks_json[0])"
181
+ ]
182
+ }
183
+ ],
184
+ "metadata": {
185
+ "kernelspec": {
186
+ "display_name": "Python 3.10.4 64-bit",
187
+ "language": "python",
188
+ "name": "python3"
189
+ },
190
+ "language_info": {
191
+ "codemirror_mode": {
192
+ "name": "ipython",
193
+ "version": 3
194
+ },
195
+ "file_extension": ".py",
196
+ "mimetype": "text/x-python",
197
+ "name": "python",
198
+ "nbconvert_exporter": "python",
199
+ "pygments_lexer": "ipython3",
200
+ "version": "3.10.4"
201
+ },
202
+ "orig_nbformat": 4,
203
+ "vscode": {
204
+ "interpreter": {
205
+ "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
206
+ }
207
+ }
208
+ },
209
+ "nbformat": 4,
210
+ "nbformat_minor": 2
211
+ }
docs/example_fastapi.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
docs/example_gradio.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
docs/test_api.py ADDED
@@ -0,0 +1,37 @@
1
+ # %%
2
+ from gradio_client import Client
3
+ import obspy
4
+ import numpy as np
5
+ import json
6
+ import pandas as pd
7
+
8
+ # %%
9
+
10
+ waveform = obspy.read()
11
+ array = np.array([x.data for x in waveform]).T
12
+
13
+ # pipeline = PreTrainedPipeline()
14
+ inputs = array.tolist()
15
+ inputs = json.dumps(inputs)
16
+ # picks = pipeline(inputs)
17
+ # print(picks)
18
+
19
+ # %%
20
+ client = Client("ai4eps/phasenet")
21
+ output, file = client.predict(["test_test.mseed"])
22
+ # %%
23
+ with open(output, "r") as f:
24
+ picks = json.load(f)["data"]
25
+
26
+ # %%
27
+ picks = pd.read_csv(file)
28
+
29
+
30
+ # %%
31
+ job = client.submit(["test_test.mseed", "test_test.mseed"], api_name="/predict") # This is not blocking
32
+
33
+ print(job.status())
34
+
35
+ # %%
36
+ output, file = job.result()
37
+
env.yml ADDED
@@ -0,0 +1,17 @@
1
+ name: phasenet
2
+ channels:
3
+ - defaults
4
+ - conda-forge
5
+ dependencies:
6
+ - python
7
+ - numpy
8
+ - scipy
9
+ - matplotlib
10
+ - pandas
11
+ - scikit-learn
12
+ - tqdm
13
+ - obspy
14
+ - uvicorn
15
+ - fastapi
16
+ - tensorflow
17
+ - keras
mkdocs.yml ADDED
@@ -0,0 +1,18 @@
1
+ site_name: "PhaseNet"
2
+ site_description: 'PhaseNet: a deep-neural-network-based seismic arrival-time picking method'
3
+ site_author: 'Weiqiang Zhu'
4
+ docs_dir: docs/
5
+ repo_name: 'AI4EPS/PhaseNet'
6
+ repo_url: 'https://github.com/ai4eps/PhaseNet'
7
+ nav:
8
+ - Overview: README.md
9
+ - Interactive Example: example_gradio.ipynb
10
+ - Batch Prediction: example_batch_prediction.ipynb
11
+ theme:
12
+ name: 'material'
13
+ plugins:
14
+ - mkdocs-jupyter
15
+ extra:
16
+ analytics:
17
+ provider: google
18
+ property: G-RZQ9LRPL0S
model/190703-214543/checkpoint ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1606ccb25e1533fa0398c5dbce7f3a45ac77f90b78b99f81a044294ba38a2c0c
3
+ size 83
model/190703-214543/config.log ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ed9dfa705053a5025facc9952c7da6abef19ec5f672d9e50386bf3f2d80294f2
3
+ size 345
model/190703-214543/loss.log ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ccb6f19117497571e19bec5da6012ac7af91f1bd29e931ffd0b23c6b657bb401
3
+ size 8101
model/190703-214543/model_95.ckpt.data-00000-of-00001 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9ee2c15dd78fb15de45a55ad64a446f1a0ced152ba4ac5c506d82b9194da85b4
3
+ size 3226256
model/190703-214543/model_95.ckpt.index ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f96b553b76be4ebae9a455eaf8d83cfa8c0e110f06cfba958de2568e5b6b2780
3
+ size 7223
model/190703-214543/model_95.ckpt.meta ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7ebd154a5ba0721ba8bbb627ba61b556ee60660eb34bbcd1b1f50396b07cc4ed
3
+ size 2172055
phasenet/__init__.py ADDED
@@ -0,0 +1 @@
1
+ __version__ = "0.1.0"
phasenet/app.py ADDED
@@ -0,0 +1,341 @@
1
+ import os
2
+ from collections import defaultdict, namedtuple
3
+ from datetime import datetime, timedelta
4
+ from json import dumps
5
+ from typing import Any, AnyStr, Dict, List, NamedTuple, Union, Optional
6
+
7
+ import numpy as np
8
+ import requests
9
+ import tensorflow as tf
10
+ from fastapi import FastAPI, WebSocket
11
+ from kafka import KafkaProducer
12
+ from pydantic import BaseModel
13
+ from scipy.interpolate import interp1d
14
+
15
+ from model import ModelConfig, UNet
16
+ from postprocess import extract_picks
17
+
18
+ tf.compat.v1.disable_eager_execution()
19
+ tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
20
+ PROJECT_ROOT = os.path.realpath(os.path.join(os.path.dirname(__file__), ".."))
21
+ JSONObject = Dict[AnyStr, Any]
22
+ JSONArray = List[Any]
23
+ JSONStructure = Union[JSONArray, JSONObject]
24
+
25
+ app = FastAPI()
26
+ X_SHAPE = [3000, 1, 3]
27
+ SAMPLING_RATE = 100
28
+
29
+ # load model
30
+ model = UNet(mode="pred")
31
+ sess_config = tf.compat.v1.ConfigProto()
32
+ sess_config.gpu_options.allow_growth = True
33
+
34
+ sess = tf.compat.v1.Session(config=sess_config)
35
+ saver = tf.compat.v1.train.Saver(tf.compat.v1.global_variables())
36
+ init = tf.compat.v1.global_variables_initializer()
37
+ sess.run(init)
38
+ latest_check_point = tf.train.latest_checkpoint(f"{PROJECT_ROOT}/model/190703-214543")
39
+ print(f"restoring model {latest_check_point}")
40
+ saver.restore(sess, latest_check_point)
41
+
42
+ # GAMMA API Endpoint
43
+ GAMMA_API_URL = "http://gamma-api:8001"
44
+ # GAMMA_API_URL = 'http://localhost:8001'
45
+ # GAMMA_API_URL = "http://gamma.quakeflow.com"
46
+ # GAMMA_API_URL = "http://127.0.0.1:8001"
47
+
48
+ # Kafka producer
49
+ use_kafka = False
50
+
51
+ try:
52
+ print("Connecting to k8s kafka")
53
+ BROKER_URL = "quakeflow-kafka-headless:9092"
54
+ # BROKER_URL = "34.83.137.139:9094"
55
+ producer = KafkaProducer(
56
+ bootstrap_servers=[BROKER_URL],
57
+ key_serializer=lambda x: dumps(x).encode("utf-8"),
58
+ value_serializer=lambda x: dumps(x).encode("utf-8"),
59
+ )
60
+ use_kafka = True
61
+ print("k8s kafka connection success!")
62
+ except BaseException:
63
+ print("k8s Kafka connection error")
64
+ try:
65
+ print("Connecting to local kafka")
66
+ producer = KafkaProducer(
67
+ bootstrap_servers=["localhost:9092"],
68
+ key_serializer=lambda x: dumps(x).encode("utf-8"),
69
+ value_serializer=lambda x: dumps(x).encode("utf-8"),
70
+ )
71
+ use_kafka = True
72
+ print("local kafka connection success!")
73
+ except BaseException:
74
+ print("local Kafka connection error")
75
+ print(f"Kafka status: {use_kafka}")
76
+
77
+
78
+ def normalize_batch(data, window=3000):
79
+ """
80
+ data: nsta, nt, nch
81
+ """
82
+ shift = window // 2
83
+ nsta, nt, nch = data.shape
84
+
85
+ # std in slide windows
86
+ data_pad = np.pad(data, ((0, 0), (window // 2, window // 2), (0, 0)), mode="reflect")
87
+ t = np.arange(0, nt, shift, dtype="int")
88
+ std = np.zeros([nsta, len(t) + 1, nch])
89
+ mean = np.zeros([nsta, len(t) + 1, nch])
90
+ for i in range(1, len(t)):
91
+ std[:, i, :] = np.std(data_pad[:, i * shift : i * shift + window, :], axis=1)
92
+ mean[:, i, :] = np.mean(data_pad[:, i * shift : i * shift + window, :], axis=1)
93
+
94
+ t = np.append(t, nt)
95
+ # std[:, -1, :] = np.std(data_pad[:, -window:, :], axis=1)
96
+ # mean[:, -1, :] = np.mean(data_pad[:, -window:, :], axis=1)
97
+ std[:, -1, :], mean[:, -1, :] = std[:, -2, :], mean[:, -2, :]
98
+ std[:, 0, :], mean[:, 0, :] = std[:, 1, :], mean[:, 1, :]
99
+ std[std == 0] = 1
100
+
101
+ # ## normalize data with interpolated std
102
+ t_interp = np.arange(nt, dtype="int")
103
+ std_interp = interp1d(t, std, axis=1, kind="slinear")(t_interp)
104
+ mean_interp = interp1d(t, mean, axis=1, kind="slinear")(t_interp)
105
+ data = (data - mean_interp) / std_interp
106
+
107
+ return data
108
+
109
+
110
+ def preprocess(data):
111
+ raw = data.copy()
112
+ data = normalize_batch(data)
113
+ if len(data.shape) == 3:
114
+ data = data[:, :, np.newaxis, :]
115
+ raw = raw[:, :, np.newaxis, :]
116
+ return data, raw
117
+
118
+
119
+ def calc_timestamp(timestamp, sec):
120
+ timestamp = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f") + timedelta(seconds=sec)
121
+ return timestamp.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3]
122
+
123
+
124
+ def format_picks(picks, dt, amplitudes):
125
+ picks_ = []
126
+ for pick, amplitude in zip(picks, amplitudes):
127
+ for idxs, probs, amps in zip(pick.p_idx, pick.p_prob, amplitude.p_amp):
128
+ for idx, prob, amp in zip(idxs, probs, amps):
129
+ picks_.append(
130
+ {
131
+ "id": pick.fname,
132
+ "timestamp": calc_timestamp(pick.t0, float(idx) * dt),
133
+ "prob": prob,
134
+ "amp": amp,
135
+ "type": "p",
136
+ }
137
+ )
138
+ for idxs, probs, amps in zip(pick.s_idx, pick.s_prob, amplitude.s_amp):
139
+ for idx, prob, amp in zip(idxs, probs, amps):
140
+ picks_.append(
141
+ {
142
+ "id": pick.fname,
143
+ "timestamp": calc_timestamp(pick.t0, float(idx) * dt),
144
+ "prob": prob,
145
+ "amp": amp,
146
+ "type": "s",
147
+ }
148
+ )
149
+ return picks_
150
+
151
+
152
+ def format_data(data):
153
+ # chn2idx = {"ENZ": {"E":0, "N":1, "Z":2},
154
+ # "123": {"3":0, "2":1, "1":2},
155
+ # "12Z": {"1":0, "2":1, "Z":2}}
156
+ chn2idx = {"E": 0, "N": 1, "Z": 2, "3": 0, "2": 1, "1": 2}
157
+ Data = NamedTuple("data", [("id", list), ("timestamp", list), ("vec", list), ("dt", float)])
158
+
159
+ # Group by station
160
+ chn_ = defaultdict(list)
161
+ t0_ = defaultdict(list)
162
+ vv_ = defaultdict(list)
163
+ for i in range(len(data.id)):
164
+ key = data.id[i][:-1]
165
+ chn_[key].append(data.id[i][-1])
166
+ t0_[key].append(datetime.strptime(data.timestamp[i], "%Y-%m-%dT%H:%M:%S.%f").timestamp() * SAMPLING_RATE)
167
+ vv_[key].append(np.array(data.vec[i]))
168
+
169
+ # Merge to Data tuple
170
+ id_ = []
171
+ timestamp_ = []
172
+ vec_ = []
173
+ for k in chn_:
174
+ id_.append(k)
175
+ min_t0 = min(t0_[k])
176
+ timestamp_.append(datetime.fromtimestamp(min_t0 / SAMPLING_RATE).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3])
177
+ vec = np.zeros([X_SHAPE[0], X_SHAPE[-1]])
178
+ for i in range(len(chn_[k])):
179
+ # vec[int(t0_[k][i]-min_t0):len(vv_[k][i]), chn2idx[chn_[k][i]]] = vv_[k][i][int(t0_[k][i]-min_t0):X_SHAPE[0]] - np.mean(vv_[k][i])
180
+ shift = int(t0_[k][i] - min_t0)
181
+ vec[shift : len(vv_[k][i]) + shift, chn2idx[chn_[k][i]]] = vv_[k][i][: X_SHAPE[0] - shift] - np.mean(
182
+ vv_[k][i][: X_SHAPE[0] - shift]
183
+ )
184
+ vec_.append(vec.tolist())
185
+
186
+ return Data(id=id_, timestamp=timestamp_, vec=vec_, dt=1 / SAMPLING_RATE)
187
+ # return {"id": id_, "timestamp": timestamp_, "vec": vec_, "dt":1 / SAMPLING_RATE}
188
+
189
+
190
+ def get_prediction(data, return_preds=False):
191
+ vec = np.array(data.vec)
192
+ vec, vec_raw = preprocess(vec)
193
+
194
+ feed = {model.X: vec, model.drop_rate: 0, model.is_training: False}
195
+ preds = sess.run(model.preds, feed_dict=feed)
196
+
197
+ picks = extract_picks(preds, station_ids=data.id, begin_times=data.timestamp, waveforms=vec_raw)
198
+
199
+ picks = [
200
+ {k: v for k, v in pick.items() if k in ["station_id", "phase_time", "phase_score", "phase_type", "dt"]}
201
+ for pick in picks
202
+ ]
203
+
204
+ if return_preds:
205
+ return picks, preds
206
+
207
+ return picks
208
+
209
+
210
+ class Data(BaseModel):
211
+ # id: Union[List[str], str]
212
+ # timestamp: Union[List[str], str]
213
+ # vec: Union[List[List[List[float]]], List[List[float]]]
214
+ id: List[str]
215
+ timestamp: List[Union[str, float, datetime]]
216
+ vec: Union[List[List[List[float]]], List[List[float]]]
217
+
218
+ dt: Optional[float] = 0.01
219
+ ## gamma
220
+ stations: Optional[List[Dict[str, Union[float, str]]]] = None
221
+ config: Optional[Dict[str, Union[List[float], List[int], List[str], float, int, str]]] = None
222
+
223
+
224
+ # @app.on_event("startup")
225
+ # def set_default_executor():
226
+ # from concurrent.futures import ThreadPoolExecutor
227
+ # import asyncio
228
+ #
229
+ # loop = asyncio.get_running_loop()
230
+ # loop.set_default_executor(
231
+ # ThreadPoolExecutor(max_workers=2)
232
+ # )
233
+
234
+
235
+ @app.post("/predict")
236
+ def predict(data: Data):
237
+ picks = get_prediction(data)
238
+
239
+ return picks
240
+
241
+
242
+ @app.websocket("/ws")
243
+ async def websocket_endpoint(websocket: WebSocket):
244
+ await websocket.accept()
245
+ while True:
246
+ data = await websocket.receive_json()
247
+ # data = json.loads(data)
248
+ data = Data(**data)
249
+ picks = get_prediction(data)
250
+ await websocket.send_json(picks)
251
+ print("PhaseNet Updating...")
252
+
253
+
254
+ @app.post("/predict_prob")
255
+ def predict(data: Data):
256
+ picks, preds = get_prediction(data, True)
257
+
258
+ return picks, preds.tolist()
259
+
260
+
261
+ @app.post("/predict_phasenet2gamma")
262
+ def predict(data: Data):
263
+ picks = get_prediction(data)
264
+
265
+ # if use_kafka:
266
+ # print("Push picks to kafka...")
267
+ # for pick in picks:
268
+ # producer.send("phasenet_picks", key=pick["id"], value=pick)
269
+ try:
270
+ catalog = requests.post(
271
+ f"{GAMMA_API_URL}/predict", json={"picks": picks, "stations": data.stations, "config": data.config}
272
+ )
273
+ print(catalog.json()["catalog"])
274
+ return catalog.json()
275
+ except Exception as error:
276
+ print(error)
277
+
278
+ return {}
279
+
280
+
281
+ @app.post("/predict_phasenet2gamma2ui")
282
+ def predict(data: Data):
283
+ picks = get_prediction(data)
284
+
285
+ try:
286
+ catalog = requests.post(
287
+ f"{GAMMA_API_URL}/predict", json={"picks": picks, "stations": data.stations, "config": data.config}
288
+ )
289
+ print(catalog.json()["catalog"])
290
+ return catalog.json()
291
+ except Exception as error:
292
+ print(error)
293
+
294
+ if use_kafka:
295
+ print("Push picks to kafka...")
296
+ for pick in picks:
297
+ producer.send("phasenet_picks", key=pick["id"], value=pick)
298
+ print("Push waveform to kafka...")
299
+ for id, timestamp, vec in zip(data.id, data.timestamp, data.vec):
300
+ producer.send("waveform_phasenet", key=id, value={"timestamp": timestamp, "vec": vec, "dt": data.dt})
301
+
302
+ return {}
303
+
304
+
305
+ @app.post("/predict_stream_phasenet2gamma")
306
+ def predict(data: Data):
307
+ data = format_data(data)
308
+ # for i in range(len(data.id)):
309
+ # plt.clf()
310
+ # plt.subplot(311)
311
+ # plt.plot(np.array(data.vec)[i, :, 0])
312
+ # plt.subplot(312)
313
+ # plt.plot(np.array(data.vec)[i, :, 1])
314
+ # plt.subplot(313)
315
+ # plt.plot(np.array(data.vec)[i, :, 2])
316
+ # plt.savefig(f"{data.id[i]}.png")
317
+
318
+ picks = get_prediction(data)
319
+
320
+ return_value = {}
321
+ try:
322
+ catalog = requests.post(f"{GAMMA_API_URL}/predict_stream", json={"picks": picks})
323
+ print("GMMA:", catalog.json()["catalog"])
324
+ return_value = catalog.json()
325
+ except Exception as error:
326
+ print(error)
327
+
328
+ if use_kafka:
329
+ print("Push picks to kafka...")
330
+ for pick in picks:
331
+ producer.send("phasenet_picks", key=pick["id"], value=pick)
332
+ print("Push waveform to kafka...")
333
+ for id, timestamp, vec in zip(data.id, data.timestamp, data.vec):
334
+ producer.send("waveform_phasenet", key=id, value={"timestamp": timestamp, "vec": vec, "dt": data.dt})
335
+
336
+ return return_value
337
+
338
+
339
+ @app.get("/healthz")
340
+ def healthz():
341
+ return {"status": "ok"}
phasenet/data_reader.py ADDED
@@ -0,0 +1,1010 @@
1
+ import tensorflow as tf
2
+
3
+ tf.compat.v1.disable_eager_execution()
4
+ tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
5
+ import logging
6
+ import os
7
+
8
+ import numpy as np
9
+ import pandas as pd
10
+
11
+ pd.options.mode.chained_assignment = None
12
+ import json
13
+ import random
14
+ from collections import defaultdict
15
+
16
+ # import s3fs
17
+ import h5py
18
+ import obspy
19
+ from scipy.interpolate import interp1d
20
+ from tqdm import tqdm
21
+
22
+
23
+ def py_func_decorator(output_types=None, output_shapes=None, name=None):
24
+ def decorator(func):
25
+ def call(*args, **kwargs):
26
+ nonlocal output_shapes
27
+ # flat_output_types = nest.flatten(output_types)
28
+ flat_output_types = tf.nest.flatten(output_types)
29
+ # flat_values = tf.py_func(
30
+ flat_values = tf.numpy_function(func, inp=args, Tout=flat_output_types, name=name)
31
+ if output_shapes is not None:
32
+ for v, s in zip(flat_values, output_shapes):
33
+ v.set_shape(s)
34
+ # return nest.pack_sequence_as(output_types, flat_values)
35
+ return tf.nest.pack_sequence_as(output_types, flat_values)
36
+
37
+ return call
38
+
39
+ return decorator
40
+
41
+
42
+ def dataset_map(iterator, output_types, output_shapes=None, num_parallel_calls=None, name=None, shuffle=False):
43
+ dataset = tf.data.Dataset.range(len(iterator))
44
+ if shuffle:
45
+ dataset = dataset.shuffle(len(iterator), reshuffle_each_iteration=True)
46
+
47
+ @py_func_decorator(output_types, output_shapes, name=name)
48
+ def index_to_entry(idx):
49
+ return iterator[idx]
50
+
51
+ return dataset.map(index_to_entry, num_parallel_calls=num_parallel_calls)
52
+
53
+
54
+ def normalize(data, axis=(0,)):
55
+ """data shape: (nt, nsta, nch)"""
56
+ data -= np.mean(data, axis=axis, keepdims=True)
57
+ std_data = np.std(data, axis=axis, keepdims=True)
58
+ std_data[std_data == 0] = 1
59
+ data /= std_data
60
+ # data /= (std_data + 1e-12)
61
+ return data
62
+
63
+
64
+ def normalize_long(data, axis=(0,), window=3000):
65
+ """
66
+ data: nt, nch
67
+ """
68
+ nt, nar, nch = data.shape
69
+ if window is None:
70
+ window = nt
71
+ shift = window // 2
72
+
73
+ dtype = data.dtype
74
+ ## std in slide windows
75
+ data_pad = np.pad(data, ((window // 2, window // 2), (0, 0), (0, 0)), mode="reflect")
76
+ t = np.arange(0, nt, shift, dtype="int")
77
+ std = np.zeros([len(t) + 1, nar, nch])
78
+ mean = np.zeros([len(t) + 1, nar, nch])
79
+ for i in range(1, len(std)):
80
+ std[i, :] = np.std(data_pad[i * shift : i * shift + window, :, :], axis=axis)
81
+ mean[i, :] = np.mean(data_pad[i * shift : i * shift + window, :, :], axis=axis)
82
+
83
+ t = np.append(t, nt)
84
+ # std[-1, :] = np.std(data_pad[-window:, :], axis=0)
85
+ # mean[-1, :] = np.mean(data_pad[-window:, :], axis=0)
86
+ std[-1, ...], mean[-1, ...] = std[-2, ...], mean[-2, ...]
87
+ std[0, ...], mean[0, ...] = std[1, ...], mean[1, ...]
88
+ # std[std == 0] = 1.0
89
+
90
+ ## normalize data with interpolated std
91
+ t_interp = np.arange(nt, dtype="int")
92
+ std_interp = interp1d(t, std, axis=0, kind="slinear")(t_interp)
93
+ # std_interp = np.exp(interp1d(t, np.log(std), axis=0, kind="slinear")(t_interp))
94
+ mean_interp = interp1d(t, mean, axis=0, kind="slinear")(t_interp)
95
+ tmp = np.sum(std_interp, axis=(0, 1))
96
+ std_interp[std_interp == 0] = 1.0
97
+ data = (data - mean_interp) / std_interp
98
+ # data = (data - mean_interp)/(std_interp + 1e-12)
99
+
100
+ ### dropout effect of < 3 channel
101
+ nonzero = np.count_nonzero(tmp)
102
+ if (nonzero < 3) and (nonzero > 0):
103
+ data *= 3.0 / nonzero
104
+
105
+ return data.astype(dtype)
106
+
107
+
108
+ def normalize_batch(data, window=3000):
109
+ """
110
+ data: nsta, nt, nch
111
+ """
112
+ nsta, nt, nar, nch = data.shape
113
+ if window is None:
114
+ window = nt
115
+ shift = window // 2
116
+
117
+ ## std in slide windows
118
+ data_pad = np.pad(data, ((0, 0), (window // 2, window // 2), (0, 0), (0, 0)), mode="reflect")
119
+ t = np.arange(0, nt, shift, dtype="int")
120
+ std = np.zeros([nsta, len(t) + 1, nar, nch])
121
+ mean = np.zeros([nsta, len(t) + 1, nar, nch])
122
+ for i in range(1, len(t)):
123
+ std[:, i, :, :] = np.std(data_pad[:, i * shift : i * shift + window, :, :], axis=1)
124
+ mean[:, i, :, :] = np.mean(data_pad[:, i * shift : i * shift + window, :, :], axis=1)
125
+
126
+ t = np.append(t, nt)
127
+ # std[:, -1, :] = np.std(data_pad[:, -window:, :], axis=1)
128
+ # mean[:, -1, :] = np.mean(data_pad[:, -window:, :], axis=1)
129
+ std[:, -1, :, :], mean[:, -1, :, :] = std[:, -2, :, :], mean[:, -2, :, :]
130
+ std[:, 0, :, :], mean[:, 0, :, :] = std[:, 1, :, :], mean[:, 1, :, :]
131
+ # std[std == 0] = 1
132
+
133
+ # ## normalize data with interpolated std
134
+ t_interp = np.arange(nt, dtype="int")
135
+ std_interp = interp1d(t, std, axis=1, kind="slinear")(t_interp)
136
+ # std_interp = np.exp(interp1d(t, np.log(std), axis=1, kind="slinear")(t_interp))
137
+ mean_interp = interp1d(t, mean, axis=1, kind="slinear")(t_interp)
138
+ tmp = np.sum(std_interp, axis=(1, 2))
139
+ std_interp[std_interp == 0] = 1.0
140
+ data = (data - mean_interp) / std_interp
141
+ # data = (data - mean_interp)/(std_interp + 1e-12)
142
+
143
+ ### dropout effect of < 3 channel
144
+ nonzero = np.count_nonzero(tmp, axis=-1)
145
+ data[nonzero > 0, ...] *= 3.0 / nonzero[nonzero > 0][:, np.newaxis, np.newaxis, np.newaxis]
146
+
147
+ return data
148
+
149
+
150
+ class DataConfig:
151
+ seed = 123
152
+ use_seed = True
153
+ n_channel = 3
154
+ n_class = 3
155
+ sampling_rate = 100
156
+ dt = 1.0 / sampling_rate
157
+ X_shape = [3000, 1, n_channel]
158
+ Y_shape = [3000, 1, n_class]
159
+ min_event_gap = 3 * sampling_rate
160
+ label_shape = "gaussian"
161
+ label_width = 30
162
+ dtype = "float32"
163
+
164
+ def __init__(self, **kwargs):
165
+ for k, v in kwargs.items():
166
+ setattr(self, k, v)
167
+
168
+
169
+ class DataReader:
170
+ def __init__(
171
+ self, format="numpy", config=DataConfig(), response_xml=None, sampling_rate=100, highpass_filter=0, **kwargs
172
+ ):
173
+ self.buffer = {}
174
+ self.n_channel = config.n_channel
175
+ self.n_class = config.n_class
176
+ self.X_shape = config.X_shape
177
+ self.Y_shape = config.Y_shape
178
+ self.dt = config.dt
179
+ self.dtype = config.dtype
180
+ self.label_shape = config.label_shape
181
+ self.label_width = config.label_width
182
+ self.config = config
183
+ self.format = format
184
+ # if "highpass_filter" in kwargs:
185
+ # self.highpass_filter = kwargs["highpass_filter"]
186
+ self.highpass_filter = highpass_filter
187
+ # self.response_xml = response_xml
188
+ if response_xml is not None:
189
+ self.response = obspy.read_inventory(response_xml)
190
+ else:
191
+ self.response = None
192
+ self.sampling_rate = sampling_rate
193
+ if format in ["numpy", "mseed", "sac"]:
194
+ self.data_dir = kwargs["data_dir"]
195
+ try:
196
+ csv = pd.read_csv(kwargs["data_list"], header=0, sep="[,|\s+]", engine="python")
197
+ except:
198
+ csv = pd.read_csv(kwargs["data_list"], header=0, sep="\t")
199
+ self.data_list = csv["fname"]
200
+ self.num_data = len(self.data_list)
201
+ elif format == "hdf5":
202
+ self.h5 = h5py.File(kwargs["hdf5_file"], "r", libver="latest", swmr=True)
203
+ self.h5_data = self.h5[kwargs["hdf5_group"]]
204
+ self.data_list = list(self.h5_data.keys())
205
+ self.num_data = len(self.data_list)
206
+ elif format == "s3":
207
+ self.s3fs = s3fs.S3FileSystem(
208
+ anon=kwargs["anon"],
209
+ key=kwargs["key"],
210
+ secret=kwargs["secret"],
211
+ client_kwargs={"endpoint_url": kwargs["s3_url"]},
212
+ use_ssl=kwargs["use_ssl"],
213
+ )
214
+ self.num_data = 0
215
+ else:
216
+ raise (f"{format} not support!")
217
+
218
+ def __len__(self):
219
+ return self.num_data
220
+
221
+ def read_numpy(self, fname):
222
+ # try:
223
+ if fname not in self.buffer:
224
+ npz = np.load(fname)
225
+ meta = {}
226
+ if len(npz["data"].shape) == 2:
227
+ meta["data"] = npz["data"][:, np.newaxis, :]
228
+ else:
229
+ meta["data"] = npz["data"]
230
+ if "p_idx" in npz.files:
231
+ if len(npz["p_idx"].shape) == 0:
232
+ meta["itp"] = [[npz["p_idx"]]]
233
+ else:
234
+ meta["itp"] = npz["p_idx"]
235
+ if "s_idx" in npz.files:
236
+ if len(npz["s_idx"].shape) == 0:
237
+ meta["its"] = [[npz["s_idx"]]]
238
+ else:
239
+ meta["its"] = npz["s_idx"]
240
+ if "itp" in npz.files:
241
+ if len(npz["itp"].shape) == 0:
242
+ meta["itp"] = [[npz["itp"]]]
243
+ else:
244
+ meta["itp"] = npz["itp"]
245
+ if "its" in npz.files:
246
+ if len(npz["its"].shape) == 0:
247
+ meta["its"] = [[npz["its"]]]
248
+ else:
249
+ meta["its"] = npz["its"]
250
+ if "station_id" in npz.files:
251
+ meta["station_id"] = npz["station_id"]
252
+ if "sta_id" in npz.files:
253
+ meta["station_id"] = npz["sta_id"]
254
+ if "t0" in npz.files:
255
+ meta["t0"] = npz["t0"]
256
+ self.buffer[fname] = meta
257
+ else:
258
+ meta = self.buffer[fname]
259
+ return meta
260
+ # except:
261
+ # logging.error("Failed reading {}".format(fname))
262
+ # return None
263
+
264
+ def read_hdf5(self, fname):
265
+ data = self.h5_data[fname][()]
266
+ attrs = self.h5_data[fname].attrs
267
+ meta = {}
268
+ if len(data.shape) == 2:
269
+ meta["data"] = data[:, np.newaxis, :]
270
+ else:
271
+ meta["data"] = data
272
+ if "p_idx" in attrs:
273
+ if len(attrs["p_idx"].shape) == 0:
274
+ meta["itp"] = [[attrs["p_idx"]]]
275
+ else:
276
+ meta["itp"] = attrs["p_idx"]
277
+ if "s_idx" in attrs:
278
+ if len(attrs["s_idx"].shape) == 0:
279
+ meta["its"] = [[attrs["s_idx"]]]
280
+ else:
281
+ meta["its"] = attrs["s_idx"]
282
+ if "itp" in attrs:
283
+ if len(attrs["itp"].shape) == 0:
284
+ meta["itp"] = [[attrs["itp"]]]
285
+ else:
286
+ meta["itp"] = attrs["itp"]
287
+ if "its" in attrs:
288
+ if len(attrs["its"].shape) == 0:
289
+ meta["its"] = [[attrs["its"]]]
290
+ else:
291
+ meta["its"] = attrs["its"]
292
+ if "t0" in attrs:
293
+ meta["t0"] = attrs["t0"]
294
+ return meta
295
+
296
+ def read_s3(self, format, fname, bucket, key, secret, s3_url, use_ssl):
297
+ with self.s3fs.open(bucket + "/" + fname, "rb") as fp:
298
+ if format == "numpy":
299
+ meta = self.read_numpy(fp)
300
+ elif format == "mseed":
301
+ meta = self.read_mseed(fp)
302
+ else:
303
+ raise (f"Format {format} not supported")
304
+ return meta
305
+
306
+ def read_mseed(self, fname, response=None, highpass_filter=0.0, sampling_rate=100, return_single_station=True):
307
+ try:
308
+ stream = obspy.read(fname)
309
+ stream = stream.merge(fill_value="latest")
310
+ if response is not None:
311
+ # response = obspy.read_inventory(response_xml)
312
+ stream = stream.remove_sensitivity(response)
313
+ except Exception as e:
314
+ print(f"Error reading {fname}:\n{e}")
315
+ return {}
316
+ tmp_stream = obspy.Stream()
317
+ for trace in stream:
318
+ if len(trace.data) < 10:
319
+ continue
320
+
321
+ ## interpolate to 100 Hz
322
+ if abs(trace.stats.sampling_rate - sampling_rate) > 0.1:
323
+ logging.warning(f"Resampling {trace.id} from {trace.stats.sampling_rate} to {sampling_rate} Hz")
324
+ try:
325
+ trace = trace.interpolate(sampling_rate, method="linear")
326
+ except Exception as e:
327
+ print(f"Error resampling {trace.id}:\n{e}")
328
+
329
+ trace = trace.detrend("demean")
330
+
331
+ ## highpass filtering > 1Hz
332
+ if highpass_filter > 0.0:
333
+ trace = trace.filter("highpass", freq=highpass_filter)
334
+
335
+ tmp_stream.append(trace)
336
+
337
+ if len(tmp_stream) == 0:
338
+ return {}
339
+ stream = tmp_stream
340
+
341
+ begin_time = min([st.stats.starttime for st in stream])
342
+ end_time = max([st.stats.endtime for st in stream])
343
+ stream = stream.trim(begin_time, end_time, pad=True, fill_value=0)
344
+
345
+ comp = ["3", "2", "1", "E", "N", "U", "V", "Z"]
346
+ order = {key: i for i, key in enumerate(comp)}
347
+ comp2idx = {
348
+ "3": 0,
349
+ "2": 1,
350
+ "1": 2,
351
+ "E": 0,
352
+ "N": 1,
353
+ "Z": 2,
354
+ "U": 0,
355
+ "V": 1,
356
+ } ## only for cases less than 3 components
357
+
358
+ station_ids = defaultdict(list)
359
+ for tr in stream:
360
+ station_ids[tr.id[:-1]].append(tr.id[-1])
361
+ if tr.id[-1] not in comp:
362
+ print(f"Unknown component {tr.id[-1]}")
363
+
364
+ station_keys = sorted(list(station_ids.keys()))
365
+
366
+ nx = len(station_ids)
367
+ nt = len(stream[0].data)
368
+ data = np.zeros([3, nt, nx], dtype=np.float32)
369
+ for i, sta in enumerate(station_keys):
370
+ for j, c in enumerate(sorted(station_ids[sta], key=lambda x: order[x])):
371
+ if len(station_ids[sta]) != 3: ## less than 3 component
372
+ j = comp2idx[c]
373
+
374
+ if len(stream.select(id=sta + c)) == 0:
375
+ print(f"Empty trace: {sta+c} {begin_time}")
376
+ continue
377
+
378
+ trace = stream.select(id=sta + c)[0]
379
+
380
+ ## acceleration to velocity
381
+ if sta[-1] == "N":
382
+ trace = trace.integrate().filter("highpass", freq=1.0)
383
+
384
+ tmp = trace.data.astype("float32")
385
+ data[j, : len(tmp), i] = tmp[:nt]
386
+
387
+ # if return_single_station and (len(station_keys) > 1):
388
+ # print(f"Warning: {fname} has multiple stations, returning only the first one {station_keys[0]}")
389
+ # data = data[:, :, 0:1]
390
+ # station_keys = station_keys[0:1]
391
+
392
+ meta = {
393
+ "data": data.transpose([1, 2, 0]),
394
+ "t0": begin_time.datetime.isoformat(timespec="milliseconds"),
395
+ "station_id": station_keys,
396
+ }
397
+ return meta
398
+
399
+ def read_sac(self, fname):
400
+ mseed = obspy.read(fname)
401
+ mseed = mseed.detrend("spline", order=2, dspline=5 * mseed[0].stats.sampling_rate)
402
+ mseed = mseed.merge(fill_value=0)
403
+ if self.highpass_filter > 0:
404
+ mseed = mseed.filter("highpass", freq=self.highpass_filter)
405
+ starttime = min([st.stats.starttime for st in mseed])
406
+ endtime = max([st.stats.endtime for st in mseed])
407
+ mseed = mseed.trim(starttime, endtime, pad=True, fill_value=0)
408
+ if abs(mseed[0].stats.sampling_rate - self.config.sampling_rate) > 1:
409
+ logging.warning(
410
+ f"Sampling rate mismatch in {fname.split('/')[-1]}: {mseed[0].stats.sampling_rate}Hz != {self.config.sampling_rate}Hz "
411
+ )
412
+
413
+ order = ["3", "2", "1", "E", "N", "Z"]
414
+ order = {key: i for i, key in enumerate(order)}
415
+ comp2idx = {"3": 0, "2": 1, "1": 2, "E": 0, "N": 1, "Z": 2}
416
+
417
+ t0 = starttime.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3]
418
+ nt = len(mseed[0].data)
419
+ data = np.zeros([nt, self.config.n_channel], dtype=self.dtype)
420
+ ids = [x.get_id() for x in mseed]
421
+ for j, id in enumerate(sorted(ids, key=lambda x: order[x[-1]])):
422
+ if len(ids) != 3:
423
+ if len(ids) > 3:
424
+ logging.warning(f"More than 3 channels {ids}!")
425
+ j = comp2idx[id[-1]]
426
+ data[:, j] = mseed.select(id=id)[0].data.astype(self.dtype)
427
+
428
+ data = data[:, np.newaxis, :]
429
+ meta = {"data": data, "t0": t0}
430
+ return meta
431
+
432
+ def read_mseed_array(self, fname, stations, amplitude=False, remove_resp=True):
433
+ data = []
434
+ station_id = []
435
+ t0 = []
436
+ raw_amp = []
437
+
438
+ try:
439
+ mseed = obspy.read(fname)
440
+ read_success = True
441
+ except Exception as e:
442
+ read_success = False
443
+ print(e)
444
+
445
+ if read_success:
446
+ try:
447
+ mseed = mseed.merge(fill_value=0)
448
+ except Exception as e:
449
+ print(e)
450
+
451
+ for i in range(len(mseed)):
452
+ if mseed[i].stats.sampling_rate != self.config.sampling_rate:
453
+ logging.warning(
454
+ f"Resampling {mseed[i].id} from {mseed[i].stats.sampling_rate} to {self.config.sampling_rate} Hz"
455
+ )
456
+ try:
457
+ mseed[i] = mseed[i].interpolate(self.config.sampling_rate, method="linear")
458
+ except Exception as e:
459
+ print(e)
460
+ mseed[i].data = mseed[i].data.astype(float) * 0.0 ## set to zero if resampling fails
461
+
462
+ if self.highpass_filter == 0:
463
+ try:
464
+ mseed = mseed.detrend("spline", order=2, dspline=5 * mseed[0].stats.sampling_rate)
465
+ except:
466
+ logging.error(f"Error: spline detrend failed at file {fname}")
467
+ mseed = mseed.detrend("demean")
468
+ else:
469
+ mseed = mseed.filter("highpass", freq=self.highpass_filter)
470
+
471
+ starttime = min([st.stats.starttime for st in mseed])
472
+ endtime = max([st.stats.endtime for st in mseed])
473
+ mseed = mseed.trim(starttime, endtime, pad=True, fill_value=0)
474
+
475
+ order = ["3", "2", "1", "E", "N", "Z"]
476
+ order = {key: i for i, key in enumerate(order)}
477
+ comp2idx = {"3": 0, "2": 1, "1": 2, "E": 0, "N": 1, "Z": 2}
478
+
479
+ nsta = len(stations)
480
+ nt = len(mseed[0].data)
481
+ # for i in range(nsta):
482
+ for sta in stations:
483
+ trace_data = np.zeros([nt, self.config.n_channel], dtype=self.dtype)
484
+ if amplitude:
485
+ trace_amp = np.zeros([nt, self.config.n_channel], dtype=self.dtype)
486
+ empty_station = True
487
+ # sta = stations.iloc[i]["station"]
488
+ # comp = stations.iloc[i]["component"].split(",")
489
+ comp = stations[sta]["component"]
490
+ if amplitude:
491
+ # resp = stations.iloc[i]["response"].split(",")
492
+ resp = stations[sta]["response"]
493
+
494
+ for j, c in enumerate(sorted(comp, key=lambda x: order[x[-1]])):
495
+ resp_j = resp[j]
496
+ if len(comp) != 3: ## less than 3 component
497
+ j = comp2idx[c]
498
+
499
+ if len(mseed.select(id=sta + c)) == 0:
500
+ print(f"Empty trace: {sta+c} {starttime}")
501
+ continue
502
+ else:
503
+ empty_station = False
504
+
505
+ tmp = mseed.select(id=sta + c)[0].data.astype(self.dtype)
506
+ trace_data[: len(tmp), j] = tmp[:nt]
507
+ if amplitude:
508
+ # if stations.iloc[i]["unit"] == "m/s**2":
509
+ if stations[sta]["unit"] == "m/s**2":
510
+ tmp = mseed.select(id=sta + c)[0]
511
+ tmp = tmp.integrate()
512
+ tmp = tmp.filter("highpass", freq=1.0)
513
+ tmp = tmp.data.astype(self.dtype)
514
+ trace_amp[: len(tmp), j] = tmp[:nt]
515
+ # elif stations.iloc[i]["unit"] == "m/s":
516
+ elif stations[sta]["unit"] == "m/s":
517
+ tmp = mseed.select(id=sta + c)[0].data.astype(self.dtype)
518
+ trace_amp[: len(tmp), j] = tmp[:nt]
519
+ else:
520
+ print(
521
+ f"Error in {stations.iloc[i]['station']}\n{stations.iloc[i]['unit']} should be m/s**2 or m/s!"
522
+ )
523
+ if amplitude and remove_resp:
524
+ # trace_amp[:, j] /= float(resp[j])
525
+ trace_amp[:, j] /= float(resp_j)
526
+
527
+ if not empty_station:
528
+ data.append(trace_data)
529
+ if amplitude:
530
+ raw_amp.append(trace_amp)
531
+ station_id.append([sta])
532
+ t0.append(starttime.datetime.isoformat(timespec="milliseconds"))
533
+
534
+ if len(data) > 0:
535
+ data = np.stack(data)
536
+ if len(data.shape) == 3:
537
+ data = data[:, :, np.newaxis, :]
538
+ if amplitude:
539
+ raw_amp = np.stack(raw_amp)
540
+ if len(raw_amp.shape) == 3:
541
+ raw_amp = raw_amp[:, :, np.newaxis, :]
542
+ else:
543
+ nt = 60 * 60 * self.config.sampling_rate # assume 1 hour data
544
+ data = np.zeros([1, nt, 1, self.config.n_channel], dtype=self.dtype)
545
+ if amplitude:
546
+ raw_amp = np.zeros([1, nt, 1, self.config.n_channel], dtype=self.dtype)
547
+ t0 = ["1970-01-01T00:00:00.000"]
548
+ station_id = ["None"]
549
+
550
+ if amplitude:
551
+ meta = {"data": data, "t0": t0, "station_id": station_id, "fname": fname.split("/")[-1], "raw_amp": raw_amp}
552
+ else:
553
+ meta = {"data": data, "t0": t0, "station_id": station_id, "fname": fname.split("/")[-1]}
554
+ return meta
555
+
556
+ def generate_label(self, data, phase_list, mask=None):
557
+ # target = np.zeros(self.Y_shape, dtype=self.dtype)
558
+ target = np.zeros_like(data)
559
+
560
+ if self.label_shape == "gaussian":
561
+ label_window = np.exp(
562
+ -((np.arange(-self.label_width // 2, self.label_width // 2 + 1)) ** 2)
563
+ / (2 * (self.label_width / 5) ** 2)
564
+ )
565
+ elif self.label_shape == "triangle":
566
+ label_window = 1 - np.abs(
567
+ 2 / self.label_width * (np.arange(-self.label_width // 2, self.label_width // 2 + 1))
568
+ )
569
+ else:
570
+ print(f"Label shape {self.label_shape} should be guassian or triangle")
571
+ raise
572
+
573
+ for i, phases in enumerate(phase_list):
574
+ for j, idx_list in enumerate(phases):
575
+ for idx in idx_list:
576
+ if np.isnan(idx):
577
+ continue
578
+ idx = int(idx)
579
+ if (idx - self.label_width // 2 >= 0) and (idx + self.label_width // 2 + 1 <= target.shape[0]):
580
+ target[idx - self.label_width // 2 : idx + self.label_width // 2 + 1, j, i + 1] = label_window
581
+
582
+ target[..., 0] = 1 - np.sum(target[..., 1:], axis=-1)
583
+ if mask is not None:
584
+ target[:, mask == 0, :] = 0
585
+
586
+ return target
587
+
588
+ def random_shift(self, sample, itp, its, itp_old=None, its_old=None, shift_range=None):
589
+ # anchor = np.round(1/2 * (min(itp[~np.isnan(itp.astype(float))]) + min(its[~np.isnan(its.astype(float))]))).astype(int)
590
+ flattern = lambda x: np.array([i for trace in x for i in trace], dtype=float)
591
+ shift_pick = lambda x, shift: [[i - shift for i in trace] for trace in x]
592
+ itp_flat = flattern(itp)
593
+ its_flat = flattern(its)
594
+ if (itp_old is None) and (its_old is None):
595
+ hi = np.round(np.median(itp_flat[~np.isnan(itp_flat)])).astype(int)
596
+ lo = -(sample.shape[0] - np.round(np.median(its_flat[~np.isnan(its_flat)])).astype(int))
597
+ if shift_range is None:
598
+ shift = np.random.randint(low=lo, high=hi + 1)
599
+ else:
600
+ shift = np.random.randint(low=max(lo, shift_range[0]), high=min(hi + 1, shift_range[1]))
601
+ else:
602
+ itp_old_flat = flatten(itp_old)
603
+ its_old_flat = flatten(its_old)
604
+ itp_ref = np.round(np.min(itp_flat[~np.isnan(itp_flat)])).astype(int)
605
+ its_ref = np.round(np.max(its_flat[~np.isnan(its_flat)])).astype(int)
606
+ itp_old_ref = np.round(np.min(itp_old_flat[~np.isnan(itp_old_flat)])).astype(int)
607
+ its_old_ref = np.round(np.max(its_old_flat[~np.isnan(its_old_flat)])).astype(int)
608
+ # min_event_gap = np.round(self.min_event_gap*(its_ref-itp_ref)).astype(int)
609
+ # min_event_gap_old = np.round(self.min_event_gap*(its_old_ref-itp_old_ref)).astype(int)
610
+ if shift_range is None:
611
+ hi = list(range(max(its_ref - itp_old_ref + self.min_event_gap, 0), itp_ref))
612
+ lo = list(range(-(sample.shape[0] - its_ref), -(max(its_old_ref - itp_ref + self.min_event_gap, 0))))
613
+ else:
614
+ lo_ = max(-(sample.shape[0] - its_ref), shift_range[0])
615
+ hi_ = min(itp_ref, shift_range[1])
616
+ hi = list(range(max(its_ref - itp_old_ref + self.min_event_gap, 0), hi_))
617
+ lo = list(range(lo_, -(max(its_old_ref - itp_ref + self.min_event_gap, 0))))
618
+ if len(hi + lo) > 0:
619
+ shift = np.random.choice(hi + lo)
620
+ else:
621
+ shift = 0
622
+
623
+ shifted_sample = np.zeros_like(sample)
624
+ if shift > 0:
625
+ shifted_sample[:-shift, ...] = sample[shift:, ...]
626
+ elif shift < 0:
627
+ shifted_sample[-shift:, ...] = sample[:shift, ...]
628
+ else:
629
+ shifted_sample[...] = sample[...]
630
+
631
+ return shifted_sample, shift_pick(itp, shift), shift_pick(its, shift), shift
632
+
633
+ def stack_events(self, sample_old, itp_old, its_old, shift_range=None, mask_old=None):
634
+ i = np.random.randint(self.num_data)
635
+ base_name = self.data_list[i]
636
+ if self.format == "numpy":
637
+ meta = self.read_numpy(os.path.join(self.data_dir, base_name))
638
+ elif self.format == "hdf5":
639
+ meta = self.read_hdf5(base_name)
640
+ if meta == -1:
641
+ return sample_old, itp_old, its_old, mask_old
642
+
643
+ sample = np.copy(meta["data"])
644
+ itp = meta["itp"]
645
+ its = meta["its"]
646
+ if mask_old is not None:
647
+ mask = np.copy(meta["mask"])
648
+ sample = normalize(sample)
649
+ sample, itp, its, shift = self.random_shift(sample, itp, its, itp_old, its_old, shift_range)
650
+
651
+ if shift != 0:
652
+ sample_old += sample
653
+ # itp_old = [np.hstack([i, j]) for i,j in zip(itp_old, itp)]
654
+ # its_old = [np.hstack([i, j]) for i,j in zip(its_old, its)]
655
+ itp_old = [i + j for i, j in zip(itp_old, itp)]
656
+ its_old = [i + j for i, j in zip(its_old, its)]
657
+ if mask_old is not None:
658
+ mask_old = mask_old * mask
659
+
660
+ return sample_old, itp_old, its_old, mask_old
661
+
662
+ def cut_window(self, sample, target, itp, its, select_range):
663
+ shift_pick = lambda x, shift: [[i - shift for i in trace] for trace in x]
664
+ sample = sample[select_range[0] : select_range[1]]
665
+ target = target[select_range[0] : select_range[1]]
666
+ return (sample, target, shift_pick(itp, select_range[0]), shift_pick(its, select_range[0]))
667
+
668
+
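+ # Usage sketch (paths taken from test_DataReader() below; adjust to your data): the training reader
+ # is built from a CSV list plus a data directory and wrapped into a tf.data pipeline, e.g.
+ #   reader = DataReader_train(data_list="test_data/selected_phases.csv", data_dir="test_data/data/")
+ #   dataset = reader.dataset(batch_size=20, shuffle=True)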
669
+ class DataReader_train(DataReader):
670
+ def __init__(self, format="numpy", config=DataConfig(), **kwargs):
671
+ super().__init__(format=format, config=config, **kwargs)
672
+
673
+ self.min_event_gap = config.min_event_gap
674
+ self.buffer_channels = {}
675
+ self.shift_range = [-2000 + self.label_width * 2, 1000 - self.label_width * 2]
676
+ self.select_range = [5000, 8000]
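+ # 3000-sample training window (8000 - 5000), matching the model's default X_shape/Y_shape of [3000, 1, 3].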
677
+
678
+ def __getitem__(self, i):
679
+ base_name = self.data_list[i]
680
+ if self.format == "numpy":
681
+ meta = self.read_numpy(os.path.join(self.data_dir, base_name))
682
+ elif self.format == "hdf5":
683
+ meta = self.read_hdf5(base_name)
684
+ if meta is None:
685
+ return (np.zeros(self.X_shape, dtype=self.dtype), np.zeros(self.Y_shape, dtype=self.dtype), base_name)
686
+
687
+ sample = np.copy(meta["data"])
688
+ itp_list = meta["itp"]
689
+ its_list = meta["its"]
690
+
691
+ sample = normalize(sample)
692
+ if np.random.random() < 0.95:
693
+ sample, itp_list, its_list, _ = self.random_shift(sample, itp_list, its_list, shift_range=self.shift_range)
694
+ sample, itp_list, its_list, _ = self.stack_events(sample, itp_list, its_list, shift_range=self.shift_range)
695
+ target = self.generate_label(sample, [itp_list, its_list])
696
+ sample, target, itp_list, its_list = self.cut_window(sample, target, itp_list, its_list, self.select_range)
697
+ else:
698
+ ## noise
699
+ assert self.X_shape[0] <= min(min(itp_list))
700
+ sample = sample[: self.X_shape[0], ...]
701
+ target = np.zeros(self.Y_shape).astype(self.dtype)
702
+ itp_list = [[]]
703
+ its_list = [[]]
704
+
705
+ sample = normalize(sample)
706
+ return (sample.astype(self.dtype), target.astype(self.dtype), base_name)
707
+
708
+ def dataset(self, batch_size, num_parallel_calls=2, shuffle=True, drop_remainder=True):
709
+ dataset = dataset_map(
710
+ self,
711
+ output_types=(self.dtype, self.dtype, "string"),
712
+ output_shapes=(self.X_shape, self.Y_shape, None),
713
+ num_parallel_calls=num_parallel_calls,
714
+ shuffle=shuffle,
715
+ )
716
+ dataset = dataset.batch(batch_size, drop_remainder=drop_remainder).prefetch(batch_size * 2)
717
+ return dataset
718
+
719
+
720
+ class DataReader_test(DataReader):
721
+ def __init__(self, format="numpy", config=DataConfig(), **kwargs):
722
+ super().__init__(format=format, config=config, **kwargs)
723
+
724
+ self.select_range = [5000, 8000]
725
+
726
+ def __getitem__(self, i):
727
+ base_name = self.data_list[i]
728
+ if self.format == "numpy":
729
+ meta = self.read_numpy(os.path.join(self.data_dir, base_name))
730
+ elif self.format == "hdf5":
731
+ meta = self.read_hdf5(base_name)
732
+ if meta == -1:
733
+ return (np.zeros(self.X_shape, dtype=self.dtype), np.zeros(self.Y_shape, dtype=self.dtype), base_name)
734
+
735
+ sample = np.copy(meta["data"])
736
+ itp_list = meta["itp"]
737
+ its_list = meta["its"]
738
+
739
+ # sample, itp_list, its_list, _ = self.random_shift(sample, itp_list, its_list, shift_range=self.shift_range)
740
+ target = self.generate_label(sample, [itp_list, its_list])
741
+ sample, target, itp_list, its_list = self.cut_window(sample, target, itp_list, its_list, self.select_range)
742
+
743
+ sample = normalize(sample)
744
+ return (sample, target, base_name, itp_list, its_list)
745
+
746
+ def dataset(self, batch_size, num_parallel_calls=2, shuffle=False, drop_remainder=False):
747
+ dataset = dataset_map(
748
+ self,
749
+ output_types=(self.dtype, self.dtype, "string", "int64", "int64"),
750
+ output_shapes=(self.X_shape, self.Y_shape, None, None, None),
751
+ num_parallel_calls=num_parallel_calls,
752
+ shuffle=shuffle,
753
+ )
754
+ dataset = dataset.batch(batch_size, drop_remainder=drop_remainder).prefetch(batch_size * 2)
755
+ return dataset
756
+
757
+
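+ # Prediction-time reader: items are variable-length windows (output_shapes [None, None, 3] in
+ # dataset() below), and when amplitude=True the un-normalized waveform is returned alongside the
+ # normalized sample so that phase amplitudes can be measured in postprocessing.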
758
+ class DataReader_pred(DataReader):
759
+ def __init__(self, format="numpy", amplitude=True, config=DataConfig(), **kwargs):
760
+ super().__init__(format=format, config=config, **kwargs)
761
+
762
+ self.amplitude = amplitude
763
+
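+ # Rescales records with dead (all-zero) channels so the overall amplitude stays comparable:
+ # data are multiplied by n_channels / n_nonzero_channels. Currently unused; the call in
+ # __getitem__ below is commented out.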
764
+ def adjust_missingchannels(self, data):
765
+ tmp = np.max(np.abs(data), axis=0, keepdims=True)
766
+ assert tmp.shape[-1] == data.shape[-1]
767
+ if np.count_nonzero(tmp) > 0:
768
+ data *= data.shape[-1] / np.count_nonzero(tmp)
769
+ return data
770
+
771
+ def __getitem__(self, i):
772
+ base_name = self.data_list[i]
773
+
774
+ if self.format == "numpy":
775
+ meta = self.read_numpy(os.path.join(self.data_dir, base_name))
776
+ elif (self.format == "mseed") or (self.format == "sac"):
777
+ meta = self.read_mseed(
778
+ os.path.join(self.data_dir, base_name),
779
+ response=self.response,
780
+ sampling_rate=self.sampling_rate,
781
+ highpass_filter=self.highpass_filter,
782
+ return_single_station=True,
783
+ )
784
+ elif self.format == "hdf5":
785
+ meta = self.read_hdf5(base_name)
786
+ else:
787
+ raise ValueError(f"Format {self.format} is not supported!")
788
+
789
+ if "data" in meta:
790
+ raw_amp = meta["data"].copy()
791
+ sample = normalize_long(meta["data"])
792
+ else:
793
+ raw_amp = np.zeros([3000, 1, 3], dtype=np.float32)
794
+ sample = np.zeros([3000, 1, 3], dtype=np.float32)
795
+
796
+ if "t0" in meta:
797
+ t0 = meta["t0"]
798
+ else:
799
+ t0 = "1970-01-01T00:00:00.000"
800
+
801
+ if "station_id" in meta:
802
+ station_id = meta["station_id"]
803
+ else:
804
+ # station_id = base_name.split("/")[-1].rstrip("*")
805
+ station_id = os.path.basename(base_name).rstrip("*")
806
+
807
+ if np.isnan(sample).any() or np.isinf(sample).any():
808
+ logging.warning(f"Data error: Nan or Inf found in {base_name}")
809
+ sample[np.isnan(sample)] = 0
810
+ sample[np.isinf(sample)] = 0
811
+
812
+ # sample = self.adjust_missingchannels(sample)
813
+
814
+ if self.amplitude:
815
+ return (sample, raw_amp, base_name, t0, station_id)
816
+ else:
817
+ return (sample, base_name, t0, station_id)
818
+
819
+ def dataset(self, batch_size, num_parallel_calls=2, shuffle=False, drop_remainder=False):
820
+ if self.amplitude:
821
+ dataset = dataset_map(
822
+ self,
823
+ output_types=(self.dtype, self.dtype, "string", "string", "string"),
824
+ output_shapes=([None, None, 3], [None, None, 3], None, None, None),
825
+ num_parallel_calls=num_parallel_calls,
826
+ shuffle=shuffle,
827
+ )
828
+ else:
829
+ dataset = dataset_map(
830
+ self,
831
+ output_types=(self.dtype, "string", "string", "string"),
832
+ output_shapes=([None, None, 3], None, None, None),
833
+ num_parallel_calls=num_parallel_calls,
834
+ shuffle=shuffle,
835
+ )
836
+ dataset = dataset.batch(batch_size, drop_remainder=drop_remainder).prefetch(batch_size * 2)
837
+ return dataset
838
+
839
+
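+ # Array reader: one item is a whole mseed file covering every station listed in the `stations`
+ # json, so a single __getitem__ yields a [n_station, nt, 1, 3] batch rather than a single trace.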
840
+ class DataReader_mseed_array(DataReader):
841
+ def __init__(self, stations, amplitude=True, remove_resp=True, config=DataConfig(), **kwargs):
842
+ super().__init__(format="mseed", config=config, **kwargs)
843
+
844
+ # self.stations = pd.read_json(stations)
845
+ with open(stations, "r") as f:
846
+ self.stations = json.load(f)
847
+ print(pd.DataFrame.from_dict(self.stations, orient="index").to_string())
848
+
849
+ self.amplitude = amplitude
850
+ self.remove_resp = remove_resp
851
+ self.X_shape = self.get_data_shape()
852
+
853
+ def get_data_shape(self):
854
+ fname = os.path.join(self.data_dir, self.data_list[0])
855
+ meta = self.read_mseed_array(fname, self.stations, self.amplitude, self.remove_resp)
856
+ return meta["data"].shape
857
+
858
+ def __getitem__(self, i):
859
+ fp = os.path.join(self.data_dir, self.data_list[i])
860
+ # try:
861
+ meta = self.read_mseed_array(fp, self.stations, self.amplitude, self.remove_resp)
862
+ # except Exception as e:
863
+ # logging.error(f"Failed reading {fp}: {e}")
864
+ # if self.amplitude:
865
+ # return (np.zeros(self.X_shape).astype(self.dtype), np.zeros(self.X_shape).astype(self.dtype),
866
+ # [self.stations.iloc[i]["station"] for i in range(len(self.stations))], ["0" for i in range(len(self.stations))])
867
+ # else:
868
+ # return (np.zeros(self.X_shape).astype(self.dtype), ["" for i in range(len(self.stations))],
869
+ # [self.stations.iloc[i]["station"] for i in range(len(self.stations))])
870
+
871
+ sample = np.zeros([len(meta["data"]), *self.X_shape[1:]], dtype=self.dtype)
872
+ sample[:, : meta["data"].shape[1], :, :] = normalize_batch(meta["data"])[:, : self.X_shape[1], :, :]
873
+ if np.isnan(sample).any() or np.isinf(sample).any():
874
+ logging.warning(f"Data error: Nan or Inf found in {fp}")
875
+ sample[np.isnan(sample)] = 0
876
+ sample[np.isinf(sample)] = 0
877
+ t0 = meta["t0"]
878
+ base_name = meta["fname"]
879
+ station_id = meta["station_id"]
880
+ # base_name = [self.stations.iloc[i]["station"]+"."+t0[i] for i in range(len(self.stations))]
881
+ # base_name = [self.stations.iloc[i]["station"] for i in range(len(self.stations))]
882
+
883
+ if self.amplitude:
884
+ raw_amp = np.zeros([len(meta["raw_amp"]), *self.X_shape[1:]], dtype=self.dtype)
885
+ raw_amp[:, : meta["raw_amp"].shape[1], :, :] = meta["raw_amp"][:, : self.X_shape[1], :, :]
886
+ if np.isnan(raw_amp).any() or np.isinf(raw_amp).any():
887
+ logging.warning(f"Data error: Nan or Inf found in {fp}")
888
+ raw_amp[np.isnan(raw_amp)] = 0
889
+ raw_amp[np.isinf(raw_amp)] = 0
890
+ return (sample, raw_amp, base_name, t0, station_id)
891
+ else:
892
+ return (sample, base_name, t0, station_id)
893
+
894
+ def dataset(self, num_parallel_calls=1, shuffle=False):
895
+ if self.amplitude:
896
+ dataset = dataset_map(
897
+ self,
898
+ output_types=(self.dtype, self.dtype, "string", "string", "string"),
899
+ output_shapes=([None, *self.X_shape[1:]], [None, *self.X_shape[1:]], None, None, None),
900
+ num_parallel_calls=num_parallel_calls,
901
+ )
902
+ else:
903
+ dataset = dataset_map(
904
+ self,
905
+ output_types=(self.dtype, "string", "string", "string"),
906
+ output_shapes=([None, *self.X_shape[1:]], None, None, None),
907
+ num_parallel_calls=num_parallel_calls,
908
+ )
909
+ dataset = dataset.prefetch(1)
910
+ # dataset = dataset.prefetch(len(self.stations)*2)
911
+ return dataset
912
+
913
+
914
+ ###### test ########
915
+
916
+
917
+ def test_DataReader():
918
+ import os
919
+ import timeit
920
+
921
+ import matplotlib.pyplot as plt
922
+
923
+ if not os.path.exists("test_figures"):
924
+ os.mkdir("test_figures")
925
+
926
+ def plot_sample(sample, fname, label=None):
927
+ plt.clf()
928
+ plt.subplot(211)
929
+ plt.plot(sample[:, 0, -1])
930
+ if label is not None:
931
+ plt.subplot(212)
932
+ plt.plot(label[:, 0, 0])
933
+ plt.plot(label[:, 0, 1])
934
+ plt.plot(label[:, 0, 2])
935
+ plt.savefig(f"test_figures/{fname.decode()}.png")
936
+
937
+ def read(data_reader, batch=1):
938
+ start_time = timeit.default_timer()
939
+ if batch is None:
940
+ dataset = data_reader.dataset(shuffle=False)
941
+ else:
942
+ dataset = data_reader.dataset(1, shuffle=False)
943
+ sess = tf.compat.v1.Session()
944
+
945
+ print(len(data_reader))
946
+ print("-------", tf.data.Dataset.cardinality(dataset))
947
+ num = 0
948
+ x = tf.compat.v1.data.make_one_shot_iterator(dataset).get_next()
949
+ while True:
950
+ num += 1
951
+ # print(num)
952
+ try:
953
+ out = sess.run(x)
954
+ if len(out) == 2:
955
+ sample, fname = out[0], out[1]
956
+ for i in range(len(sample)):
957
+ plot_sample(sample[i], fname[i])
958
+ else:
959
+ sample, label, fname = out[0], out[1], out[2]
960
+ for i in range(len(sample)):
961
+ plot_sample(sample[i], fname[i], label[i])
962
+ except tf.errors.OutOfRangeError:
963
+ break
964
+ print("End of dataset")
965
+ print("Tensorflow Dataset:\nexecution time = ", timeit.default_timer() - start_time)
966
+
967
+ data_reader = DataReader_train(data_list="test_data/selected_phases.csv", data_dir="test_data/data/")
968
+
969
+ read(data_reader)
970
+
971
+ data_reader = DataReader_train(format="hdf5", hdf5="test_data/data.h5", group="data")
972
+
973
+ read(data_reader)
974
+
975
+ data_reader = DataReader_test(data_list="test_data/selected_phases.csv", data_dir="test_data/data/")
976
+
977
+ read(data_reader)
978
+
979
+ data_reader = DataReader_test(format="hdf5", hdf5="test_data/data.h5", group="data")
980
+
981
+ read(data_reader)
982
+
983
+ data_reader = DataReader_pred(format="numpy", data_list="test_data/selected_phases.csv", data_dir="test_data/data/")
984
+
985
+ read(data_reader)
986
+
987
+ data_reader = DataReader_pred(
988
+ format="mseed", data_list="test_data/mseed_station.csv", data_dir="test_data/waveforms/"
989
+ )
990
+
991
+ read(data_reader)
992
+
993
+ data_reader = DataReader_pred(
994
+ format="mseed", amplitude=True, data_list="test_data/mseed_station.csv", data_dir="test_data/waveforms/"
995
+ )
996
+
997
+ read(data_reader)
998
+
999
+ data_reader = DataReader_mseed_array(
1000
+ data_list="test_data/mseed.csv",
1001
+ data_dir="test_data/waveforms/",
1002
+ stations="test_data/stations.csv",
1003
+ remove_resp=False,
1004
+ )
1005
+
1006
+ read(data_reader, batch=None)
1007
+
1008
+
1009
+ if __name__ == "__main__":
1010
+ test_DataReader()
phasenet/detect_peaks.py ADDED
@@ -0,0 +1,207 @@
1
+ """Detect peaks in data based on their amplitude and other features."""
2
+
3
+ from __future__ import division, print_function
4
+ import warnings
5
+ import numpy as np
6
+
7
+ __author__ = "Marcos Duarte, https://github.com/demotu"
8
+ __version__ = "1.0.6"
9
+ __license__ = "MIT"
10
+
11
+
12
+
13
+ def detect_peaks(x, mph=None, mpd=1, threshold=0, edge='rising',
14
+ kpsh=False, valley=False, show=False, ax=None, title=True):
15
+
16
+ """Detect peaks in data based on their amplitude and other features.
17
+
18
+ Parameters
19
+ ----------
20
+ x : 1D array_like
21
+ data.
22
+ mph : {None, number}, optional (default = None)
23
+ detect peaks that are greater than minimum peak height (if parameter
24
+ `valley` is False) or peaks that are smaller than maximum peak height
25
+ (if parameter `valley` is True).
26
+ mpd : positive integer, optional (default = 1)
27
+ detect peaks that are at least separated by minimum peak distance (in
28
+ number of data).
29
+ threshold : positive number, optional (default = 0)
30
+ detect peaks (valleys) that are greater (smaller) than `threshold`
31
+ in relation to their immediate neighbors.
32
+ edge : {None, 'rising', 'falling', 'both'}, optional (default = 'rising')
33
+ for a flat peak, keep only the rising edge ('rising'), only the
34
+ falling edge ('falling'), both edges ('both'), or don't detect a
35
+ flat peak (None).
36
+ kpsh : bool, optional (default = False)
37
+ keep peaks with same height even if they are closer than `mpd`.
38
+ valley : bool, optional (default = False)
39
+ if True (1), detect valleys (local minima) instead of peaks.
40
+ show : bool, optional (default = False)
41
+ if True (1), plot data in matplotlib figure.
42
+ ax : a matplotlib.axes.Axes instance, optional (default = None).
43
+ title : bool or string, optional (default = True)
44
+ if True, show standard title. If False or empty string, doesn't show
45
+ any title. If string, shows string as title.
46
+
47
+ Returns
48
+ -------
49
+ ind : 1D array_like
50
+ indices of the peaks in `x`.
51
+
52
+ Notes
53
+ -----
54
+ The detection of valleys instead of peaks is performed internally by simply
55
+ negating the data: `ind_valleys = detect_peaks(-x)`
56
+
57
+ The function can handle NaN's
58
+
59
+ See this IPython Notebook [1]_.
60
+
61
+ References
62
+ ----------
63
+ .. [1] http://nbviewer.ipython.org/github/demotu/BMC/blob/master/notebooks/DetectPeaks.ipynb
64
+
65
+ Examples
66
+ --------
67
+ >>> from detect_peaks import detect_peaks
68
+ >>> x = np.random.randn(100)
69
+ >>> x[60:81] = np.nan
70
+ >>> # detect all peaks and plot data
71
+ >>> ind = detect_peaks(x, show=True)
72
+ >>> print(ind)
73
+
74
+ >>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
75
+ >>> # set minimum peak height = 0 and minimum peak distance = 20
76
+ >>> detect_peaks(x, mph=0, mpd=20, show=True)
77
+
78
+ >>> x = [0, 1, 0, 2, 0, 3, 0, 2, 0, 1, 0]
79
+ >>> # set minimum peak distance = 2
80
+ >>> detect_peaks(x, mpd=2, show=True)
81
+
82
+ >>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
83
+ >>> # detection of valleys instead of peaks
84
+ >>> detect_peaks(x, mph=-1.2, mpd=20, valley=True, show=True)
85
+
86
+ >>> x = [0, 1, 1, 0, 1, 1, 0]
87
+ >>> # detect both edges
88
+ >>> detect_peaks(x, edge='both', show=True)
89
+
90
+ >>> x = [-2, 1, -2, 2, 1, 1, 3, 0]
91
+ >>> # set threshold = 2
92
+ >>> detect_peaks(x, threshold = 2, show=True)
93
+
94
+ >>> x = [-2, 1, -2, 2, 1, 1, 3, 0]
95
+ >>> fig, axs = plt.subplots(ncols=2, nrows=1, figsize=(10, 4))
96
+ >>> detect_peaks(x, show=True, ax=axs[0], threshold=0.5, title=False)
97
+ >>> detect_peaks(x, show=True, ax=axs[1], threshold=1.5, title=False)
98
+
99
+ Version history
100
+ ---------------
101
+ '1.0.6':
102
+ Fix issue of when specifying ax object only the first plot was shown
103
+ Add parameter to choose if a title is shown and input a title
104
+ '1.0.5':
105
+ The sign of `mph` is inverted if parameter `valley` is True
106
+
107
+ """
108
+
109
+ x = np.atleast_1d(x).astype('float64')
110
+ if x.size < 3:
111
+ return np.array([], dtype=int), np.array([])
112
+ if valley:
113
+ x = -x
114
+ if mph is not None:
115
+ mph = -mph
116
+ # find indices of all peaks
117
+ dx = x[1:] - x[:-1]
118
+ # handle NaN's
119
+ indnan = np.where(np.isnan(x))[0]
120
+ if indnan.size:
121
+ x[indnan] = np.inf
122
+ dx[np.where(np.isnan(dx))[0]] = np.inf
123
+ ine, ire, ife = np.array([[], [], []], dtype=int)
124
+ if not edge:
125
+ ine = np.where((np.hstack((dx, 0)) < 0) & (np.hstack((0, dx)) > 0))[0]
126
+ else:
127
+ if edge.lower() in ['rising', 'both']:
128
+ ire = np.where((np.hstack((dx, 0)) <= 0) & (np.hstack((0, dx)) > 0))[0]
129
+ if edge.lower() in ['falling', 'both']:
130
+ ife = np.where((np.hstack((dx, 0)) < 0) & (np.hstack((0, dx)) >= 0))[0]
131
+ ind = np.unique(np.hstack((ine, ire, ife)))
132
+ # handle NaN's
133
+ if ind.size and indnan.size:
134
+ # NaN's and values close to NaN's cannot be peaks
135
+ ind = ind[np.in1d(ind, np.unique(np.hstack((indnan, indnan-1, indnan+1))), invert=True)]
136
+ # first and last values of x cannot be peaks
137
+ if ind.size and ind[0] == 0:
138
+ ind = ind[1:]
139
+ if ind.size and ind[-1] == x.size-1:
140
+ ind = ind[:-1]
141
+ # remove peaks < minimum peak height
142
+ if ind.size and mph is not None:
143
+ ind = ind[x[ind] >= mph]
144
+ # remove peaks - neighbors < threshold
145
+ if ind.size and threshold > 0:
146
+ dx = np.min(np.vstack([x[ind]-x[ind-1], x[ind]-x[ind+1]]), axis=0)
147
+ ind = np.delete(ind, np.where(dx < threshold)[0])
148
+ # detect small peaks closer than minimum peak distance
149
+ if ind.size and mpd > 1:
150
+ ind = ind[np.argsort(x[ind])][::-1] # sort ind by peak height
151
+ idel = np.zeros(ind.size, dtype=bool)
152
+ for i in range(ind.size):
153
+ if not idel[i]:
154
+ # keep peaks with the same height if kpsh is True
155
+ idel = idel | (ind >= ind[i] - mpd) & (ind <= ind[i] + mpd) \
156
+ & (x[ind[i]] > x[ind] if kpsh else True)
157
+ idel[i] = 0 # Keep current peak
158
+ # remove the small peaks and sort back the indices by their occurrence
159
+ ind = np.sort(ind[~idel])
160
+
161
+ if show:
162
+ if indnan.size:
163
+ x[indnan] = np.nan
164
+ if valley:
165
+ x = -x
166
+ if mph is not None:
167
+ mph = -mph
168
+ _plot(x, mph, mpd, threshold, edge, valley, ax, ind, title)
169
+
170
+ return ind, x[ind]
171
+
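+ # Note: the docstring above describes only `ind`, but this copy also returns the peak heights
+ # x[ind]; postprocess.extract_picks() unpacks both values as (phase_index, phase_prob).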
172
+
173
+ def _plot(x, mph, mpd, threshold, edge, valley, ax, ind, title):
174
+ """Plot results of the detect_peaks function, see its help."""
175
+ try:
176
+ import matplotlib.pyplot as plt
177
+ except ImportError:
178
+ print('matplotlib is not available.')
179
+ else:
180
+ if ax is None:
181
+ _, ax = plt.subplots(1, 1, figsize=(8, 4))
182
+ no_ax = True
183
+ else:
184
+ no_ax = False
185
+
186
+ ax.plot(x, 'b', lw=1)
187
+ if ind.size:
188
+ label = 'valley' if valley else 'peak'
189
+ label = label + 's' if ind.size > 1 else label
190
+ ax.plot(ind, x[ind], '+', mfc=None, mec='r', mew=2, ms=8,
191
+ label='%d %s' % (ind.size, label))
192
+ ax.legend(loc='best', framealpha=.5, numpoints=1)
193
+ ax.set_xlim(-.02*x.size, x.size*1.02-1)
194
+ ymin, ymax = x[np.isfinite(x)].min(), x[np.isfinite(x)].max()
195
+ yrange = ymax - ymin if ymax > ymin else 1
196
+ ax.set_ylim(ymin - 0.1*yrange, ymax + 0.1*yrange)
197
+ ax.set_xlabel('Data #', fontsize=14)
198
+ ax.set_ylabel('Amplitude', fontsize=14)
199
+ if title:
200
+ if not isinstance(title, str):
201
+ mode = 'Valley detection' if valley else 'Peak detection'
202
+ title = "%s (mph=%s, mpd=%d, threshold=%s, edge='%s')"% \
203
+ (mode, str(mph), mpd, str(threshold), edge)
204
+ ax.set_title(title)
205
+ # plt.grid()
206
+ if no_ax:
207
+ plt.show()
phasenet/model.py ADDED
@@ -0,0 +1,489 @@
1
+ import tensorflow as tf
2
+ tf.compat.v1.disable_eager_execution()
3
+ import numpy as np
4
+ import logging
5
+ import warnings
6
+ warnings.filterwarnings('ignore', category=UserWarning)
7
+
8
+ class ModelConfig:
9
+
10
+ batch_size = 20
11
+ depths = 5
12
+ filters_root = 8
13
+ kernel_size = [7, 1]
14
+ pool_size = [4, 1]
15
+ dilation_rate = [1, 1]
16
+ class_weights = [1.0, 1.0, 1.0]
17
+ loss_type = "cross_entropy"
18
+ weight_decay = 0.0
19
+ optimizer = "adam"
20
+ momentum = 0.9
21
+ learning_rate = 0.01
22
+ decay_step = 1e9
23
+ decay_rate = 0.9
24
+ drop_rate = 0.0
25
+ summary = True
26
+
27
+ X_shape = [3000, 1, 3]
28
+ n_channel = X_shape[-1]
29
+ Y_shape = [3000, 1, 3]
30
+ n_class = Y_shape[-1]
31
+
32
+ def __init__(self, **kwargs):
33
+ for k,v in kwargs.items():
34
+ setattr(self, k, v)
35
+
36
+ def update_args(self, args):
37
+ for k,v in vars(args).items():
38
+ setattr(self, k, v)
39
+
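+ # Usage sketch (values are illustrative): keyword arguments override the class defaults, and any
+ # unknown name is simply attached to the instance, e.g.
+ #   config = ModelConfig(X_shape=[3000, 1, 3], drop_rate=0.05)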
40
+
41
+ def crop_and_concat(net1, net2):
42
+ """
43
+ Center-crop net2 to the (dynamic) spatial size of net1 and concatenate along the channel axis; assumes size(net1) <= size(net2).
44
+ """
45
+ # net1_shape = net1.get_shape().as_list()
46
+ # net2_shape = net2.get_shape().as_list()
47
+ # # print(net1_shape)
48
+ # # print(net2_shape)
49
+ # # if net2_shape[1] >= net1_shape[1] and net2_shape[2] >= net1_shape[2]:
50
+ # offsets = [0, (net2_shape[1] - net1_shape[1]) // 2, (net2_shape[2] - net1_shape[2]) // 2, 0]
51
+ # size = [-1, net1_shape[1], net1_shape[2], -1]
52
+ # net2_resize = tf.slice(net2, offsets, size)
53
+ # return tf.concat([net1, net2_resize], 3)
54
+
55
+ ## dynamic shape
56
+ chn1 = net1.get_shape().as_list()[-1]
57
+ chn2 = net2.get_shape().as_list()[-1]
58
+ net1_shape = tf.shape(net1)
59
+ net2_shape = tf.shape(net2)
60
+ # print(net1_shape)
61
+ # print(net2_shape)
62
+ # if net2_shape[1] >= net1_shape[1] and net2_shape[2] >= net1_shape[2]:
63
+ offsets = [0, (net2_shape[1] - net1_shape[1]) // 2, (net2_shape[2] - net1_shape[2]) // 2, 0]
64
+ size = [-1, net1_shape[1], net1_shape[2], -1]
65
+ net2_resize = tf.slice(net2, offsets, size)
66
+
67
+ out = tf.concat([net1, net2_resize], 3)
68
+ out.set_shape([None, None, None, chn1+chn2])
69
+
70
+ return out
71
+
72
+ # else:
73
+ # offsets = [0, (net1_shape[1] - net2_shape[1]) // 2, (net1_shape[2] - net2_shape[2]) // 2, 0]
74
+ # size = [-1, net2_shape[1], net2_shape[2], -1]
75
+ # net1_resize = tf.slice(net1, offsets, size)
76
+ # return tf.concat([net1_resize, net2], 3)
77
+
78
+
79
+ def crop_only(net1, net2):
80
+ """
81
+ Center-crop net2 to the spatial size of net1 without concatenation; assumes size(net1) <= size(net2).
82
+ """
83
+ net1_shape = net1.get_shape().as_list()
84
+ net2_shape = net2.get_shape().as_list()
85
+ # print(net1_shape)
86
+ # print(net2_shape)
87
+ # if net2_shape[1] >= net1_shape[1] and net2_shape[2] >= net1_shape[2]:
88
+ offsets = [0, (net2_shape[1] - net1_shape[1]) // 2, (net2_shape[2] - net1_shape[2]) // 2, 0]
89
+ size = [-1, net1_shape[1], net1_shape[2], -1]
90
+ net2_resize = tf.slice(net2, offsets, size)
91
+ #return tf.concat([net1, net2_resize], 3)
92
+ return net2_resize
93
+
94
+ class UNet:
95
+ def __init__(self, config=ModelConfig(), input_batch=None, mode='train'):
96
+ self.depths = config.depths
97
+ self.filters_root = config.filters_root
98
+ self.kernel_size = config.kernel_size
99
+ self.dilation_rate = config.dilation_rate
100
+ self.pool_size = config.pool_size
101
+ self.X_shape = config.X_shape
102
+ self.Y_shape = config.Y_shape
103
+ self.n_channel = config.n_channel
104
+ self.n_class = config.n_class
105
+ self.class_weights = config.class_weights
106
+ self.batch_size = config.batch_size
107
+ self.loss_type = config.loss_type
108
+ self.weight_decay = config.weight_decay
109
+ self.optimizer = config.optimizer
110
+ self.learning_rate = config.learning_rate
111
+ self.decay_step = config.decay_step
112
+ self.decay_rate = config.decay_rate
113
+ self.momentum = config.momentum
114
+ self.global_step = tf.compat.v1.get_variable(name="global_step", initializer=0, dtype=tf.int32)
115
+ self.summary_train = []
116
+ self.summary_valid = []
117
+
118
+ self.build(input_batch, mode=mode)
119
+
120
+ def add_placeholders(self, input_batch=None, mode="train"):
121
+ if input_batch is None:
122
+ # self.X = tf.compat.v1.placeholder(dtype=tf.float32, shape=[None, self.X_shape[-3], self.X_shape[-2], self.X_shape[-1]], name='X')
123
+ # self.Y = tf.compat.v1.placeholder(dtype=tf.float32, shape=[None, self.Y_shape[-3], self.Y_shape[-2], self.n_class], name='y')
124
+ self.X = tf.compat.v1.placeholder(dtype=tf.float32, shape=[None, None, None, self.X_shape[-1]], name='X')
125
+ self.Y = tf.compat.v1.placeholder(dtype=tf.float32, shape=[None, None, None, self.n_class], name='y')
126
+ else:
127
+ self.X = input_batch[0]
128
+ if mode in ["train", "valid", "test"]:
129
+ self.Y = input_batch[1]
130
+ self.input_batch = input_batch
131
+
132
+ self.is_training = tf.compat.v1.placeholder(dtype=tf.bool, name="is_training")
133
+ # self.keep_prob = tf.compat.v1.placeholder(dtype=tf.float32, name="keep_prob")
134
+ self.drop_rate = tf.compat.v1.placeholder(dtype=tf.float32, name="drop_rate")
135
+
136
+ def add_prediction_op(self):
137
+ logging.info("Model: depths {depths}, filters {filters}, "
138
+ "filter size {kernel_size[0]}x{kernel_size[1]}, "
139
+ "pool size: {pool_size[0]}x{pool_size[1]}, "
140
+ "dilation rate: {dilation_rate[0]}x{dilation_rate[1]}".format(
141
+ depths=self.depths,
142
+ filters=self.filters_root,
143
+ kernel_size=self.kernel_size,
144
+ dilation_rate=self.dilation_rate,
145
+ pool_size=self.pool_size))
146
+
147
+ if self.weight_decay > 0:
148
+ weight_decay = tf.constant(self.weight_decay, dtype=tf.float32, name="weight_constant")
149
+ self.regularizer = tf.keras.regularizers.l2(l=0.5 * (weight_decay))
150
+ else:
151
+ self.regularizer = None
152
+
153
+ self.initializer = tf.compat.v1.keras.initializers.VarianceScaling(scale=1.0, mode="fan_avg", distribution="uniform")
154
+
155
+ # down sample layers
156
+ convs = [None] * self.depths # store output of each depth
157
+
158
+ with tf.compat.v1.variable_scope("Input"):
159
+ net = self.X
160
+ net = tf.compat.v1.layers.conv2d(net,
161
+ filters=self.filters_root,
162
+ kernel_size=self.kernel_size,
163
+ activation=None,
164
+ padding='same',
165
+ dilation_rate=self.dilation_rate,
166
+ kernel_initializer=self.initializer,
167
+ kernel_regularizer=self.regularizer,
168
+ name="input_conv")
169
+ net = tf.compat.v1.layers.batch_normalization(net,
170
+ training=self.is_training,
171
+ name="input_bn")
172
+ net = tf.nn.relu(net,
173
+ name="input_relu")
174
+ # net = tf.nn.dropout(net, self.keep_prob)
175
+ net = tf.compat.v1.layers.dropout(net,
176
+ rate=self.drop_rate,
177
+ training=self.is_training,
178
+ name="input_dropout")
179
+
180
+
181
+ for depth in range(0, self.depths):
182
+ with tf.compat.v1.variable_scope("DownConv_%d" % depth):
183
+ filters = int(2**(depth) * self.filters_root)
184
+
185
+ net = tf.compat.v1.layers.conv2d(net,
186
+ filters=filters,
187
+ kernel_size=self.kernel_size,
188
+ activation=None,
189
+ use_bias=False,
190
+ padding='same',
191
+ dilation_rate=self.dilation_rate,
192
+ kernel_initializer=self.initializer,
193
+ kernel_regularizer=self.regularizer,
194
+ name="down_conv1_{}".format(depth + 1))
195
+ net = tf.compat.v1.layers.batch_normalization(net,
196
+ training=self.is_training,
197
+ name="down_bn1_{}".format(depth + 1))
198
+ net = tf.nn.relu(net,
199
+ name="down_relu1_{}".format(depth+1))
200
+ net = tf.compat.v1.layers.dropout(net,
201
+ rate=self.drop_rate,
202
+ training=self.is_training,
203
+ name="down_dropout1_{}".format(depth + 1))
204
+
205
+ convs[depth] = net
206
+
207
+ if depth < self.depths - 1:
208
+ net = tf.compat.v1.layers.conv2d(net,
209
+ filters=filters,
210
+ kernel_size=self.kernel_size,
211
+ strides=self.pool_size,
212
+ activation=None,
213
+ use_bias=False,
214
+ padding='same',
215
+ dilation_rate=self.dilation_rate,
216
+ kernel_initializer=self.initializer,
217
+ kernel_regularizer=self.regularizer,
218
+ name="down_conv3_{}".format(depth + 1))
219
+ net = tf.compat.v1.layers.batch_normalization(net,
220
+ training=self.is_training,
221
+ name="down_bn3_{}".format(depth + 1))
222
+ net = tf.nn.relu(net,
223
+ name="down_relu3_{}".format(depth+1))
224
+ net = tf.compat.v1.layers.dropout(net,
225
+ rate=self.drop_rate,
226
+ training=self.is_training,
227
+ name="down_dropout3_{}".format(depth + 1))
228
+
229
+
230
+ # up layers
231
+ for depth in range(self.depths - 2, -1, -1):
232
+ with tf.compat.v1.variable_scope("UpConv_%d" % depth):
233
+ filters = int(2**(depth) * self.filters_root)
234
+ net = tf.compat.v1.layers.conv2d_transpose(net,
235
+ filters=filters,
236
+ kernel_size=self.kernel_size,
237
+ strides=self.pool_size,
238
+ activation=None,
239
+ use_bias=False,
240
+ padding="same",
241
+ kernel_initializer=self.initializer,
242
+ kernel_regularizer=self.regularizer,
243
+ name="up_conv0_{}".format(depth+1))
244
+ net = tf.compat.v1.layers.batch_normalization(net,
245
+ training=self.is_training,
246
+ name="up_bn0_{}".format(depth + 1))
247
+ net = tf.nn.relu(net,
248
+ name="up_relu0_{}".format(depth+1))
249
+ net = tf.compat.v1.layers.dropout(net,
250
+ rate=self.drop_rate,
251
+ training=self.is_training,
252
+ name="up_dropout0_{}".format(depth + 1))
253
+
254
+
255
+ #skip connection
256
+ net = crop_and_concat(convs[depth], net)
257
+ #net = crop_only(convs[depth], net)
258
+
259
+ net = tf.compat.v1.layers.conv2d(net,
260
+ filters=filters,
261
+ kernel_size=self.kernel_size,
262
+ activation=None,
263
+ use_bias=False,
264
+ padding='same',
265
+ dilation_rate=self.dilation_rate,
266
+ kernel_initializer=self.initializer,
267
+ kernel_regularizer=self.regularizer,
268
+ name="up_conv1_{}".format(depth + 1))
269
+ net = tf.compat.v1.layers.batch_normalization(net,
270
+ training=self.is_training,
271
+ name="up_bn1_{}".format(depth + 1))
272
+ net = tf.nn.relu(net,
273
+ name="up_relu1_{}".format(depth + 1))
274
+ net = tf.compat.v1.layers.dropout(net,
275
+ rate=self.drop_rate,
276
+ training=self.is_training,
277
+ name="up_dropout1_{}".format(depth + 1))
278
+
279
+
280
+ # Output Map
281
+ with tf.compat.v1.variable_scope("Output"):
282
+ net = tf.compat.v1.layers.conv2d(net,
283
+ filters=self.n_class,
284
+ kernel_size=(1,1),
285
+ activation=None,
286
+ padding='same',
287
+ #dilation_rate=self.dilation_rate,
288
+ kernel_initializer=self.initializer,
289
+ kernel_regularizer=self.regularizer,
290
+ name="output_conv")
291
+ # net = tf.nn.relu(net,
292
+ # name="output_relu")
293
+ # net = tf.compat.v1.layers.dropout(net,
294
+ # rate=self.drop_rate,
295
+ # training=self.is_training,
296
+ # name="output_dropout")
297
+ # net = tf.compat.v1.layers.batch_normalization(net,
298
+ # training=self.is_training,
299
+ # name="output_bn")
300
+ output = net
301
+
302
+ with tf.compat.v1.variable_scope("representation"):
303
+ self.representation = convs[-1]
304
+
305
+ with tf.compat.v1.variable_scope("logits"):
306
+ self.logits = output
307
+ tmp = tf.compat.v1.summary.histogram("logits", self.logits)
308
+ self.summary_train.append(tmp)
309
+
310
+ with tf.compat.v1.variable_scope("preds"):
311
+ self.preds = tf.nn.softmax(output)
312
+ tmp = tf.compat.v1.summary.histogram("preds", self.preds)
313
+ self.summary_train.append(tmp)
314
+
315
+ def add_loss_op(self):
316
+ if self.loss_type == "cross_entropy":
317
+ with tf.compat.v1.variable_scope("cross_entropy"):
318
+ flat_logits = tf.reshape(self.logits, [-1, self.n_class], name="logits")
319
+ flat_labels = tf.reshape(self.Y, [-1, self.n_class], name="labels")
320
+ if (np.array(self.class_weights) != 1).any():
321
+ class_weights = tf.constant(np.array(self.class_weights, dtype=np.float32), name="class_weights")
322
+ weight_map = tf.multiply(flat_labels, class_weights)
323
+ weight_map = tf.reduce_sum(input_tensor=weight_map, axis=1)
324
+ loss_map = tf.nn.softmax_cross_entropy_with_logits(logits=flat_logits,
325
+ labels=flat_labels)
326
+
327
+ weighted_loss = tf.multiply(loss_map, weight_map)
328
+ loss = tf.reduce_mean(input_tensor=weighted_loss)
329
+ else:
330
+ loss = tf.reduce_mean(input_tensor=tf.nn.softmax_cross_entropy_with_logits(logits=flat_logits,
331
+ labels=flat_labels))
332
+
333
+ elif self.loss_type == "IOU":
334
+ with tf.compat.v1.variable_scope("IOU"):
335
+ eps = 1e-7
336
+ loss = 0
337
+ for i in range(1, self.n_class):
338
+ intersection = eps + tf.reduce_sum(input_tensor=self.preds[:,:,:,i] * self.Y[:,:,:,i], axis=[1,2])
339
+ union = eps + tf.reduce_sum(input_tensor=self.preds[:,:,:,i], axis=[1,2]) + tf.reduce_sum(input_tensor=self.Y[:,:,:,i], axis=[1,2])
340
+ loss += 1 - tf.reduce_mean(input_tensor=intersection / union)
341
+ elif self.loss_type == "mean_squared":
342
+ with tf.compat.v1.variable_scope("mean_squared"):
343
+ flat_logits = tf.reshape(self.logits, [-1, self.n_class], name="logits")
344
+ flat_labels = tf.reshape(self.Y, [-1, self.n_class], name="labels")
345
+ with tf.compat.v1.variable_scope("mean_squared"):
346
+ loss = tf.compat.v1.losses.mean_squared_error(labels=flat_labels, predictions=flat_logits)
347
+ else:
348
+ raise ValueError("Unknown loss function: %s" % self.loss_type)
349
+
350
+ tmp = tf.compat.v1.summary.scalar("train_loss", loss)
351
+ self.summary_train.append(tmp)
352
+ tmp = tf.compat.v1.summary.scalar("valid_loss", loss)
353
+ self.summary_valid.append(tmp)
354
+
355
+ if self.weight_decay > 0:
356
+ with tf.compat.v1.name_scope('weight_loss'):
357
+ tmp = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.REGULARIZATION_LOSSES)
358
+ weight_loss = tf.add_n(tmp, name="weight_loss")
359
+ self.loss = loss + weight_loss
360
+ else:
361
+ self.loss = loss
362
+
363
+ def add_training_op(self):
364
+ if self.optimizer == "momentum":
365
+ self.learning_rate_node = tf.compat.v1.train.exponential_decay(learning_rate=self.learning_rate,
366
+ global_step=self.global_step,
367
+ decay_steps=self.decay_step,
368
+ decay_rate=self.decay_rate,
369
+ staircase=True)
370
+ optimizer = tf.compat.v1.train.MomentumOptimizer(learning_rate=self.learning_rate_node,
371
+ momentum=self.momentum)
372
+ elif self.optimizer == "adam":
373
+ self.learning_rate_node = tf.compat.v1.train.exponential_decay(learning_rate=self.learning_rate,
374
+ global_step=self.global_step,
375
+ decay_steps=self.decay_step,
376
+ decay_rate=self.decay_rate,
377
+ staircase=True)
378
+
379
+ optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=self.learning_rate_node)
380
+ update_ops = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.UPDATE_OPS)
381
+ with tf.control_dependencies(update_ops):
382
+ self.train_op = optimizer.minimize(self.loss, global_step=self.global_step)
383
+ tmp = tf.compat.v1.summary.scalar("learning_rate", self.learning_rate_node)
384
+ self.summary_train.append(tmp)
385
+
386
+ def add_metrics_op(self):
387
+ with tf.compat.v1.variable_scope("metrics"):
388
+
389
+ Y = tf.argmax(input=self.Y, axis=-1)
390
+ confusion_matrix = tf.cast(tf.math.confusion_matrix(
391
+ labels=tf.reshape(Y, [-1]),
392
+ predictions=tf.reshape(tf.argmax(input=self.preds, axis=-1), [-1]),
393
+ num_classes=self.n_class, name='confusion_matrix'),
394
+ dtype=tf.float32)
395
+
396
+ # with tf.variable_scope("P"):
397
+ c = tf.constant(1e-7, dtype=tf.float32)
398
+ precision_P = (confusion_matrix[1,1] + c) / (tf.reduce_sum(input_tensor=confusion_matrix[:,1]) + c)
399
+ recall_P = (confusion_matrix[1,1] + c) / (tf.reduce_sum(input_tensor=confusion_matrix[1,:]) + c)
400
+ f1_P = 2 * precision_P * recall_P / (precision_P + recall_P)
401
+
402
+ tmp1 = tf.compat.v1.summary.scalar("train_precision_p", precision_P)
403
+ tmp2 = tf.compat.v1.summary.scalar("train_recall_p", recall_P)
404
+ tmp3 = tf.compat.v1.summary.scalar("train_f1_p", f1_P)
405
+ self.summary_train.extend([tmp1, tmp2, tmp3])
406
+
407
+ tmp1 = tf.compat.v1.summary.scalar("valid_precision_p", precision_P)
408
+ tmp2 = tf.compat.v1.summary.scalar("valid_recall_p", recall_P)
409
+ tmp3 = tf.compat.v1.summary.scalar("valid_f1_p", f1_P)
410
+ self.summary_valid.extend([tmp1, tmp2, tmp3])
411
+
412
+ # with tf.variable_scope("S"):
413
+ precision_S = (confusion_matrix[2,2] + c) / (tf.reduce_sum(input_tensor=confusion_matrix[:,2]) + c)
414
+ recall_S = (confusion_matrix[2,2] + c) / (tf.reduce_sum(input_tensor=confusion_matrix[2,:]) + c)
415
+ f1_S = 2 * precision_S * recall_S / (precision_S + recall_S)
416
+
417
+ tmp1 = tf.compat.v1.summary.scalar("train_precision_s", precision_S)
418
+ tmp2 = tf.compat.v1.summary.scalar("train_recall_s", recall_S)
419
+ tmp3 = tf.compat.v1.summary.scalar("train_f1_s", f1_S)
420
+ self.summary_train.extend([tmp1, tmp2, tmp3])
421
+
422
+ tmp1 = tf.compat.v1.summary.scalar("valid_precision_s", precision_S)
423
+ tmp2 = tf.compat.v1.summary.scalar("valid_recall_s", recall_S)
424
+ tmp3 = tf.compat.v1.summary.scalar("valid_f1_s", f1_S)
425
+ self.summary_valid.extend([tmp1, tmp2, tmp3])
426
+
427
+ self.precision = [precision_P, precision_S]
428
+ self.recall = [recall_P, recall_S]
429
+ self.f1 = [f1_P, f1_S]
430
+
431
+
432
+
433
+ def train_on_batch(self, sess, inputs_batch, labels_batch, summary_writer, drop_rate=0.0):
434
+ feed = {self.X: inputs_batch,
435
+ self.Y: labels_batch,
436
+ self.drop_rate: drop_rate,
437
+ self.is_training: True}
438
+
439
+ _, step_summary, step, loss = sess.run([self.train_op,
440
+ self.summary_train,
441
+ self.global_step,
442
+ self.loss],
443
+ feed_dict=feed)
444
+ summary_writer.add_summary(step_summary, step)
445
+ return loss
446
+
447
+ def valid_on_batch(self, sess, inputs_batch, labels_batch, summary_writer):
448
+ feed = {self.X: inputs_batch,
449
+ self.Y: labels_batch,
450
+ self.drop_rate: 0,
451
+ self.is_training: False}
452
+
453
+ step_summary, step, loss, preds = sess.run([self.summary_valid,
454
+ self.global_step,
455
+ self.loss,
456
+ self.preds],
457
+ feed_dict=feed)
458
+ summary_writer.add_summary(step_summary, step)
459
+ return loss, preds
460
+
461
+ def test_on_batch(self, sess, summary_writer):
462
+ feed = {self.drop_rate: 0,
463
+ self.is_training: False}
464
+ step_summary, step, loss, preds, \
465
+ X_batch, Y_batch, fname_batch, \
466
+ itp_batch, its_batch = sess.run([self.summary_valid,
467
+ self.global_step,
468
+ self.loss,
469
+ self.preds,
470
+ self.X,
471
+ self.Y,
472
+ self.input_batch[2],
473
+ self.input_batch[3],
474
+ self.input_batch[4]],
475
+ feed_dict=feed)
476
+ summary_writer.add_summary(step_summary, step)
477
+ return loss, preds, X_batch, Y_batch, fname_batch, itp_batch, its_batch
478
+
479
+
480
+ def build(self, input_batch=None, mode='train'):
481
+ self.add_placeholders(input_batch, mode)
482
+ self.add_prediction_op()
483
+ if mode in ["train", "valid", "test"]:
484
+ self.add_loss_op()
485
+ self.add_training_op()
486
+ # self.add_metrics_op()
487
+ self.summary_train = tf.compat.v1.summary.merge(self.summary_train)
488
+ self.summary_valid = tf.compat.v1.summary.merge(self.summary_valid)
489
+ return 0
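+ # Note: for any `mode` other than "train"/"valid"/"test", build() skips the loss and optimizer ops,
+ # leaving only the placeholders and the softmax `preds`, which is how the network is used for inference.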
phasenet/postprocess.py ADDED
@@ -0,0 +1,377 @@
1
+ import json
2
+ import logging
3
+ import os
4
+ from collections import namedtuple
5
+ from datetime import datetime, timedelta
6
+
7
+ import matplotlib.pyplot as plt
8
+ import numpy as np
9
+ from detect_peaks import detect_peaks
10
+
11
+ # def extract_picks(preds, fnames=None, station_ids=None, t0=None, config=None):
12
+
13
+ # if preds.shape[-1] == 4:
14
+ # record = namedtuple("phase", ["fname", "station_id", "t0", "p_idx", "p_prob", "s_idx", "s_prob", "ps_idx", "ps_prob"])
15
+ # else:
16
+ # record = namedtuple("phase", ["fname", "station_id", "t0", "p_idx", "p_prob", "s_idx", "s_prob"])
17
+
18
+ # picks = []
19
+ # for i, pred in enumerate(preds):
20
+
21
+ # if config is None:
22
+ # mph_p, mph_s, mpd = 0.3, 0.3, 50
23
+ # else:
24
+ # mph_p, mph_s, mpd = config.min_p_prob, config.min_s_prob, config.mpd
25
+
26
+ # if (fnames is None):
27
+ # fname = f"{i:04d}"
28
+ # else:
29
+ # if isinstance(fnames[i], str):
30
+ # fname = fnames[i]
31
+ # else:
32
+ # fname = fnames[i].decode()
33
+
34
+ # if (station_ids is None):
35
+ # station_id = f"{i:04d}"
36
+ # else:
37
+ # if isinstance(station_ids[i], str):
38
+ # station_id = station_ids[i]
39
+ # else:
40
+ # station_id = station_ids[i].decode()
41
+
42
+ # if (t0 is None):
43
+ # start_time = "1970-01-01T00:00:00.000"
44
+ # else:
45
+ # if isinstance(t0[i], str):
46
+ # start_time = t0[i]
47
+ # else:
48
+ # start_time = t0[i].decode()
49
+
50
+ # p_idx, p_prob, s_idx, s_prob = [], [], [], []
51
+ # for j in range(pred.shape[1]):
52
+ # p_idx_, p_prob_ = detect_peaks(pred[:,j,1], mph=mph_p, mpd=mpd, show=False)
53
+ # s_idx_, s_prob_ = detect_peaks(pred[:,j,2], mph=mph_s, mpd=mpd, show=False)
54
+ # p_idx.append(list(p_idx_))
55
+ # p_prob.append(list(p_prob_))
56
+ # s_idx.append(list(s_idx_))
57
+ # s_prob.append(list(s_prob_))
58
+
59
+ # if pred.shape[-1] == 4:
60
+ # ps_idx, ps_prob = detect_peaks(pred[:,0,3], mph=0.3, mpd=mpd, show=False)
61
+ # picks.append(record(fname, station_id, start_time, list(p_idx), list(p_prob), list(s_idx), list(s_prob), list(ps_idx), list(ps_prob)))
62
+ # else:
63
+ # picks.append(record(fname, station_id, start_time, list(p_idx), list(p_prob), list(s_idx), list(s_prob)))
64
+
65
+ # return picks
66
+
67
+
68
+ def extract_picks(
69
+ preds,
70
+ file_names=None,
71
+ begin_times=None,
72
+ station_ids=None,
73
+ dt=0.01,
74
+ phases=["P", "S"],
75
+ config=None,
76
+ waveforms=None,
77
+ use_amplitude=False,
78
+ ):
79
+ """Extract picks from prediction results.
80
+ Args:
81
+ preds ([type]): [Nb, Nt, Ns, Nc] "batch, time, station, channel"
82
+ file_names ([type], optional): [Nb]. Defaults to None.
83
+ station_ids ([type], optional): [Ns]. Defaults to None.
84
+ begin_times ([type], optional): [Nb]. Defaults to None.
85
+ config ([type], optional): [description]. Defaults to None.
86
+
87
+ Returns:
88
+ picks [type]: list of dicts {file_name, station_id, begin_time, phase_index, phase_time, phase_score, phase_type, dt}
89
+ """
90
+
91
+ mph = {}
92
+ if config is None:
93
+ for x in phases:
94
+ mph[x] = 0.3
95
+ mpd = 50
96
+ pre_idx = int(1 / dt)
97
+ post_idx = int(4 / dt)
98
+ else:
99
+ mph["P"] = config.min_p_prob
100
+ mph["S"] = config.min_s_prob
101
+ mph["PS"] = 0.3
102
+ mpd = config.mpd
103
+ pre_idx = int(config.pre_sec / dt)
104
+ post_idx = int(config.post_sec / dt)
105
+
106
+ Nb, Nt, Ns, Nc = preds.shape
107
+
108
+ if file_names is None:
109
+ file_names = [f"{i:04d}" for i in range(Nb)]
110
+ elif not (isinstance(file_names, np.ndarray) or isinstance(file_names, list)):
111
+ if isinstance(file_names, bytes):
112
+ file_names = file_names.decode()
113
+ file_names = [file_names] * Nb
114
+ else:
115
+ file_names = [x.decode() if isinstance(x, bytes) else x for x in file_names]
116
+
117
+ if begin_times is None:
118
+ begin_times = ["1970-01-01T00:00:00.000+00:00"] * Nb
119
+ else:
120
+ begin_times = [x.decode() if isinstance(x, bytes) else x for x in begin_times]
121
+
122
+ picks = []
123
+ for i in range(Nb):
124
+ file_name = file_names[i]
125
+ begin_time = datetime.fromisoformat(begin_times[i])
126
+
127
+ for j in range(Ns):
128
+ if (station_ids is None) or (len(station_ids[i]) == 0):
129
+ station_id = f"{j:04d}"
130
+ else:
131
+ station_id = station_ids[i][j].decode() if isinstance(station_ids[i][j], bytes) else station_ids[i][j]
132
+
133
+ if (waveforms is not None) and use_amplitude:
134
+ amp = np.max(np.abs(waveforms[i, :, j, :]), axis=-1) ## amplitude over three channels
135
+ for k in range(Nc - 1): # 0-th channel noise
136
+ idxs, probs = detect_peaks(preds[i, :, j, k + 1], mph=mph[phases[k]], mpd=mpd, show=False)
137
+ for l, (phase_index, phase_prob) in enumerate(zip(idxs, probs)):
138
+ pick_time = begin_time + timedelta(seconds=phase_index * dt)
139
+ pick = {
140
+ "file_name": file_name,
141
+ "station_id": station_id,
142
+ "begin_time": begin_time.isoformat(timespec="milliseconds"),
143
+ "phase_index": int(phase_index),
144
+ "phase_time": pick_time.isoformat(timespec="milliseconds"),
145
+ "phase_score": round(phase_prob, 3),
146
+ "phase_type": phases[k],
147
+ "dt": dt,
148
+ }
149
+
150
+ ## process waveform
151
+ if waveforms is not None:
152
+ tmp = np.zeros((pre_idx + post_idx, 3))
153
+ lo = phase_index - pre_idx
154
+ hi = phase_index + post_idx
155
+ insert_idx = 0
156
+ if lo < 0:
157
+ insert_idx = -lo
158
+ lo = 0
159
+ if hi > Nt:
160
+ hi = Nt
161
+ tmp[insert_idx : insert_idx + hi - lo, :] = waveforms[i, lo:hi, j, :]
162
+ if use_amplitude:
163
+ next_pick = idxs[l + 1] if l < len(idxs) - 1 else (phase_index + post_idx * 3)
164
+ pick["phase_amplitude"] = np.max(
165
+ amp[phase_index : min(phase_index + post_idx * 3, next_pick)]
166
+ ).item() ## peak amplitude
167
+
168
+ picks.append(pick)
169
+
170
+ return picks
171
+
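+ # Illustrative example (the values below are made up): each pick appended above is a plain dict like
+ #   {"file_name": "NC.MCB..HH.mseed", "station_id": "NC.MCB..HH",
+ #    "begin_time": "2020-01-01T00:00:00.000", "phase_index": 3482,
+ #    "phase_time": "2020-01-01T00:00:34.820", "phase_score": 0.982, "phase_type": "P", "dt": 0.01}
+ # with an extra "phase_amplitude" key when waveforms are given and use_amplitude is True.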
172
+
173
+ def extract_amplitude(data, picks, window_p=10, window_s=5, config=None):
174
+ record = namedtuple("amplitude", ["p_amp", "s_amp"])
175
+ dt = 0.01 if config is None else config.dt
176
+ window_p = int(window_p / dt)
177
+ window_s = int(window_s / dt)
178
+ amps = []
179
+ for i, (da, pi) in enumerate(zip(data, picks)):
180
+ p_amp, s_amp = [], []
181
+ for j in range(da.shape[1]):
182
+ amp = np.max(np.abs(da[:, j, :]), axis=-1)
183
+ # amp = np.median(np.abs(da[:,j,:]), axis=-1)
184
+ # amp = np.linalg.norm(da[:,j,:], axis=-1)
185
+ tmp = []
186
+ for k in range(len(pi.p_idx[j]) - 1):
187
+ tmp.append(np.max(amp[pi.p_idx[j][k] : min(pi.p_idx[j][k] + window_p, pi.p_idx[j][k + 1])]))
188
+ if len(pi.p_idx[j]) >= 1:
189
+ tmp.append(np.max(amp[pi.p_idx[j][-1] : pi.p_idx[j][-1] + window_p]))
190
+ p_amp.append(tmp)
191
+ tmp = []
192
+ for k in range(len(pi.s_idx[j]) - 1):
193
+ tmp.append(np.max(amp[pi.s_idx[j][k] : min(pi.s_idx[j][k] + window_s, pi.s_idx[j][k + 1])]))
194
+ if len(pi.s_idx[j]) >= 1:
195
+ tmp.append(np.max(amp[pi.s_idx[j][-1] : pi.s_idx[j][-1] + window_s]))
196
+ s_amp.append(tmp)
197
+ amps.append(record(p_amp, s_amp))
198
+ return amps
199
+
200
+
201
+ def save_picks(picks, output_dir, amps=None, fname=None):
202
+ if fname is None:
203
+ fname = "picks.csv"
204
+
205
+ int2s = lambda x: ",".join(["[" + ",".join(map(str, i)) + "]" for i in x])
206
+ flt2s = lambda x: ",".join(["[" + ",".join(map("{:0.3f}".format, i)) + "]" for i in x])
207
+ sci2s = lambda x: ",".join(["[" + ",".join(map("{:0.3e}".format, i)) + "]" for i in x])
208
+ if amps is None:
209
+ if hasattr(picks[0], "ps_idx"):
210
+ with open(os.path.join(output_dir, fname), "w") as fp:
211
+ fp.write("fname\tt0\tp_idx\tp_prob\ts_idx\ts_prob\tps_idx\tps_prob\n")
212
+ for pick in picks:
213
+ fp.write(
214
+ f"{pick.fname}\t{pick.t0}\t{int2s(pick.p_idx)}\t{flt2s(pick.p_prob)}\t{int2s(pick.s_idx)}\t{flt2s(pick.s_prob)}\t{int2s(pick.ps_idx)}\t{flt2s(pick.ps_prob)}\n"
215
+ )
216
+ fp.close()
217
+ else:
218
+ with open(os.path.join(output_dir, fname), "w") as fp:
219
+ fp.write("fname\tt0\tp_idx\tp_prob\ts_idx\ts_prob\n")
220
+ for pick in picks:
221
+ fp.write(
222
+ f"{pick.fname}\t{pick.t0}\t{int2s(pick.p_idx)}\t{flt2s(pick.p_prob)}\t{int2s(pick.s_idx)}\t{flt2s(pick.s_prob)}\n"
223
+ )
224
+ fp.close()
225
+ else:
226
+ with open(os.path.join(output_dir, fname), "w") as fp:
227
+ fp.write("fname\tt0\tp_idx\tp_prob\ts_idx\ts_prob\tp_amp\ts_amp\n")
228
+ for pick, amp in zip(picks, amps):
229
+ fp.write(
230
+ f"{pick.fname}\t{pick.t0}\t{int2s(pick.p_idx)}\t{flt2s(pick.p_prob)}\t{int2s(pick.s_idx)}\t{flt2s(pick.s_prob)}\t{sci2s(amp.p_amp)}\t{sci2s(amp.s_amp)}\n"
231
+ )
232
+ fp.close()
233
+
234
+ return 0
235
+
236
+
237
+ def calc_timestamp(timestamp, sec):
238
+ timestamp = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f") + timedelta(seconds=sec)
239
+ return timestamp.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3]
240
+
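+ # Example: calc_timestamp("1970-01-01T00:00:00.000", 1.5) returns "1970-01-01T00:00:01.500"
+ # (millisecond precision).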
241
+
242
+ def save_picks_json(picks, output_dir, dt=0.01, amps=None, fname=None):
243
+ if fname is None:
244
+ fname = "picks.json"
245
+
246
+ picks_ = []
247
+ if amps is None:
248
+ for pick in picks:
249
+ for idxs, probs in zip(pick.p_idx, pick.p_prob):
250
+ for idx, prob in zip(idxs, probs):
251
+ picks_.append(
252
+ {
253
+ "id": pick.station_id,
254
+ "timestamp": calc_timestamp(pick.t0, float(idx) * dt),
255
+ "prob": prob.astype(float),
256
+ "type": "p",
257
+ }
258
+ )
259
+ for idxs, probs in zip(pick.s_idx, pick.s_prob):
260
+ for idx, prob in zip(idxs, probs):
261
+ picks_.append(
262
+ {
263
+ "id": pick.station_id,
264
+ "timestamp": calc_timestamp(pick.t0, float(idx) * dt),
265
+ "prob": prob.astype(float),
266
+ "type": "s",
267
+ }
268
+ )
269
+ else:
270
+ for pick, amplitude in zip(picks, amps):
271
+ for idxs, probs, amps in zip(pick.p_idx, pick.p_prob, amplitude.p_amp):
272
+ for idx, prob, amp in zip(idxs, probs, amps):
273
+ picks_.append(
274
+ {
275
+ "id": pick.station_id,
276
+ "timestamp": calc_timestamp(pick.t0, float(idx) * dt),
277
+ "prob": prob.astype(float),
278
+ "amp": amp.astype(float),
279
+ "type": "p",
280
+ }
281
+ )
282
+ for idxs, probs, amps in zip(pick.s_idx, pick.s_prob, amplitude.s_amp):
283
+ for idx, prob, amp in zip(idxs, probs, amps):
284
+ picks_.append(
285
+ {
286
+ "id": pick.station_id,
287
+ "timestamp": calc_timestamp(pick.t0, float(idx) * dt),
288
+ "prob": prob.astype(float),
289
+ "amp": amp.astype(float),
290
+ "type": "s",
291
+ }
292
+ )
293
+ with open(os.path.join(output_dir, fname), "w") as fp:
294
+ json.dump(picks_, fp)
295
+
296
+ return 0
297
+
298
+
299
+ def convert_true_picks(fname, itp, its, itps=None):
300
+ true_picks = []
301
+ if itps is None:
302
+ record = namedtuple("phase", ["fname", "p_idx", "s_idx"])
303
+ for i in range(len(fname)):
304
+ true_picks.append(record(fname[i].decode(), itp[i], its[i]))
305
+ else:
306
+ record = namedtuple("phase", ["fname", "p_idx", "s_idx", "ps_idx"])
307
+ for i in range(len(fname)):
308
+ true_picks.append(record(fname[i].decode(), itp[i], its[i], itps[i]))
309
+
310
+ return true_picks
311
+
312
+
313
+ def calc_metrics(nTP, nP, nT):
314
+ """
315
+ nTP: true positive
316
+ nP: number of positive picks
317
+ nT: number of true picks
318
+ """
319
+ precision = nTP / nP
320
+ recall = nTP / nT
321
+ f1 = 2 * precision * recall / (precision + recall)
322
+ return [precision, recall, f1]
323
+
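+ # Example (illustrative numbers): calc_metrics(nTP=80, nP=100, nT=90) gives
+ # precision = 0.800, recall ~ 0.889, F1 ~ 0.842.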
324
+
325
+ def calc_performance(picks, true_picks, tol=3.0, dt=1.0):
326
+ assert len(picks) == len(true_picks)
327
+ logging.info("Total records: {}".format(len(picks)))
328
+
329
+ count = lambda picks: sum([len(x) for x in picks])
330
+ metrics = {}
331
+ for phase in true_picks[0]._fields:
332
+ if phase == "fname":
333
+ continue
334
+ true_positive, positive, true = 0, 0, 0
335
+ residual = []
336
+ for i in range(len(true_picks)):
337
+ true += count(getattr(true_picks[i], phase))
338
+ positive += count(getattr(picks[i], phase))
339
+ # print(i, phase, getattr(picks[i], phase), getattr(true_picks[i], phase))
340
+ diff = dt * (
341
+ np.array(getattr(picks[i], phase))[:, np.newaxis, :]
342
+ - np.array(getattr(true_picks[i], phase))[:, :, np.newaxis]
343
+ )
344
+ residual.extend(list(diff[np.abs(diff) <= tol]))
345
+ true_positive += np.sum(np.abs(diff) <= tol)
346
+ metrics[phase] = calc_metrics(true_positive, positive, true)
347
+
348
+ logging.info(f"{phase}-phase:")
349
+ logging.info(f"True={true}, Positive={positive}, True Positive={true_positive}")
350
+ logging.info(f"Precision={metrics[phase][0]:.3f}, Recall={metrics[phase][1]:.3f}, F1={metrics[phase][2]:.3f}")
351
+ logging.info(f"Residual mean={np.mean(residual):.4f}, std={np.std(residual):.4f}")
352
+
353
+ return metrics
354
+
355
+
356
+ def save_prob_h5(probs, fnames, output_h5):
357
+ if fnames is None:
358
+ fnames = [f"{i:04d}" for i in range(len(probs))]
359
+ elif type(fnames[0]) is bytes:
360
+ fnames = [f.decode().rstrip(".npz") for f in fnames]
361
+ else:
362
+ fnames = [f.rstrip(".npz") for f in fnames]
363
+ for prob, fname in zip(probs, fnames):
364
+ output_h5.create_dataset(fname, data=prob, dtype="float32")
365
+ return 0
366
+
367
+
368
+ def save_prob(probs, fnames, prob_dir):
369
+ if fnames is None:
370
+ fnames = [f"{i:04d}" for i in range(len(probs))]
371
+ elif type(fnames[0]) is bytes:
372
+ fnames = [f.decode().rstrip(".npz") for f in fnames]
373
+ else:
374
+ fnames = [f.rstrip(".npz") for f in fnames]
375
+ for prob, fname in zip(probs, fnames):
376
+ np.savez(os.path.join(prob_dir, fname + ".npz"), prob=prob)
377
+ return 0
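
As a reference for the metric helpers defined above, here is a minimal worked example of what `calc_metrics` computes; the counts are made up purely for illustration and are not part of this commit.

```python
# Hypothetical counts, for illustration only.
nTP, nP, nT = 90, 100, 95                           # true positives, predicted picks, true picks
precision = nTP / nP                                # 0.900
recall = nTP / nT                                   # ~0.947
f1 = 2 * precision * recall / (precision + recall)  # ~0.923
# calc_metrics(nTP, nP, nT) above returns [precision, recall, f1] with these values.
```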
phasenet/predict.py ADDED
@@ -0,0 +1,262 @@
1
+ import argparse
2
+ import logging
3
+ import multiprocessing
4
+ import os
5
+ import pickle
6
+ import time
7
+ from functools import partial
8
+
9
+ import h5py
10
+ import numpy as np
11
+ import pandas as pd
12
+ import tensorflow as tf
13
+ from data_reader import DataReader_mseed_array, DataReader_pred
14
+ from postprocess import (
15
+ extract_amplitude,
16
+ extract_picks,
17
+ save_picks,
18
+ save_picks_json,
19
+ save_prob_h5,
20
+ )
21
+ from tqdm import tqdm
22
+ from visulization import plot_waveform
23
+
24
+ from model import ModelConfig, UNet
25
+
26
+ tf.compat.v1.disable_eager_execution()
27
+ tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
28
+
29
+
30
+ def read_args():
31
+ parser = argparse.ArgumentParser()
32
+ parser.add_argument("--batch_size", default=20, type=int, help="batch size")
33
+ parser.add_argument("--model_dir", help="Checkpoint directory (default: None)")
34
+ parser.add_argument("--data_dir", default="", help="Input file directory")
35
+ parser.add_argument("--data_list", default="", help="Input csv file")
36
+ parser.add_argument("--hdf5_file", default="", help="Input hdf5 file")
37
+ parser.add_argument("--hdf5_group", default="data", help="data group name in hdf5 file")
38
+ parser.add_argument("--result_dir", default="results", help="Output directory")
39
+ parser.add_argument("--result_fname", default="picks", help="Output file")
40
+ parser.add_argument("--min_p_prob", default=0.3, type=float, help="Probability threshold for P pick")
41
+ parser.add_argument("--min_s_prob", default=0.3, type=float, help="Probability threshold for S pick")
42
+ parser.add_argument("--mpd", default=50, type=float, help="Minimum peak distance")
43
+ parser.add_argument("--amplitude", action="store_true", help="if return amplitude value")
44
+ parser.add_argument("--format", default="numpy", help="input format")
45
+ parser.add_argument("--s3_url", default="localhost:9000", help="s3 url")
46
+ parser.add_argument("--stations", default="", help="seismic station info")
47
+ parser.add_argument("--plot_figure", action="store_true", help="If plot figure for test")
48
+ parser.add_argument("--save_prob", action="store_true", help="If save result for test")
49
+ parser.add_argument("--pre_sec", default=1, type=float, help="Window length before pick")
50
+ parser.add_argument("--post_sec", default=4, type=float, help="Window length after pick")
51
+
52
+ parser.add_argument("--highpass_filter", default=0.0, type=float, help="Highpass filter")
53
+ parser.add_argument("--response_xml", default=None, type=str, help="response xml file")
54
+ parser.add_argument("--sampling_rate", default=100, type=float, help="sampling rate")
55
+ args = parser.parse_args()
56
+
57
+ return args
58
+
59
+
60
+ def pred_fn(args, data_reader, figure_dir=None, prob_dir=None, log_dir=None):
61
+ current_time = time.strftime("%y%m%d-%H%M%S")
62
+ if log_dir is None:
63
+ log_dir = os.path.join(args.log_dir, "pred", current_time)
64
+ if not os.path.exists(log_dir):
65
+ os.makedirs(log_dir)
66
+ if (args.plot_figure == True) and (figure_dir is None):
67
+ figure_dir = os.path.join(log_dir, "figures")
68
+ if not os.path.exists(figure_dir):
69
+ os.makedirs(figure_dir)
70
+ if (args.save_prob == True) and (prob_dir is None):
71
+ prob_dir = os.path.join(log_dir, "probs")
72
+ if not os.path.exists(prob_dir):
73
+ os.makedirs(prob_dir)
74
+ if args.save_prob:
75
+ h5 = h5py.File(os.path.join(args.result_dir, "result.h5"), "w", libver="latest")
76
+ prob_h5 = h5.create_group("/prob")
77
+ logging.info("Pred log: %s" % log_dir)
78
+ logging.info("Dataset size: {}".format(data_reader.num_data))
79
+
80
+ with tf.compat.v1.name_scope("Input_Batch"):
81
+ if args.format == "mseed_array":
82
+ batch_size = 1
83
+ else:
84
+ batch_size = args.batch_size
85
+ dataset = data_reader.dataset(batch_size)
86
+ batch = tf.compat.v1.data.make_one_shot_iterator(dataset).get_next()
87
+
88
+ config = ModelConfig(X_shape=data_reader.X_shape)
89
+ with open(os.path.join(log_dir, "config.log"), "w") as fp:
90
+ fp.write("\n".join("%s: %s" % item for item in vars(config).items()))
91
+
92
+ model = UNet(config=config, input_batch=batch, mode="pred")
93
+ # model = UNet(config=config, mode="pred")
94
+ sess_config = tf.compat.v1.ConfigProto()
95
+ sess_config.gpu_options.allow_growth = True
96
+ # sess_config.log_device_placement = False
97
+
98
+ with tf.compat.v1.Session(config=sess_config) as sess:
99
+ saver = tf.compat.v1.train.Saver(tf.compat.v1.global_variables(), max_to_keep=5)
100
+ init = tf.compat.v1.global_variables_initializer()
101
+ sess.run(init)
102
+
103
+ latest_check_point = tf.train.latest_checkpoint(args.model_dir)
104
+ logging.info(f"restoring model {latest_check_point}")
105
+ saver.restore(sess, latest_check_point)
106
+
107
+ picks = []
108
+ amps = [] if args.amplitude else None
109
+ if args.plot_figure:
110
+ multiprocessing.set_start_method("spawn")
111
+ pool = multiprocessing.Pool(multiprocessing.cpu_count())
112
+
113
+ for _ in tqdm(range(0, data_reader.num_data, batch_size), desc="Pred"):
114
+ if args.amplitude:
115
+ pred_batch, X_batch, amp_batch, fname_batch, t0_batch, station_batch = sess.run(
116
+ [model.preds, batch[0], batch[1], batch[2], batch[3], batch[4]],
117
+ feed_dict={model.drop_rate: 0, model.is_training: False},
118
+ )
119
+ # X_batch, amp_batch, fname_batch, t0_batch = sess.run([batch[0], batch[1], batch[2], batch[3]])
120
+ else:
121
+ pred_batch, X_batch, fname_batch, t0_batch, station_batch = sess.run(
122
+ [model.preds, batch[0], batch[1], batch[2], batch[3]],
123
+ feed_dict={model.drop_rate: 0, model.is_training: False},
124
+ )
125
+ # X_batch, fname_batch, t0_batch = sess.run([model.preds, batch[0], batch[1], batch[2]])
126
+ # pred_batch = []
127
+ # for i in range(0, len(X_batch), 1):
128
+ # pred_batch.append(sess.run(model.preds, feed_dict={model.X: X_batch[i:i+1], model.drop_rate: 0, model.is_training: False}))
129
+ # pred_batch = np.vstack(pred_batch)
130
+
131
+ waveforms = None
132
+ if args.amplitude:
133
+ waveforms = amp_batch
134
+
135
+ picks_ = extract_picks(
136
+ preds=pred_batch,
137
+ file_names=fname_batch,
138
+ station_ids=station_batch,
139
+ begin_times=t0_batch,
140
+ config=args,
141
+ waveforms=waveforms,
142
+ use_amplitude=args.amplitude,
143
+ dt=1.0 / args.sampling_rate,
144
+ )
145
+
146
+ picks.extend(picks_)
147
+
148
+ ## save pick per file
149
+ if len(fname_batch) == 1:
150
+ df = pd.DataFrame(picks_)
151
+ df = df[df["phase_index"] > 10]
152
+ if not os.path.exists(os.path.join(args.result_dir, "picks")):
153
+ os.makedirs(os.path.join(args.result_dir, "picks"))
154
+ df = df[
155
+ [
156
+ "station_id",
157
+ "begin_time",
158
+ "phase_index",
159
+ "phase_time",
160
+ "phase_score",
161
+ "phase_type",
162
+ "phase_amplitude",
163
+ "dt",
164
+ ]
165
+ ]
166
+ df.to_csv(
167
+ os.path.join(
168
+ args.result_dir, "picks", fname_batch[0].decode().split("/")[-1].rstrip(".mseed") + ".csv"
169
+ ),
170
+ index=False,
171
+ )
172
+
173
+ if args.plot_figure:
174
+ if not (isinstance(fname_batch, np.ndarray) or isinstance(fname_batch, list)):
175
+ fname_batch = [fname_batch.decode().rstrip(".mseed") + "_" + x.decode() for x in station_batch]
176
+ else:
177
+ fname_batch = [x.decode() for x in fname_batch]
178
+ pool.starmap(
179
+ partial(
180
+ plot_waveform,
181
+ figure_dir=figure_dir,
182
+ ),
183
+ # zip(X_batch, pred_batch, [x.decode() for x in fname_batch]),
184
+ zip(X_batch, pred_batch, fname_batch),
185
+ )
186
+
187
+ if args.save_prob:
188
+ # save_prob(pred_batch, fname_batch, prob_dir=prob_dir)
189
+ if not (isinstance(fname_batch, np.ndarray) or isinstance(fname_batch, list)):
190
+ fname_batch = [fname_batch.decode().rstrip(".mseed") + "_" + x.decode() for x in station_batch]
191
+ else:
192
+ fname_batch = [x.decode() for x in fname_batch]
193
+ save_prob_h5(pred_batch, fname_batch, prob_h5)
194
+
195
+ if len(picks) > 0:
196
+ # save_picks(picks, args.result_dir, amps=amps, fname=args.result_fname+".csv")
197
+ # save_picks_json(picks, args.result_dir, dt=data_reader.dt, amps=amps, fname=args.result_fname+".json")
198
+ df = pd.DataFrame(picks)
199
+ # df["fname"] = df["file_name"]
200
+ # df["id"] = df["station_id"]
201
+ # df["timestamp"] = df["phase_time"]
202
+ # df["prob"] = df["phase_prob"]
203
+ # df["type"] = df["phase_type"]
204
+
205
+ base_columns = [
206
+ "station_id",
207
+ "begin_time",
208
+ "phase_index",
209
+ "phase_time",
210
+ "phase_score",
211
+ "phase_type",
212
+ "file_name",
213
+ ]
214
+ if args.amplitude:
215
+ base_columns.append("phase_amplitude")
216
+ base_columns.append("phase_amp")
217
+ df["phase_amp"] = df["phase_amplitude"]
218
+
219
+ df = df[base_columns]
220
+ df.to_csv(os.path.join(args.result_dir, args.result_fname + ".csv"), index=False)
221
+
222
+ print(
223
+ f"Done with {len(df[df['phase_type'] == 'P'])} P-picks and {len(df[df['phase_type'] == 'S'])} S-picks"
224
+ )
225
+ else:
226
+ print(f"Done with 0 P-picks and 0 S-picks")
227
+ return 0
228
+
229
+
230
+ def main(args):
231
+ logging.basicConfig(format="%(asctime)s %(message)s", level=logging.INFO)
232
+
233
+ with tf.compat.v1.name_scope("create_inputs"):
234
+ if args.format == "mseed_array":
235
+ data_reader = DataReader_mseed_array(
236
+ data_dir=args.data_dir,
237
+ data_list=args.data_list,
238
+ stations=args.stations,
239
+ amplitude=args.amplitude,
240
+ highpass_filter=args.highpass_filter,
241
+ )
242
+ else:
243
+ data_reader = DataReader_pred(
244
+ format=args.format,
245
+ data_dir=args.data_dir,
246
+ data_list=args.data_list,
247
+ hdf5_file=args.hdf5_file,
248
+ hdf5_group=args.hdf5_group,
249
+ amplitude=args.amplitude,
250
+ highpass_filter=args.highpass_filter,
251
+ response_xml=args.response_xml,
252
+ sampling_rate=args.sampling_rate,
253
+ )
254
+
255
+ pred_fn(args, data_reader, log_dir=args.result_dir)
256
+
257
+ return
258
+
259
+
260
+ if __name__ == "__main__":
261
+ args = read_args()
262
+ main(args)
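
For orientation, a hedged example of how this script might be invoked from the command line, using only flags defined in `read_args` above; the data paths are placeholders, while `model/190703-214543` is the pre-trained model directory referenced elsewhere in this commit: `python phasenet/predict.py --model_dir=model/190703-214543 --data_dir=<waveform_dir> --data_list=<list.csv> --result_dir=results --amplitude`.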
phasenet/slide_window.py ADDED
@@ -0,0 +1,88 @@
1
+ import os
2
+ from collections import defaultdict, namedtuple
3
+ from datetime import datetime, timedelta
4
+ from json import dumps
5
+
6
+ import numpy as np
7
+ import tensorflow as tf
8
+
9
+ from model import ModelConfig, UNet
10
+ from postprocess import extract_amplitude, extract_picks
11
+ import pandas as pd
12
+ import obspy
13
+
14
+
15
+ tf.compat.v1.disable_eager_execution()
16
+ tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
17
+ PROJECT_ROOT = os.path.realpath(os.path.join(os.path.dirname(__file__), ".."))
18
+
19
+ # load model
20
+ model = UNet(mode="pred")
21
+ sess_config = tf.compat.v1.ConfigProto()
22
+ sess_config.gpu_options.allow_growth = True
23
+
24
+ sess = tf.compat.v1.Session(config=sess_config)
25
+ saver = tf.compat.v1.train.Saver(tf.compat.v1.global_variables())
26
+ init = tf.compat.v1.global_variables_initializer()
27
+ sess.run(init)
28
+ latest_check_point = tf.train.latest_checkpoint(f"{PROJECT_ROOT}/model/190703-214543")
29
+ print(f"restoring model {latest_check_point}")
30
+ saver.restore(sess, latest_check_point)
31
+
32
+
33
+ def calc_timestamp(timestamp, sec):
34
+ timestamp = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f") + timedelta(seconds=sec)
35
+ return timestamp.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3]
36
+
37
+ def format_picks(picks, dt):
38
+ picks_ = []
39
+ for pick in picks:
40
+ for idxs, probs in zip(pick.p_idx, pick.p_prob):
41
+ for idx, prob in zip(idxs, probs):
42
+ picks_.append(
43
+ {
44
+ "id": pick.fname,
45
+ "timestamp": calc_timestamp(pick.t0, float(idx) * dt),
46
+ "prob": prob,
47
+ "type": "p",
48
+ }
49
+ )
50
+ for idxs, probs in zip(pick.s_idx, pick.s_prob):
51
+ for idx, prob in zip(idxs, probs):
52
+ picks_.append(
53
+ {
54
+ "id": pick.fname,
55
+ "timestamp": calc_timestamp(pick.t0, float(idx) * dt),
56
+ "prob": prob,
57
+ "type": "s",
58
+ }
59
+ )
60
+ return picks_
61
+
62
+
63
+ stream = obspy.read()
64
+ stream = stream.sort() ## Assume sorting yields E, N, Z channel order
65
+ assert(len(stream) == 3)
66
+ data = []
67
+ for trace in stream:
68
+ data.append(trace.data)
69
+ data = np.array(data).T
70
+ assert(data.shape[-1] == 3)
71
+
72
+ # data_id = stream[0].get_id()[:-1]
73
+ # timestamp = stream[0].stats.starttime.datetime.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3]
74
+
75
+ data = np.stack([data for i in range(10)]) ## Assume 10 windows
76
+ data = data[:,:,np.newaxis,:] ## batch, nt, dummy_dim, channel
77
+ print(f"{data.shape = }")
78
+ data = (data - data.mean(axis=1, keepdims=True))/data.std(axis=1, keepdims=True)
79
+
80
+ feed = {model.X: data, model.drop_rate: 0, model.is_training: False}
81
+ preds = sess.run(model.preds, feed_dict=feed)
82
+
83
+ picks = extract_picks(preds, fnames=None, station_ids=None, t0=None)
84
+ picks = format_picks(picks, dt=0.01)
85
+
86
+
87
+ picks = pd.DataFrame(picks)
88
+ print(picks)
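
The script above feeds the network a 4-D array of shape (batch, nt, 1, 3): windows, time samples, a dummy spatial axis, and the three E/N/Z channels, each window normalized by its own mean and standard deviation. Below is a minimal sketch of that preprocessing on synthetic data; the array sizes and variable names are illustrative, not part of this commit.

```python
import numpy as np

nt, n_windows = 3000, 10                 # samples per window, number of windows (illustrative)
waveform = np.random.randn(nt, 3)        # stand-in for a three-component trace
data = np.stack([waveform] * n_windows)  # (batch, nt, channel)
data = data[:, :, np.newaxis, :]         # (batch, nt, 1, channel), the layout the model expects
data = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
print(data.shape)                        # (10, 3000, 1, 3)
```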
phasenet/test_app.py ADDED
@@ -0,0 +1,47 @@
1
+ import requests
2
+ import obspy
3
+ import numpy as np
4
+ import matplotlib.pyplot as plt
5
+ from datetime import datetime
6
+
7
+ ### Start running the model first:
8
+ ### uvicorn --app-dir phasenet app:app --port 8000
9
+
10
+ def read_data(mseed):
11
+ data = []
12
+ mseed = mseed.sort()
13
+ for c in ["E", "N", "Z"]:
14
+ data.append(mseed.select(channel="*"+c)[0].data)
15
+ return np.array(data).T
16
+
17
+ timestamp = lambda x: x.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3]
18
+
19
+ ## prepare some test data
20
+ mseed = obspy.read()
21
+ data = []
22
+ for i in range(1):
23
+ data.append(read_data(mseed))
24
+ data = {
25
+ "id": ["test01"],
26
+ "timestamp": [timestamp(datetime.now())],
27
+ "vec": np.array(data).tolist(),
28
+ "dt": 0.01
29
+ }
30
+
31
+ ## run prediction
32
+ print(data["id"])
33
+ resp = requests.get("http://localhost:8000/predict", json=data)
34
+ picks = resp.json()["picks"]
35
+ print(resp.json())
36
+
37
+
38
+ ## plot figure
39
+ plt.figure()
40
+ plt.plot(np.array(data["vec"])[0,:,1])
41
+ ylim = plt.ylim()
42
+ plt.plot([picks[0][0][0], picks[0][0][0]], ylim, label="P-phase")
43
+ plt.text(picks[0][0][0], ylim[1]*0.9, f"{picks[0][1][0]:.2f}")
44
+ plt.plot([picks[0][2][0], picks[0][2][0]], ylim, label="S-phase")
45
+ plt.text(picks[0][2][0], ylim[1]*0.9, f"{picks[0][3][0]:.2f}")
46
+ plt.legend()
47
+ plt.savefig("test.png")
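
The request body assembled above follows a simple JSON schema. A hedged sketch of one payload is shown below; the values are placeholders, and only the field names mirror the dict built in this script.

```python
payload = {
    "id": ["test01"],                          # one identifier per waveform
    "timestamp": ["2021-01-01T00:00:00.000"],  # start time of each waveform (placeholder)
    "vec": [[[0.0, 0.0, 0.0]] * 3000],         # (n_waveforms, nt, 3) nested lists of samples
    "dt": 0.01,                                # sampling interval in seconds
}
```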
phasenet/train.py ADDED
@@ -0,0 +1,246 @@
1
+ import numpy as np
2
+ import tensorflow as tf
3
+ tf.compat.v1.disable_eager_execution()
4
+ tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
5
+ import argparse, os, time, logging
6
+ from tqdm import tqdm
7
+ import pandas as pd
8
+ import multiprocessing
9
+ from functools import partial
10
+ import pickle
11
+ from model import UNet, ModelConfig
12
+ from data_reader import DataReader_train, DataReader_test
13
+ from postprocess import extract_picks, save_picks, save_picks_json, extract_amplitude, convert_true_picks, calc_performance
14
+ from visulization import plot_waveform
15
+ from util import EMA, LMA
16
+
17
+ def read_args():
18
+
19
+ parser = argparse.ArgumentParser()
20
+ parser.add_argument("--mode", default="train", help="train/train_valid/test/debug")
21
+ parser.add_argument("--epochs", default=100, type=int, help="number of epochs (default: 10)")
22
+ parser.add_argument("--batch_size", default=20, type=int, help="batch size")
23
+ parser.add_argument("--learning_rate", default=0.01, type=float, help="learning rate")
24
+ parser.add_argument("--drop_rate", default=0.0, type=float, help="dropout rate")
25
+ parser.add_argument("--decay_step", default=-1, type=int, help="decay step")
26
+ parser.add_argument("--decay_rate", default=0.9, type=float, help="decay rate")
27
+ parser.add_argument("--momentum", default=0.9, type=float, help="momentum")
28
+ parser.add_argument("--optimizer", default="adam", help="optimizer: adam, momentum")
29
+ parser.add_argument("--summary", default=True, type=bool, help="summary")
30
+ parser.add_argument("--class_weights", nargs="+", default=[1, 1, 1], type=float, help="class weights")
31
+ parser.add_argument("--model_dir", default=None, help="Checkpoint directory (default: None)")
32
+ parser.add_argument("--load_model", action="store_true", help="Load checkpoint")
33
+ parser.add_argument("--log_dir", default="log", help="Log directory (default: log)")
34
+ parser.add_argument("--num_plots", default=10, type=int, help="Plotting training results")
35
+ parser.add_argument("--min_p_prob", default=0.3, type=float, help="Probability threshold for P pick")
36
+ parser.add_argument("--min_s_prob", default=0.3, type=float, help="Probability threshold for S pick")
37
+ parser.add_argument("--format", default="numpy", help="Input data format")
38
+ parser.add_argument("--train_dir", default="./dataset/waveform_train/", help="Input file directory")
39
+ parser.add_argument("--train_list", default="./dataset/waveform.csv", help="Input csv file")
40
+ parser.add_argument("--valid_dir", default=None, help="Input file directory")
41
+ parser.add_argument("--valid_list", default=None, help="Input csv file")
42
+ parser.add_argument("--test_dir", default=None, help="Input file directory")
43
+ parser.add_argument("--test_list", default=None, help="Input csv file")
44
+ parser.add_argument("--result_dir", default="results", help="result directory")
45
+ parser.add_argument("--plot_figure", action="store_true", help="If plot figure for test")
46
+ parser.add_argument("--save_prob", action="store_true", help="If save result for test")
47
+ args = parser.parse_args()
48
+
49
+ return args
50
+
51
+
52
+ def train_fn(args, data_reader, data_reader_valid=None):
53
+
54
+ current_time = time.strftime("%y%m%d-%H%M%S")
55
+ log_dir = os.path.join(args.log_dir, current_time)
56
+ if not os.path.exists(log_dir):
57
+ os.makedirs(log_dir)
58
+ logging.info("Training log: {}".format(log_dir))
59
+ model_dir = os.path.join(log_dir, 'models')
60
+ os.makedirs(model_dir)
61
+
62
+ figure_dir = os.path.join(log_dir, 'figures')
63
+ if not os.path.exists(figure_dir):
64
+ os.makedirs(figure_dir)
65
+
66
+ config = ModelConfig(X_shape=data_reader.X_shape, Y_shape=data_reader.Y_shape)
67
+ if args.decay_step == -1:
68
+ args.decay_step = data_reader.num_data // args.batch_size
69
+ config.update_args(args)
70
+ with open(os.path.join(log_dir, 'config.log'), 'w') as fp:
71
+ fp.write('\n'.join("%s: %s" % item for item in vars(config).items()))
72
+
73
+ with tf.compat.v1.name_scope('Input_Batch'):
74
+ dataset = data_reader.dataset(args.batch_size, shuffle=True).repeat()
75
+ batch = tf.compat.v1.data.make_one_shot_iterator(dataset).get_next()
76
+ if data_reader_valid is not None:
77
+ dataset_valid = data_reader_valid.dataset(args.batch_size, shuffle=False).repeat()
78
+ valid_batch = tf.compat.v1.data.make_one_shot_iterator(dataset_valid).get_next()
79
+
80
+ model = UNet(config, input_batch=batch)
81
+ sess_config = tf.compat.v1.ConfigProto()
82
+ sess_config.gpu_options.allow_growth = True
83
+ # sess_config.log_device_placement = False
84
+
85
+ with tf.compat.v1.Session(config=sess_config) as sess:
86
+
87
+ summary_writer = tf.compat.v1.summary.FileWriter(log_dir, sess.graph)
88
+ saver = tf.compat.v1.train.Saver(tf.compat.v1.global_variables(), max_to_keep=5)
89
+ init = tf.compat.v1.global_variables_initializer()
90
+ sess.run(init)
91
+
92
+ if args.model_dir is not None:
93
+ logging.info("restoring models...")
94
+ latest_check_point = tf.train.latest_checkpoint(args.model_dir)
95
+ saver.restore(sess, latest_check_point)
96
+
97
+ if args.plot_figure:
98
+ multiprocessing.set_start_method('spawn')
99
+ pool = multiprocessing.Pool(multiprocessing.cpu_count())
100
+
101
+ flog = open(os.path.join(log_dir, 'loss.log'), 'w')
102
+ train_loss = EMA(0.9)
103
+ best_valid_loss = np.inf
104
+ for epoch in range(args.epochs):
105
+ progressbar = tqdm(range(0, data_reader.num_data, args.batch_size), desc="{}: epoch {}".format(log_dir.split("/")[-1], epoch))
106
+ for _ in progressbar:
107
+ loss_batch, _, _ = sess.run([model.loss, model.train_op, model.global_step],
108
+ feed_dict={model.drop_rate: args.drop_rate, model.is_training: True})
109
+ train_loss(loss_batch)
110
+ progressbar.set_description("{}: epoch {}, loss={:.6f}, mean={:.6f}".format(log_dir.split("/")[-1], epoch, loss_batch, train_loss.value))
111
+ flog.write("epoch: {}, mean loss: {}\n".format(epoch, train_loss.value))
112
+
113
+ if data_reader_valid is not None:
114
+ valid_loss = LMA()
115
+ progressbar = tqdm(range(0, data_reader_valid.num_data, args.batch_size), desc="Valid:")
116
+ for _ in progressbar:
117
+ loss_batch, preds_batch, X_batch, Y_batch, fname_batch = sess.run([model.loss, model.preds, valid_batch[0], valid_batch[1], valid_batch[2]],
118
+ feed_dict={model.drop_rate: 0, model.is_training: False})
119
+ valid_loss(loss_batch)
120
+ progressbar.set_description("valid, loss={:.6f}, mean={:.6f}".format(loss_batch, valid_loss.value))
121
+ if valid_loss.value < best_valid_loss:
122
+ best_valid_loss = valid_loss.value
123
+ saver.save(sess, os.path.join(model_dir, "model_{}.ckpt".format(epoch)))
124
+ flog.write("Valid: mean loss: {}\n".format(valid_loss.value))
125
+ else:
126
+ loss_batch, preds_batch, X_batch, Y_batch, fname_batch = sess.run([model.loss, model.preds, batch[0], batch[1], batch[2]],
127
+ feed_dict={model.drop_rate: 0, model.is_training: False})
128
+ saver.save(sess, os.path.join(model_dir, "model_{}.ckpt".format(epoch)))
129
+
130
+ if args.plot_figure:
131
+ pool.starmap(
132
+ partial(
133
+ plot_waveform,
134
+ figure_dir=figure_dir,
135
+ ),
136
+ zip(X_batch, preds_batch, [x.decode() for x in fname_batch], Y_batch),
137
+ )
138
+ # plot_waveform(X_batch, preds_batch, fname_batch, label=Y_batch, figure_dir=figure_dir)
139
+ flog.flush()
140
+
141
+ flog.close()
142
+
143
+ return 0
144
+
145
+ def test_fn(args, data_reader):
146
+ current_time = time.strftime("%y%m%d-%H%M%S")
147
+ logging.info("{} log: {}".format(args.mode, current_time))
148
+ if args.model_dir is None:
149
+ logging.error(f"model_dir = None!")
150
+ return -1
151
+ if not os.path.exists(args.result_dir):
152
+ os.makedirs(args.result_dir)
153
+ figure_dir=os.path.join(args.result_dir, "figures")
154
+ if not os.path.exists(figure_dir):
155
+ os.makedirs(figure_dir)
156
+
157
+ config = ModelConfig(X_shape=data_reader.X_shape, Y_shape=data_reader.Y_shape)
158
+ config.update_args(args)
159
+ with open(os.path.join(args.result_dir, 'config.log'), 'w') as fp:
160
+ fp.write('\n'.join("%s: %s" % item for item in vars(config).items()))
161
+
162
+ with tf.compat.v1.name_scope('Input_Batch'):
163
+ dataset = data_reader.dataset(args.batch_size, shuffle=False)
164
+ batch = tf.compat.v1.data.make_one_shot_iterator(dataset).get_next()
165
+
166
+ model = UNet(config, input_batch=batch, mode='test')
167
+ sess_config = tf.compat.v1.ConfigProto()
168
+ sess_config.gpu_options.allow_growth = True
169
+ # sess_config.log_device_placement = False
170
+
171
+ with tf.compat.v1.Session(config=sess_config) as sess:
172
+
173
+ saver = tf.compat.v1.train.Saver(tf.compat.v1.global_variables())
174
+ init = tf.compat.v1.global_variables_initializer()
175
+ sess.run(init)
176
+
177
+ logging.info("restoring models...")
178
+ latest_check_point = tf.train.latest_checkpoint(args.model_dir)
179
+ if latest_check_point is None:
180
+ logging.error(f"No models found in model_dir: {args.model_dir}")
181
+ return -1
182
+ saver.restore(sess, latest_check_point)
183
+
184
+ flog = open(os.path.join(args.result_dir, 'loss.log'), 'w')
185
+ test_loss = LMA()
186
+ progressbar = tqdm(range(0, data_reader.num_data, args.batch_size), desc=args.mode)
187
+ picks = []
188
+ true_picks = []
189
+ for _ in progressbar:
190
+ loss_batch, preds_batch, X_batch, Y_batch, fname_batch, itp_batch, its_batch \
191
+ = sess.run([model.loss, model.preds, batch[0], batch[1], batch[2], batch[3], batch[4]],
192
+ feed_dict={model.drop_rate: 0, model.is_training: False})
193
+
194
+ test_loss(loss_batch)
195
+ progressbar.set_description("{}, loss={:.6f}, mean loss={:6f}".format(args.mode, loss_batch, test_loss.value))
196
+
197
+ picks_ = extract_picks(preds_batch, fname_batch)
198
+ picks.extend(picks_)
199
+ true_picks.extend(convert_true_picks(fname_batch, itp_batch, its_batch))
200
+ if args.plot_figure:
201
+ plot_waveform(data_reader.config, X_batch, preds_batch, label=Y_batch, fname=fname_batch,
202
+ itp=itp_batch, its=its_batch, figure_dir=figure_dir)
203
+
204
+ save_picks(picks, args.result_dir)
205
+ metrics = calc_performance(picks, true_picks, tol=3.0, dt=data_reader.config.dt)
206
+ flog.write("mean loss: {}\n".format(test_loss.value))
207
+ flog.close()
208
+
209
+ return 0
210
+
211
+ def main(args):
212
+
213
+ logging.basicConfig(format='%(asctime)s %(message)s', level=logging.INFO)
214
+ coord = tf.train.Coordinator()
215
+
216
+ if (args.mode == "train") or (args.mode == "train_valid"):
217
+ with tf.compat.v1.name_scope('create_inputs'):
218
+ data_reader = DataReader_train(format=args.format,
219
+ data_dir=args.train_dir,
220
+ data_list=args.train_list)
221
+ if args.mode == "train_valid":
222
+ data_reader_valid = DataReader_train(format=args.format,
223
+ data_dir=args.valid_dir,
224
+ data_list=args.valid_list)
225
+ logging.info("Dataset size: train {}, valid {}".format(data_reader.num_data, data_reader_valid.num_data))
226
+ else:
227
+ data_reader_valid = None
228
+ logging.info("Dataset size: train {}".format(data_reader.num_data))
229
+ train_fn(args, data_reader, data_reader_valid)
230
+
231
+ elif args.mode == "test":
232
+ with tf.compat.v1.name_scope('create_inputs'):
233
+ data_reader = DataReader_test(format=args.format,
234
+ data_dir=args.test_dir,
235
+ data_list=args.test_list)
236
+ test_fn(args, data_reader)
237
+
238
+ else:
239
+ print("mode should be: train, train_valid, or test")
240
+
241
+ return
242
+
243
+
244
+ if __name__ == '__main__':
245
+ args = read_args()
246
+ main(args)
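
For orientation, a hedged example of how this training script might be launched, using only the flags and default paths defined in `read_args` above: `python phasenet/train.py --mode=train --train_dir=./dataset/waveform_train/ --train_list=./dataset/waveform.csv --batch_size=20 --epochs=100 --plot_figure`.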
phasenet/util.py ADDED
@@ -0,0 +1,238 @@
1
+ from __future__ import division
2
+ import matplotlib
3
+ matplotlib.use('agg')
4
+ import matplotlib.pyplot as plt
5
+ import numpy as np
6
+ import os
7
+ from data_reader import DataConfig
8
+ from detect_peaks import detect_peaks
9
+ import logging
10
+
11
+ class EMA(object):
12
+ def __init__(self, alpha):
13
+ self.alpha = alpha
14
+ self.x = 0.
15
+ self.count = 0
16
+
17
+ @property
18
+ def value(self):
19
+ return self.x
20
+
21
+ def __call__(self, x):
22
+ if self.count == 0:
23
+ self.x = x
24
+ else:
25
+ self.x = self.alpha * self.x + (1 - self.alpha) * x
26
+ self.count += 1
27
+ return self.x
28
+
29
+ class LMA(object):
30
+ def __init__(self):
31
+ self.x = 0.
32
+ self.count = 0
33
+
34
+ @property
35
+ def value(self):
36
+ return self.x
37
+
38
+ def __call__(self, x):
39
+ if self.count == 0:
40
+ self.x = x
41
+ else:
42
+ self.x += (x - self.x)/(self.count+1)
43
+ self.count += 1
44
+ return self.x
45
+
46
+ def detect_peaks_thread(i, pred, fname=None, result_dir=None, args=None):
47
+ if args is None:
48
+ itp, prob_p = detect_peaks(pred[i,:,0,1], mph=0.5, mpd=0.5/DataConfig().dt, show=False)
49
+ its, prob_s = detect_peaks(pred[i,:,0,2], mph=0.5, mpd=0.5/DataConfig().dt, show=False)
50
+ else:
51
+ itp, prob_p = detect_peaks(pred[i,:,0,1], mph=args.tp_prob, mpd=0.5/DataConfig().dt, show=False)
52
+ its, prob_s = detect_peaks(pred[i,:,0,2], mph=args.ts_prob, mpd=0.5/DataConfig().dt, show=False)
53
+ if (fname is not None) and (result_dir is not None):
54
+ # np.savez(os.path.join(result_dir, fname[i].decode().split('/')[-1]), pred=pred[i], itp=itp, its=its, prob_p=prob_p, prob_s=prob_s)
55
+ try:
56
+ np.savez(os.path.join(result_dir, fname[i].decode()), pred=pred[i], itp=itp, its=its, prob_p=prob_p, prob_s=prob_s)
57
+ except FileNotFoundError:
58
+ #if not os.path.exists(os.path.dirname(os.path.join(result_dir, fname[i].decode()))):
59
+ os.makedirs(os.path.dirname(os.path.join(result_dir, fname[i].decode())), exist_ok=True)
60
+ np.savez(os.path.join(result_dir, fname[i].decode()), pred=pred[i], itp=itp, its=its, prob_p=prob_p, prob_s=prob_s)
61
+ return [(itp, prob_p), (its, prob_s)]
62
+
63
+ def plot_result_thread(i, pred, X, Y=None, itp=None, its=None,
64
+ itp_pred=None, its_pred=None, fname=None, figure_dir=None):
65
+ dt = DataConfig().dt
66
+ t = np.arange(0, pred.shape[1]) * dt
67
+ box = dict(boxstyle='round', facecolor='white', alpha=1)
68
+ text_loc = [0.05, 0.77]
69
+
70
+ plt.figure(i)
71
+ plt.clf()
72
+ # fig_size = plt.gcf().get_size_inches()
73
+ # plt.gcf().set_size_inches(fig_size*[1, 1.2])
74
+ plt.subplot(411)
75
+ plt.plot(t, X[i, :, 0, 0], 'k', label='E', linewidth=0.5)
76
+ plt.autoscale(enable=True, axis='x', tight=True)
77
+ tmp_min = np.min(X[i, :, 0, 0])
78
+ tmp_max = np.max(X[i, :, 0, 0])
79
+ if (itp is not None) and (its is not None):
80
+ for j in range(len(itp[i])):
81
+ if j == 0:
82
+ plt.plot([itp[i][j]*dt, itp[i][j]*dt], [tmp_min, tmp_max], 'b', label='P', linewidth=0.5)
83
+ else:
84
+ plt.plot([itp[i][j]*dt, itp[i][j]*dt], [tmp_min, tmp_max], 'b', linewidth=0.5)
85
+ for j in range(len(its[i])):
86
+ if j == 0:
87
+ plt.plot([its[i][j]*dt, its[i][j]*dt], [tmp_min, tmp_max], 'r', label='S', linewidth=0.5)
88
+ else:
89
+ plt.plot([its[i][j]*dt, its[i][j]*dt], [tmp_min, tmp_max], 'r', linewidth=0.5)
90
+ plt.ylabel('Amplitude')
91
+ plt.legend(loc='upper right', fontsize='small')
92
+ plt.gca().set_xticklabels([])
93
+ plt.text(text_loc[0], text_loc[1], '(i)', horizontalalignment='center',
94
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
95
+ plt.subplot(412)
96
+ plt.plot(t, X[i, :, 0, 1], 'k', label='N', linewidth=0.5)
97
+ plt.autoscale(enable=True, axis='x', tight=True)
98
+ tmp_min = np.min(X[i, :, 0, 1])
99
+ tmp_max = np.max(X[i, :, 0, 1])
100
+ if (itp is not None) and (its is not None):
101
+ for j in range(len(itp[i])):
102
+ plt.plot([itp[i][j]*dt, itp[i][j]*dt], [tmp_min, tmp_max], 'b', linewidth=0.5)
103
+ for j in range(len(its[i])):
104
+ plt.plot([its[i][j]*dt, its[i][j]*dt], [tmp_min, tmp_max], 'r', linewidth=0.5)
105
+ plt.ylabel('Amplitude')
106
+ plt.legend(loc='upper right', fontsize='small')
107
+ plt.gca().set_xticklabels([])
108
+ plt.text(text_loc[0], text_loc[1], '(ii)', horizontalalignment='center',
109
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
110
+ plt.subplot(413)
111
+ plt.plot(t, X[i, :, 0, 2], 'k', label='Z', linewidth=0.5)
112
+ plt.autoscale(enable=True, axis='x', tight=True)
113
+ tmp_min = np.min(X[i, :, 0, 2])
114
+ tmp_max = np.max(X[i, :, 0, 2])
115
+ if (itp is not None) and (its is not None):
116
+ for j in range(len(itp[i])):
117
+ plt.plot([itp[i][j]*dt, itp[i][j]*dt], [tmp_min, tmp_max], 'b', linewidth=0.5)
118
+ for j in range(len(its[i])):
119
+ plt.plot([its[i][j]*dt, its[i][j]*dt], [tmp_min, tmp_max], 'r', linewidth=0.5)
120
+ plt.ylabel('Amplitude')
121
+ plt.legend(loc='upper right', fontsize='small')
122
+ plt.gca().set_xticklabels([])
123
+ plt.text(text_loc[0], text_loc[1], '(iii)', horizontalalignment='center',
124
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
125
+ plt.subplot(414)
126
+ if Y is not None:
127
+ plt.plot(t, Y[i, :, 0, 1], 'b', label='P', linewidth=0.5)
128
+ plt.plot(t, Y[i, :, 0, 2], 'r', label='S', linewidth=0.5)
129
+ plt.plot(t, pred[i, :, 0, 1], '--g', label='$\hat{P}$', linewidth=0.5)
130
+ plt.plot(t, pred[i, :, 0, 2], '-.m', label='$\hat{S}$', linewidth=0.5)
131
+ plt.autoscale(enable=True, axis='x', tight=True)
132
+ if (itp_pred is not None) and (its_pred is not None):
133
+ for j in range(len(itp_pred)):
134
+ plt.plot([itp_pred[j]*dt, itp_pred[j]*dt], [-0.1, 1.1], '--g', linewidth=0.5)
135
+ for j in range(len(its_pred)):
136
+ plt.plot([its_pred[j]*dt, its_pred[j]*dt], [-0.1, 1.1], '-.m', linewidth=0.5)
137
+ plt.ylim([-0.05, 1.05])
138
+ plt.text(text_loc[0], text_loc[1], '(iv)', horizontalalignment='center',
139
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
140
+ plt.legend(loc='upper right', fontsize='small')
141
+ plt.xlabel('Time (s)')
142
+ plt.ylabel('Probability')
143
+
144
+ plt.tight_layout()
145
+ plt.gcf().align_labels()
146
+
147
+ try:
148
+ plt.savefig(os.path.join(figure_dir,
149
+ fname[i].decode().rstrip('.npz')+'.png'),
150
+ bbox_inches='tight')
151
+ except FileNotFoundError:
152
+ #if not os.path.exists(os.path.dirname(os.path.join(figure_dir, fname[i].decode()))):
153
+ os.makedirs(os.path.dirname(os.path.join(figure_dir, fname[i].decode())), exist_ok=True)
154
+ plt.savefig(os.path.join(figure_dir,
155
+ fname[i].decode().rstrip('.npz')+'.png'),
156
+ bbox_inches='tight')
157
+ #plt.savefig(os.path.join(figure_dir,
158
+ # fname[i].decode().split('/')[-1].rstrip('.npz')+'.png'),
159
+ # bbox_inches='tight')
160
+ # plt.savefig(os.path.join(figure_dir,
161
+ # fname[i].decode().split('/')[-1].rstrip('.npz')+'.pdf'),
162
+ # bbox_inches='tight')
163
+ plt.close(i)
164
+ return 0
165
+
166
+ def postprocessing_thread(i, pred, X, Y=None, itp=None, its=None, fname=None, result_dir=None, figure_dir=None, args=None):
167
+ (itp_pred, prob_p), (its_pred, prob_s) = detect_peaks_thread(i, pred, fname, result_dir, args)
168
+ if (fname is not None) and (figure_dir is not None):
169
+ plot_result_thread(i, pred, X, Y, itp, its, itp_pred, its_pred, fname, figure_dir)
170
+ return [(itp_pred, prob_p), (its_pred, prob_s)]
171
+
172
+
173
+ def clean_queue(picks):
174
+ clean = []
175
+ for i in range(len(picks)):
176
+ tmp = []
177
+ for j in picks[i]:
178
+ if j != 0:
179
+ tmp.append(j)
180
+ clean.append(tmp)
181
+ return clean
182
+
183
+ def clean_queue_thread(picks):
184
+ tmp = []
185
+ for j in picks:
186
+ if j != 0:
187
+ tmp.append(j)
188
+ return tmp
189
+
190
+
191
+ def metrics(TP, nP, nT):
192
+ '''
193
+ TP: true positive
194
+ nP: number of positive picks
195
+ nT: number of true picks
196
+ '''
197
+ precision = TP / nP
198
+ recall = TP / nT
199
+ F1 = 2* precision * recall / (precision + recall)
200
+ return [precision, recall, F1]
201
+
202
+ def correct_picks(picks, true_p, true_s, tol):
203
+ dt = DataConfig().dt
204
+ if len(true_p) != len(true_s):
205
+ print("The length of true P and S pickers are not the same")
206
+ num = len(true_p)
207
+ TP_p = 0; TP_s = 0; nP_p = 0; nP_s = 0; nT_p = 0; nT_s = 0
208
+ diff_p = []; diff_s = []
209
+ for i in range(num):
210
+ nT_p += len(true_p[i])
211
+ nT_s += len(true_s[i])
212
+ nP_p += len(picks[i][0][0])
213
+ nP_s += len(picks[i][1][0])
214
+
215
+ if len(true_p[i]) > 1 or len(true_s[i]) > 1:
216
+ print(i, picks[i], true_p[i], true_s[i])
217
+ tmp_p = np.array(picks[i][0][0]) - np.array(true_p[i])[:,np.newaxis]
218
+ tmp_s = np.array(picks[i][1][0]) - np.array(true_s[i])[:,np.newaxis]
219
+ TP_p += np.sum(np.abs(tmp_p) < tol/dt)
220
+ TP_s += np.sum(np.abs(tmp_s) < tol/dt)
221
+ diff_p.append(tmp_p[np.abs(tmp_p) < 0.5/dt])
222
+ diff_s.append(tmp_s[np.abs(tmp_s) < 0.5/dt])
223
+
224
+ return [TP_p, TP_s, nP_p, nP_s, nT_p, nT_s, diff_p, diff_s]
225
+
226
+ def calculate_metrics(picks, itp, its, tol=0.1):
227
+ TP_p, TP_s, nP_p, nP_s, nT_p, nT_s, diff_p, diff_s = correct_picks(picks, itp, its, tol)
228
+ precision_p, recall_p, f1_p = metrics(TP_p, nP_p, nT_p)
229
+ precision_s, recall_s, f1_s = metrics(TP_s, nP_s, nT_s)
230
+
231
+ logging.info("Total records: {}".format(len(picks)))
232
+ logging.info("P-phase:")
233
+ logging.info("True={}, Predict={}, TruePositive={}".format(nT_p, nP_p, TP_p))
234
+ logging.info("Precision={:.3f}, Recall={:.3f}, F1={:.3f}".format(precision_p, recall_p, f1_p))
235
+ logging.info("S-phase:")
236
+ logging.info("True={}, Predict={}, TruePositive={}".format(nT_s, nP_s, TP_s))
237
+ logging.info("Precision={:.3f}, Recall={:.3f}, F1={:.3f}".format(precision_s, recall_s, f1_s))
238
+ return [precision_p, recall_p, f1_p], [precision_s, recall_s, f1_s]
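
Note that `correct_picks` compares pick positions in samples, so the tolerance in seconds is converted via `tol/dt`. A quick sanity check of that conversion, assuming the default tolerance above and the usual 0.01 s sampling interval:

```python
tol, dt = 0.1, 0.01    # tolerance (s) and assumed sampling interval (s)
max_offset = tol / dt  # 10 samples: |predicted_index - true_index| < 10 counts as a true positive
```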
phasenet/visulization.py ADDED
@@ -0,0 +1,481 @@
1
+ import matplotlib
2
+ matplotlib.use("agg")
3
+ import matplotlib.pyplot as plt
4
+ import numpy as np
5
+ import os
6
+
7
+
8
+ def plot_residual(diff_p, diff_s, diff_ps, tol, dt):
9
+ box = dict(boxstyle='round', facecolor='white', alpha=1)
10
+ text_loc = [0.07, 0.95]
11
+ plt.figure(figsize=(8,3))
12
+ plt.subplot(1,3,1)
13
+ plt.hist(diff_p, range=(-tol, tol), bins=int(2*tol/dt)+1, facecolor='b', edgecolor='black', linewidth=1)
14
+ plt.ylabel("Number of picks")
15
+ plt.xlabel("Residual (s)")
16
+ plt.text(text_loc[0], text_loc[1], "(i)", horizontalalignment='left', verticalalignment='top',
17
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
18
+ plt.title("P-phase")
19
+ plt.subplot(1,3,2)
20
+ plt.hist(diff_s, range=(-tol, tol), bins=int(2*tol/dt)+1, facecolor='b', edgecolor='black', linewidth=1)
21
+ plt.xlabel("Residual (s)")
22
+ plt.text(text_loc[0], text_loc[1], "(ii)", horizontalalignment='left', verticalalignment='top',
23
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
24
+ plt.title("S-phase")
25
+ plt.subplot(1,3,3)
26
+ plt.hist(diff_ps, range=(-tol, tol), bins=int(2*tol/dt)+1, facecolor='b', edgecolor='black', linewidth=1)
27
+ plt.xlabel("Residual (s)")
28
+ plt.text(text_loc[0], text_loc[1], "(iii)", horizontalalignment='left', verticalalignment='top',
29
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
30
+ plt.title("PS-phase")
31
+ plt.tight_layout()
32
+ plt.savefig("residuals.png", dpi=300)
33
+ plt.savefig("residuals.pdf")
34
+
35
+
36
+ # def plot_waveform(config, data, pred, label=None,
37
+ # itp=None, its=None, itps=None,
38
+ # itp_pred=None, its_pred=None, itps_pred=None,
39
+ # fname=None, figure_dir="./", epoch=0, max_fig=10):
40
+
41
+ # dt = config.dt if hasattr(config, "dt") else 1.0
42
+ # t = np.arange(0, pred.shape[1]) * dt
43
+ # box = dict(boxstyle='round', facecolor='white', alpha=1)
44
+ # text_loc = [0.05, 0.77]
45
+ # if fname is None:
46
+ # fname = [f"{epoch:03d}_{i:02d}" for i in range(len(data))]
47
+ # else:
48
+ # fname = [fname[i].decode().rstrip(".npz") for i in range(len(fname))]
49
+
50
+ # for i in range(min(len(data), max_fig)):
51
+ # plt.figure(i)
52
+
53
+ # plt.subplot(411)
54
+ # plt.plot(t, data[i, :, 0, 0], 'k', label='E', linewidth=0.5)
55
+ # plt.autoscale(enable=True, axis='x', tight=True)
56
+ # tmp_min = np.min(data[i, :, 0, 0])
57
+ # tmp_max = np.max(data[i, :, 0, 0])
58
+ # if (itp is not None) and (its is not None):
59
+ # for j in range(len(itp[i])):
60
+ # lb = "P" if j==0 else ""
61
+ # plt.plot([itp[i][j]*dt, itp[i][j]*dt], [tmp_min, tmp_max], 'C0', label=lb, linewidth=0.5)
62
+ # for j in range(len(its[i])):
63
+ # lb = "S" if j==0 else ""
64
+ # plt.plot([its[i][j]*dt, its[i][j]*dt], [tmp_min, tmp_max], 'C1', label=lb, linewidth=0.5)
65
+ # if (itps is not None):
66
+ # for j in range(len(itps[i])):
67
+ # lb = "PS" if j==0 else ""
68
+ # plt.plot([itps[i][j]*dt, its[i][j]*dt], [tmp_min, tmp_max], 'C2', label=lb, linewidth=0.5)
69
+ # plt.ylabel('Amplitude')
70
+ # plt.legend(loc='upper right', fontsize='small')
71
+ # plt.gca().set_xticklabels([])
72
+ # plt.text(text_loc[0], text_loc[1], '(i)', horizontalalignment='center',
73
+ # transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
74
+
75
+ # plt.subplot(412)
76
+ # plt.plot(t, data[i, :, 0, 1], 'k', label='N', linewidth=0.5)
77
+ # plt.autoscale(enable=True, axis='x', tight=True)
78
+ # tmp_min = np.min(data[i, :, 0, 1])
79
+ # tmp_max = np.max(data[i, :, 0, 1])
80
+ # if (itp is not None) and (its is not None):
81
+ # for j in range(len(itp[i])):
82
+ # lb = "P" if j==0 else ""
83
+ # plt.plot([itp[i][j]*dt, itp[i][j]*dt], [tmp_min, tmp_max], 'C0', label=lb, linewidth=0.5)
84
+ # for j in range(len(its[i])):
85
+ # lb = "S" if j==0 else ""
86
+ # plt.plot([its[i][j]*dt, its[i][j]*dt], [tmp_min, tmp_max], 'C1', label=lb, linewidth=0.5)
87
+ # if (itps is not None):
88
+ # for j in range(len(itps[i])):
89
+ # lb = "PS" if j==0 else ""
90
+ # plt.plot([itps[i][j]*dt, itps[i][j]*dt], [tmp_min, tmp_max], 'C2', label=lb, linewidth=0.5)
91
+ # plt.ylabel('Amplitude')
92
+ # plt.legend(loc='upper right', fontsize='small')
93
+ # plt.gca().set_xticklabels([])
94
+ # plt.text(text_loc[0], text_loc[1], '(ii)', horizontalalignment='center',
95
+ # transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
96
+
97
+ # plt.subplot(413)
98
+ # plt.plot(t, data[i, :, 0, 2], 'k', label='Z', linewidth=0.5)
99
+ # plt.autoscale(enable=True, axis='x', tight=True)
100
+ # tmp_min = np.min(data[i, :, 0, 2])
101
+ # tmp_max = np.max(data[i, :, 0, 2])
102
+ # if (itp is not None) and (its is not None):
103
+ # for j in range(len(itp[i])):
104
+ # lb = "P" if j==0 else ""
105
+ # plt.plot([itp[i][j]*dt, itp[i][j]*dt], [tmp_min, tmp_max], 'C0', label=lb, linewidth=0.5)
106
+ # for j in range(len(its[i])):
107
+ # lb = "S" if j==0 else ""
108
+ # plt.plot([its[i][j]*dt, its[i][j]*dt], [tmp_min, tmp_max], 'C1', label=lb, linewidth=0.5)
109
+ # if (itps is not None):
110
+ # for j in range(len(itps[i])):
111
+ # lb = "PS" if j==0 else ""
112
+ # plt.plot([itps[i][j]*dt, itps[i][j]*dt], [tmp_min, tmp_max], 'C2', label=lb, linewidth=0.5)
113
+ # plt.ylabel('Amplitude')
114
+ # plt.legend(loc='upper right', fontsize='small')
115
+ # plt.gca().set_xticklabels([])
116
+ # plt.text(text_loc[0], text_loc[1], '(iii)', horizontalalignment='center',
117
+ # transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
118
+
119
+ # plt.subplot(414)
120
+ # if label is not None:
121
+ # plt.plot(t, label[i, :, 0, 1], 'C0', label='P', linewidth=1)
122
+ # plt.plot(t, label[i, :, 0, 2], 'C1', label='S', linewidth=1)
123
+ # if label.shape[-1] == 4:
124
+ # plt.plot(t, label[i, :, 0, 3], 'C2', label='PS', linewidth=1)
125
+ # plt.plot(t, pred[i, :, 0, 1], '--C0', label='$\hat{P}$', linewidth=1)
126
+ # plt.plot(t, pred[i, :, 0, 2], '--C1', label='$\hat{S}$', linewidth=1)
127
+ # if pred.shape[-1] == 4:
128
+ # plt.plot(t, pred[i, :, 0, 3], '--C2', label='$\hat{PS}$', linewidth=1)
129
+ # plt.autoscale(enable=True, axis='x', tight=True)
130
+ # if (itp_pred is not None) and (its_pred is not None) :
131
+ # for j in range(len(itp_pred)):
132
+ # plt.plot([itp_pred[j]*dt, itp_pred[j]*dt], [-0.1, 1.1], '--C0', linewidth=1)
133
+ # for j in range(len(its_pred)):
134
+ # plt.plot([its_pred[j]*dt, its_pred[j]*dt], [-0.1, 1.1], '--C1', linewidth=1)
135
+ # if (itps_pred is not None):
136
+ # for j in range(len(itps_pred)):
137
+ # plt.plot([itps_pred[j]*dt, itps_pred[j]*dt], [-0.1, 1.1], '--C2', linewidth=1)
138
+ # plt.ylim([-0.05, 1.05])
139
+ # plt.text(text_loc[0], text_loc[1], '(iv)', horizontalalignment='center',
140
+ # transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
141
+ # plt.legend(loc='upper right', fontsize='small', ncol=2)
142
+ # plt.xlabel('Time (s)')
143
+ # plt.ylabel('Probability')
144
+ # plt.tight_layout()
145
+ # plt.gcf().align_labels()
146
+
147
+ # try:
148
+ # plt.savefig(os.path.join(figure_dir, fname[i]+'.png'), bbox_inches='tight')
149
+ # except FileNotFoundError:
150
+ # os.makedirs(os.path.dirname(os.path.join(figure_dir, fname[i])), exist_ok=True)
151
+ # plt.savefig(os.path.join(figure_dir, fname[i]+'.png'), bbox_inches='tight')
152
+
153
+ # plt.close(i)
154
+ # return 0
155
+
156
+
157
+ def plot_waveform(data, pred, fname, label=None,
158
+ itp=None, its=None, itps=None,
159
+ itp_pred=None, its_pred=None, itps_pred=None,
160
+ figure_dir="./", dt=0.01):
161
+
162
+ t = np.arange(0, pred.shape[0]) * dt
163
+ box = dict(boxstyle='round', facecolor='white', alpha=1)
164
+ text_loc = [0.05, 0.77]
165
+
166
+ plt.figure()
167
+
168
+ plt.subplot(411)
169
+ plt.plot(t, data[:, 0, 0], 'k', label='E', linewidth=0.5)
170
+ plt.autoscale(enable=True, axis='x', tight=True)
171
+ tmp_min = np.min(data[:, 0, 0])
172
+ tmp_max = np.max(data[:, 0, 0])
173
+ if (itp is not None) and (its is not None):
174
+ for j in range(len(itp)):
175
+ lb = "P" if j==0 else ""
176
+ plt.plot([itp[j]*dt, itp[j]*dt], [tmp_min, tmp_max], 'C0', label=lb, linewidth=0.5)
177
+ for j in range(len(its)):
178
+ lb = "S" if j==0 else ""
179
+ plt.plot([its[j]*dt, its[j]*dt], [tmp_min, tmp_max], 'C1', label=lb, linewidth=0.5)
180
+ if (itps is not None):
181
+ for j in range(len(itps)):
182
+ plt.plot([itps[j]*dt, itps[j]*dt], [tmp_min, tmp_max], 'C2', label=lb, linewidth=0.5)
183
+ plt.plot([itps[j]*dt, its[j]*dt], [tmp_min, tmp_max], 'C2', label=lb, linewidth=0.5)
184
+ plt.ylabel('Amplitude')
185
+ plt.legend(loc='upper right', fontsize='small')
186
+ plt.gca().set_xticklabels([])
187
+ plt.text(text_loc[0], text_loc[1], '(i)', horizontalalignment='center',
188
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
189
+
190
+ plt.subplot(412)
191
+ plt.plot(t, data[:, 0, 1], 'k', label='N', linewidth=0.5)
192
+ plt.autoscale(enable=True, axis='x', tight=True)
193
+ tmp_min = np.min(data[:, 0, 1])
194
+ tmp_max = np.max(data[:, 0, 1])
195
+ if (itp is not None) and (its is not None):
196
+ for j in range(len(itp)):
197
+ lb = "P" if j==0 else ""
198
+ plt.plot([itp[j]*dt, itp[j]*dt], [tmp_min, tmp_max], 'C0', label=lb, linewidth=0.5)
199
+ for j in range(len(its)):
200
+ lb = "S" if j==0 else ""
201
+ plt.plot([its[j]*dt, its[j]*dt], [tmp_min, tmp_max], 'C1', label=lb, linewidth=0.5)
202
+ if (itps is not None):
203
+ for j in range(len(itps)):
204
+ lb = "PS" if j==0 else ""
205
+ plt.plot([itps[j]*dt, itps[j]*dt], [tmp_min, tmp_max], 'C2', label=lb, linewidth=0.5)
206
+ plt.ylabel('Amplitude')
207
+ plt.legend(loc='upper right', fontsize='small')
208
+ plt.gca().set_xticklabels([])
209
+ plt.text(text_loc[0], text_loc[1], '(ii)', horizontalalignment='center',
210
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
211
+
212
+ plt.subplot(413)
213
+ plt.plot(t, data[:, 0, 2], 'k', label='Z', linewidth=0.5)
214
+ plt.autoscale(enable=True, axis='x', tight=True)
215
+ tmp_min = np.min(data[:, 0, 2])
216
+ tmp_max = np.max(data[:, 0, 2])
217
+ if (itp is not None) and (its is not None):
218
+ for j in range(len(itp)):
219
+ lb = "P" if j==0 else ""
220
+ plt.plot([itp[j]*dt, itp[j]*dt], [tmp_min, tmp_max], 'C0', label=lb, linewidth=0.5)
221
+ for j in range(len(its)):
222
+ lb = "S" if j==0 else ""
223
+ plt.plot([its[j]*dt, its[j]*dt], [tmp_min, tmp_max], 'C1', label=lb, linewidth=0.5)
224
+ if (itps is not None):
225
+ for j in range(len(itps)):
226
+ lb = "PS" if j==0 else ""
227
+ plt.plot([itps[j]*dt, itps[j]*dt], [tmp_min, tmp_max], 'C2', label=lb, linewidth=0.5)
228
+ plt.ylabel('Amplitude')
229
+ plt.legend(loc='upper right', fontsize='small')
230
+ plt.gca().set_xticklabels([])
231
+ plt.text(text_loc[0], text_loc[1], '(iii)', horizontalalignment='center',
232
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
233
+
234
+ plt.subplot(414)
235
+ if label is not None:
236
+ plt.plot(t, label[:, 0, 1], 'C0', label='P', linewidth=1)
237
+ plt.plot(t, label[:, 0, 2], 'C1', label='S', linewidth=1)
238
+ if label.shape[-1] == 4:
239
+ plt.plot(t, label[:, 0, 3], 'C2', label='PS', linewidth=1)
240
+ plt.plot(t, pred[:, 0, 1], '--C0', label='$\hat{P}$', linewidth=1)
241
+ plt.plot(t, pred[:, 0, 2], '--C1', label='$\hat{S}$', linewidth=1)
242
+ if pred.shape[-1] == 4:
243
+ plt.plot(t, pred[:, 0, 3], '--C2', label='$\hat{PS}$', linewidth=1)
244
+ plt.autoscale(enable=True, axis='x', tight=True)
245
+ if (itp_pred is not None) and (its_pred is not None) :
246
+ for j in range(len(itp_pred)):
247
+ plt.plot([itp_pred[j]*dt, itp_pred[j]*dt], [-0.1, 1.1], '--C0', linewidth=1)
248
+ for j in range(len(its_pred)):
249
+ plt.plot([its_pred[j]*dt, its_pred[j]*dt], [-0.1, 1.1], '--C1', linewidth=1)
250
+ if (itps_pred is not None):
251
+ for j in range(len(itps_pred)):
252
+ plt.plot([itps_pred[j]*dt, itps_pred[j]*dt], [-0.1, 1.1], '--C2', linewidth=1)
253
+ plt.ylim([-0.05, 1.05])
254
+ plt.text(text_loc[0], text_loc[1], '(iv)', horizontalalignment='center',
255
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
256
+ plt.legend(loc='upper right', fontsize='small', ncol=2)
257
+ plt.xlabel('Time (s)')
258
+ plt.ylabel('Probability')
259
+ plt.tight_layout()
260
+ plt.gcf().align_labels()
261
+
262
+ try:
263
+ plt.savefig(os.path.join(figure_dir, fname+'.png'), bbox_inches='tight')
264
+ except FileNotFoundError:
265
+ os.makedirs(os.path.dirname(os.path.join(figure_dir, fname)), exist_ok=True)
266
+ plt.savefig(os.path.join(figure_dir, fname+'.png'), bbox_inches='tight')
267
+
268
+ plt.close()
269
+ return 0
270
+
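
A minimal, hedged sketch of calling the `plot_waveform` above on synthetic arrays; the shapes follow the indexing used inside the function, and the random data and file name are placeholders only.

```python
import numpy as np
from visulization import plot_waveform  # module name as spelled in this repository

nt = 3000
data = np.random.randn(nt, 1, 3)  # (samples, dummy axis, E/N/Z channels)
pred = np.zeros((nt, 1, 3))       # (samples, dummy axis, [noise, P, S]) probabilities
plot_waveform(data, pred, fname="example", figure_dir="./", dt=0.01)  # writes ./example.png
```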
271
+
272
+ def plot_array(config, data, pred, label=None,
273
+ itp=None, its=None, itps=None,
274
+ itp_pred=None, its_pred=None, itps_pred=None,
275
+ fname=None, figure_dir="./", epoch=0):
276
+
277
+ dt = config.dt if hasattr(config, "dt") else 1.0
278
+ t = np.arange(0, pred.shape[1]) * dt
279
+ box = dict(boxstyle='round', facecolor='white', alpha=1)
280
+ text_loc = [0.05, 0.95]
281
+ if fname is None:
282
+ fname = [f"{epoch:03d}_{i:03d}" for i in range(len(data))]
283
+ else:
284
+ fname = [fname[i].decode().rstrip(".npz") for i in range(len(fname))]
285
+
286
+ for i in range(len(data)):
287
+ plt.figure(i, figsize=(10, 5))
288
+ plt.clf()
289
+
290
+ plt.subplot(121)
291
+ for j in range(data.shape[-2]):
292
+ plt.plot(t, data[i, :, j, 0]/10 + j, 'k', label='E', linewidth=0.5)
293
+ plt.autoscale(enable=True, axis='x', tight=True)
294
+ tmp_min = np.min(data[i, :, 0, 0])
295
+ tmp_max = np.max(data[i, :, 0, 0])
296
+ plt.xlabel('Time (s)')
297
+ plt.ylabel('Amplitude')
298
+ # plt.legend(loc='upper right', fontsize='small')
299
+ # plt.gca().set_xticklabels([])
300
+ plt.text(text_loc[0], text_loc[1], '(i)', horizontalalignment='center', verticalalignment="top",
301
+ transform=plt.gca().transAxes, fontsize="large", fontweight="normal", bbox=box)
302
+
303
+ plt.subplot(122)
304
+ for j in range(pred.shape[-2]):
305
+ if label is not None:
306
+ plt.plot(t, label[i, :, j, 1]+j, 'C2', label='P', linewidth=0.5)
307
+ plt.plot(t, label[i, :, j, 2]+j, 'C3', label='S', linewidth=0.5)
308
+ # plt.plot(t, label[i, :, j, 0]+j, 'C4', label='N', linewidth=0.5)
309
+ plt.plot(t, pred[i, :, j, 1]+j, 'C0', label='$\hat{P}$', linewidth=1)
310
+ plt.plot(t, pred[i, :, j, 2]+j, 'C1', label='$\hat{S}$', linewidth=1)
311
+ plt.autoscale(enable=True, axis='x', tight=True)
312
+ if (itp_pred is not None) and (its_pred is not None) and (itps_pred is not None):
313
+ for j in range(len(itp_pred)):
314
+ plt.plot([itp_pred[j]*dt, itp_pred[j]*dt], [-0.1, 1.1], '--C0', linewidth=1)
315
+ for j in range(len(its_pred)):
316
+ plt.plot([its_pred[j]*dt, its_pred[j]*dt], [-0.1, 1.1], '--C1', linewidth=1)
317
+ for j in range(len(itps_pred)):
318
+ plt.plot([itps_pred[j]*dt, itps_pred[j]*dt], [-0.1, 1.1], '--C2', linewidth=1)
319
+ # plt.ylim([-0.05, 1.05])
320
+ plt.text(text_loc[0], text_loc[1], '(ii)', horizontalalignment='center', verticalalignment="top",
321
+ transform=plt.gca().transAxes, fontsize="large", fontweight="normal", bbox=box)
322
+ # plt.legend(loc='upper right', fontsize='small', ncol=2)
323
+ plt.xlabel('Time (s)')
324
+ plt.ylabel('Probability')
325
+ plt.tight_layout()
326
+ plt.gcf().align_labels()
327
+
328
+ try:
329
+ plt.savefig(os.path.join(figure_dir, fname[i]+'.png'), bbox_inches='tight')
330
+ except FileNotFoundError:
331
+ os.makedirs(os.path.dirname(os.path.join(figure_dir, fname[i])), exist_ok=True)
332
+ plt.savefig(os.path.join(figure_dir, fname[i]+'.png'), bbox_inches='tight')
333
+
334
+ plt.close(i)
335
+ return 0
336
+
337
+
338
+ def plot_spectrogram(config, data, pred, label=None,
339
+ itp=None, its=None, itps=None,
340
+ itp_pred=None, its_pred=None, itps_pred=None,
341
+ time=None, freq=None,
342
+ fname=None, figure_dir="./", epoch=0):
343
+
344
+ # dt = config.dt
345
+ # df = config.df
346
+ # t = np.arange(0, data.shape[1]) * dt
347
+ # f = np.arange(0, data.shape[2]) * df
348
+ t, f = time, freq
349
+ dt = t[1] - t[0]
350
+ box = dict(boxstyle='round', facecolor='white', alpha=1)
351
+ text_loc = [0.05, 0.75]
352
+ if fname is None:
353
+ fname = [f"{i:03d}" for i in range(len(data))]
354
+ elif type(fname[0]) is bytes:
355
+ fname = [f.decode() for f in fname]
356
+
357
+ numbers = ["(i)", "(ii)", "(iii)", "(iv)"]
358
+ for i in range(len(data)):
359
+ fig = plt.figure(i)
360
+ # gs = fig.add_gridspec(4, 1)
361
+
362
+ for j in range(3):
363
+ # fig.add_subplot(gs[j, 0])
364
+ plt.subplot(4,1,j+1)
365
+ plt.pcolormesh(t, f, np.abs(data[i, :, :, j]+1j*data[i, :, :, j+3]).T, vmax=2*np.std(data[i, :, :, j]+1j*data[i, :, :, j+3]), cmap="jet", shading='auto')
366
+ plt.autoscale(enable=True, axis='x', tight=True)
367
+ plt.gca().set_xticklabels([])
368
+ if j == 1:
369
+ plt.ylabel('Frequency (Hz)')
370
+ plt.text(text_loc[0], text_loc[1], numbers[j], horizontalalignment='center',
371
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
372
+
373
+ # fig.add_subplot(gs[-1, 0])
374
+ plt.subplot(4,1,4)
375
+ if label is not None:
376
+ plt.plot(t, label[i, :, 0, 1], '--C0', linewidth=1)
377
+ plt.plot(t, label[i, :, 0, 2], '--C3', linewidth=1)
378
+ plt.plot(t, label[i, :, 0, 3], '--C1', linewidth=1)
379
+ plt.plot(t, pred[i, :, 0, 1], 'C0', label='P', linewidth=1)
380
+ plt.plot(t, pred[i, :, 0, 2], 'C3', label='S', linewidth=1)
381
+ plt.plot(t, pred[i, :, 0, 3], 'C1', label='PS', linewidth=1)
382
+ plt.plot(t, t*0, 'k', linewidth=1)
383
+ plt.autoscale(enable=True, axis='x', tight=True)
384
+ if (itp_pred is not None) and (its_pred is not None) and (itps_pred is not None):
385
+ for j in range(len(itp_pred)):
386
+ plt.plot([itp_pred[j]*dt, itp_pred[j]*dt], [-0.1, 1.1], ':C3', linewidth=1)
387
+ for j in range(len(its_pred)):
388
+ plt.plot([its_pred[j]*dt, its_pred[j]*dt], [-0.1, 1.1], '-.C6', linewidth=1)
389
+ for j in range(len(itps_pred)):
390
+ plt.plot([itps_pred[j]*dt, itps_pred[j]*dt], [-0.1, 1.1], '--C8', linewidth=1)
391
+ plt.ylim([-0.05, 1.05])
392
+ plt.text(text_loc[0], text_loc[1], numbers[-1], horizontalalignment='center',
393
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
394
+ plt.legend(loc='upper right', fontsize='small', ncol=1)
395
+ plt.xlabel('Time (s)')
396
+ plt.ylabel('Probability')
397
+ # plt.tight_layout()
398
+ plt.gcf().align_labels()
399
+
400
+ try:
401
+ plt.savefig(os.path.join(figure_dir, f'{epoch:02d}_'+fname[i]+'.png'), bbox_inches='tight')
402
+ except FileNotFoundError:
403
+ os.makedirs(os.path.dirname(os.path.join(figure_dir, f'{epoch:02d}_'+fname[i])), exist_ok=True)  # create the same directory the savefig call below targets
404
+ plt.savefig(os.path.join(figure_dir, f'{epoch:02d}_'+fname[i]+'.png'), bbox_inches='tight')
405
+
406
+ plt.close(i)
407
+ return 0
408
+
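+ # A minimal usage sketch for plot_spectrogram (argument names below are illustrative; shapes are
+ # inferred from the indexing above: spectrograms stacked as 3 real + 3 imaginary channels,
+ # predictions/labels with a singleton station axis and 4 classes):
+ #   plot_spectrogram(config, data, pred, label=label,
+ #                    time=t, freq=f,            # 1-D axes used for pcolormesh; dt = t[1] - t[0]
+ #                    fname=fnames, figure_dir="figures", epoch=0)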
409
+
410
+ def plot_spectrogram_waveform(config, spectrogram, waveform, pred, label=None,
411
+ itp=None, its=None, itps=None, picks=None,
412
+ time=None, freq=None,
413
+ fname=None, figure_dir="./", epoch=0):
414
+
415
+ # dt = config.dt
416
+ # df = config.df
417
+ # t = np.arange(0, spectrogram.shape[1]) * dt
418
+ # f = np.arange(0, spectrogram.shape[2]) * df
419
+ t, f = time, freq
420
+ dt = t[1] - t[0]
421
+ box = dict(boxstyle='round', facecolor='white', alpha=1)
422
+ text_loc = [0.02, 0.90]
423
+ if fname is None:
424
+ fname = [f"{i:03d}" for i in range(len(spectrogram))]
425
+ elif type(fname[0]) is bytes:
426
+ fname = [f.decode() for f in fname]
427
+
428
+ numbers = ["(i)", "(ii)", "(iii)", "(iv)", "(v)", "(vi)", "(vii)"]
429
+ for i in range(len(spectrogram)):
430
+ fig = plt.figure(i, figsize=(6.4, 10))
431
+ # gs = fig.add_gridspec(4, 1)
432
+
433
+ for j in range(3):
434
+ # fig.add_subplot(gs[j, 0])
435
+ plt.subplot(7,1,j*2+1)
436
+ plt.plot(waveform[i,:,j], 'k', linewidth=0.5)
437
+ plt.autoscale(enable=True, axis='x', tight=True)
438
+ plt.gca().set_xticklabels([])
439
+ plt.ylabel('')
440
+ plt.text(text_loc[0], text_loc[1], numbers[j*2], horizontalalignment='left', verticalalignment='top',
441
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
442
+
443
+ for j in range(3):
444
+ # fig.add_subplot(gs[j, 0])
445
+ plt.subplot(7,1,j*2+2)
446
+ plt.pcolormesh(t, f, np.abs(spectrogram[i, :, :, j]+1j*spectrogram[i, :, :, j+3]).T, vmax=2*np.std(spectrogram[i, :, :, j]+1j*spectrogram[i, :, :, j+3]), cmap="jet", shading='auto')
447
+ plt.autoscale(enable=True, axis='x', tight=True)
448
+ plt.gca().set_xticklabels([])
449
+ if j == 1:
450
+ plt.ylabel('Frequency (Hz) or Amplitude')
451
+ plt.text(text_loc[0], text_loc[1], numbers[j*2+1], horizontalalignment='left', verticalalignment='top',
452
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
453
+
454
+ # fig.add_subplot(gs[-1, 0])
455
+ plt.subplot(7,1,7)
456
+ if label is not None:
457
+ plt.plot(t, label[i, :, 0, 1], '--C0', linewidth=1)
458
+ plt.plot(t, label[i, :, 0, 2], '--C3', linewidth=1)
459
+ plt.plot(t, label[i, :, 0, 3], '--C1', linewidth=1)
460
+ plt.plot(t, pred[i, :, 0, 1], 'C0', label='P', linewidth=1)
461
+ plt.plot(t, pred[i, :, 0, 2], 'C3', label='S', linewidth=1)
462
+ plt.plot(t, pred[i, :, 0, 3], 'C1', label='PS', linewidth=1)
463
+ plt.plot(t, t*0, 'k', linewidth=1)
464
+ plt.autoscale(enable=True, axis='x', tight=True)
465
+ plt.ylim([-0.05, 1.05])
466
+ plt.text(text_loc[0], text_loc[1], numbers[-1], horizontalalignment='left', verticalalignment='top',
467
+ transform=plt.gca().transAxes, fontsize="small", fontweight="normal", bbox=box)
468
+ plt.legend(loc='upper right', fontsize='small', ncol=1)
469
+ plt.xlabel('Time (s)')
470
+ plt.ylabel('Probability')
471
+ # plt.tight_layout()
472
+ plt.gcf().align_labels()
473
+
474
+ try:
475
+ plt.savefig(os.path.join(figure_dir, f'{epoch:02d}_'+fname[i]+'.png'), bbox_inches='tight')
476
+ except FileNotFoundError:
477
+ os.makedirs(os.path.dirname(os.path.join(figure_dir, f'{epoch:02d}_'+fname[i])), exist_ok=True)  # create the same directory the savefig call below targets
478
+ plt.savefig(os.path.join(figure_dir, f'{epoch:02d}_'+fname[i]+'.png'), bbox_inches='tight')
479
+
480
+ plt.close(i)
481
+ return 0
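+ # A minimal usage sketch for plot_spectrogram_waveform (names are illustrative; shapes follow the
+ # indexing above: waveform (batch, npts, 3), spectrogram (batch, nt, nf, 6) as real+imaginary
+ # pairs, pred/label (batch, nt, 1, 4)):
+ #   plot_spectrogram_waveform(config, spectrogram, waveform, pred, label=label,
+ #                             time=t, freq=f, fname=fnames, figure_dir="figures", epoch=0)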
requirements.txt ADDED
@@ -0,0 +1,7 @@
1
+ tensorflow
2
+ matplotlib
3
+ pandas
4
+ tqdm
5
+ scipy
6
+ obspy
7
+
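+ # Runtime dependencies; setup.py below reads this file for install_requires.
+ # Manual install (a standard alternative): pip install -r requirements.txt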
setup.py ADDED
@@ -0,0 +1,116 @@
1
+ import io
2
+ import os
3
+ import re
4
+ import sys
5
+ from shutil import rmtree
6
+ from typing import Tuple, List
7
+
8
+ from setuptools import Command, find_packages, setup
9
+
10
+ # Package meta-data.
11
+ name = "PhaseNet"
12
+ description = "PhaseNet"
13
+ url = ""
14
+ email = "wayne.weiqiang@gmail.com"
15
+ author = "Weiqiang Zhu"
16
+ requires_python = ">=3.6.0"
17
+ current_dir = os.path.abspath(os.path.dirname(__file__))
18
+
19
+
20
+ def get_version():
21
+ version_file = os.path.join(current_dir, "phasenet", "__init__.py")
22
+ with io.open(version_file, encoding="utf-8") as f:
23
+ return re.search(r'^__version__ = [\'"]([^\'"]*)[\'"]', f.read(), re.M).group(1)
24
+
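+ # Illustrative example (assuming phasenet/__init__.py contains a line such as
+ # __version__ = "1.1.0"): get_version() returns the string "1.1.0".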
25
+
26
+ # What packages are required for this module to be executed?
27
+ try:
28
+ with open(os.path.join(current_dir, "requirements.txt"), encoding="utf-8") as f:
29
+ required = [line.strip() for line in f if line.strip()]  # drop blank lines so install_requires has no empty entries
30
+ except FileNotFoundError:
31
+ required = []
32
+
33
+ # What packages are optional?
34
+ extras = {"test": ["pytest"]}
35
+
36
+ version = get_version()
37
+
38
+ about = {"__version__": version}
39
+
40
+
41
+ def get_test_requirements():
42
+ requirements = ["pytest"]
43
+ if sys.version_info < (3, 3):
44
+ requirements.append("mock")
45
+ return requirements
46
+
47
+
48
+ def get_long_description():
49
+ # base_dir = os.path.abspath(os.path.dirname(__file__))
50
+ # with io.open(os.path.join(base_dir, "README.md"), encoding="utf-8") as f:
51
+ # return f.read()
52
+ return ""
53
+
54
+
55
+ class UploadCommand(Command):
56
+ """Support setup.py upload."""
57
+
58
+ description = "Build and publish the package."
59
+ user_options: List[Tuple] = []
60
+
61
+ @staticmethod
62
+ def status(s):
63
+ """Print things in bold."""
64
+ print(s)
65
+
66
+ def initialize_options(self):
67
+ pass
68
+
69
+ def finalize_options(self):
70
+ pass
71
+
72
+ def run(self):
73
+ try:
74
+ self.status("Removing previous builds...")
75
+ rmtree(os.path.join(current_dir, "dist"))
76
+ except OSError:
77
+ pass
78
+
79
+ self.status("Building Source and Wheel (universal) distribution...")
80
+ os.system(f"{sys.executable} setup.py sdist bdist_wheel --universal")
81
+
82
+ self.status("Uploading the package to PyPI via Twine...")
83
+ os.system("twine upload dist/*")
84
+
85
+ self.status("Pushing git tags...")
86
+ os.system("git tag v{}".format(about["__version__"]))
87
+ os.system("git push --tags")
88
+
89
+ sys.exit()
90
+
91
+
92
+ setup(
93
+ name=name,
94
+ version=version,
95
+ description=description,
96
+ long_description=get_long_description(),
97
+ long_description_content_type="text/markdown",
98
+ author="Weiqiang Zhu",
99
+ author_email = "wayne.weiqiang@gmail.com",
100
+ license="GPL-3.0",
101
+ url=url,
102
+ packages=find_packages(exclude=["tests", "docs", "dataset", "model", "log"]),
103
+ python_requires=requires_python,
+ install_requires=required,
104
+ extras_require=extras,
105
+ classifiers=[
106
+ "License :: OSI Approved :: BSD License",
107
+ "Intended Audience :: Developers",
108
+ "Intended Audience :: Science/Research",
109
+ "Operating System :: OS Independent",
110
+ "Programming Language :: Python",
111
+ "Programming Language :: Python :: 3",
112
+ "Topic :: Software Development :: Libraries",
113
+ "Topic :: Software Development :: Libraries :: Python Modules",
114
+ ],
115
+ cmdclass={"upload": UploadCommand},
116
+ )
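+ # Typical workflows (a sketch, not part of the packaging metadata above):
+ #   pip install -e .          # editable install; dependencies come from requirements.txt
+ #   pip install -e ".[test]"  # also pull in the optional pytest extra
+ #   python setup.py upload    # UploadCommand: rebuild sdist/wheel, upload via twine, push a git tag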