This notebook reproduces a sample of the results found in the paper
M. Arcos and R. J. LeVeque, Validating Velocities in the GeoClaw Tsunami Model using Observations Near Hawaii from the 2011 Tohoku Tsunami, Pure and Applied Geophysics, 2015. http://dx.doi.org/10.1007/s00024-014-0980-y
In particular, this example reproduces the run in the directory rjleveque-tohoku2011-paper2-096e44c/Runs/HAI1123 that is archived in the Zenodo repository, although on a coarser grid by default. This run includes two gauges: one compared to ADCP observations of current speed, and one compared to surface elevation at a tide gauge.
To run this notebook, the GeoClaw output described below is needed. Reproducing the resolution used in the paper requires amr_level_max=6; that run takes roughly 2 hours of CPU time. With amr_level_max=5 it takes about 15 minutes of CPU time. This notebook was run using Anaconda Python, version 3.7.4.
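The refinement level is set in setrun.py. As a quick, optional check (a small helper sketch, assuming setrun.py follows the usual Clawpack convention of assigning amrdata.amr_levels_max, written amr_level_max above), the following prints the value currently requested:
import re
with open('setrun.py') as f:
    for line in f:
        # report any (uncommented) assignment to amr_level_max / amr_levels_max
        if re.search(r'amr_levels?_max\s*=', line) and not line.strip().startswith('#'):
            print(line.strip())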
%pylab inline
import glob
import os
import pandas
from clawpack.clawutil.data import get_remote_file
from IPython.display import Image
This example is set up to attempt to replicate results obtained in Kahului Harbor. In the figure below, 1123 shows the location of the ADCP (acoustic Doppler current profiler) HAI1123 that was deployed at the time of the tsunami to study tidal currents, and TG shows the location of tide gauge 1615680 where the surface elevation is monitored. In the GeoClaw simulation there are two synthetic gauges at these locations, numbered 1123 and 5680.
Image('figures/Kahului_gauges.jpg', width=500) # from Figure 9 of paper
Observational data will be downloaded from http://doi.org/10.5281/zenodo.12185, the code/data repository for the paper cited above.
Note that force=False doesn't work as expected here because the zip file has a different name than the directory it creates, so get_remote_file would download and unzip again even if the directory already exists; for that reason the call below is skipped when the directory is already present.
obs_dir = './rjleveque-tohoku2011-paper2-096e44c'
if not os.path.isdir(obs_dir):
zipfile = 'tohoku2011-paper2-submitted_sept2014.zip'
data_url = 'https://zenodo.org/record/12185/files/%s?download=1' % zipfile
get_remote_file(data_url, output_dir='.', file_name=zipfile, force=False,
verbose=True, unpack=True)
print('Data from paper should be in %s' % obs_dir)
assert os.path.isdir(obs_dir), 'Directory not found'
The earthquake origin time is given in UTC. We convert it to the Pacific/Honolulu time zone so that it can be subtracted from the times provided in the gauge files.
tquake = pandas.Timestamp('05:46:24 UTC on March 11, 2011',tz='Pacific/Honolulu')
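As a quick illustrative check, the same instant can be displayed in both time zones; Hawaii (UTC-10, no daylight saving) was still on the evening of March 10 local time when the earthquake occurred.
print(tquake)                    # tz-aware timestamp in Pacific/Honolulu
print(tquake.tz_convert('UTC'))  # the same instant expressed in UTC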
The following routines are applied to rows of Pandas dataframes used to store current meter and tide gauge measurements.
# Add column for time since quake (entry type pandas.Timedelta)
def time_since_quake(row):
return row['date_time'] - tquake
# Add column for hours since quake (entry type : float64)
def hours_since_quake(row):
return row['time_since_quake'].value/1e9/3600.
# Use this to construct velocities from speed and directions.
def set_velocity(row):
s = row['Speed']
theta = row['Dir']
u = s * cos((90.-theta)*pi/180.)
v = s * sin((90.-theta)*pi/180.)
return [u,v]
# Compute final average
def average_vel(row,n):
return [row['u']/n, row['v']/n]
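The short checks below use made-up rows (purely illustrative values, not archive data) to confirm the conventions assumed above: Dir is a compass heading in degrees clockwise from north, so Dir=90 corresponds to flow toward the east, and hours_since_quake converts the nanosecond Timedelta value to hours.
row = pandas.Series({'Speed': 100.0, 'Dir': 90.0})
print(set_velocity(row))       # expect approximately [100.0, 0.0] (eastward flow)
row = pandas.Series({'time_since_quake': pandas.Timedelta(hours=7, minutes=30)})
print(hours_since_quake(row))  # expect 7.5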
Read depth data from the depth files in the Observations/HAI* directories and average the velocity data from these files to obtain single depth-averaged u- and v-velocities. These depth-averaged velocities, along with date/time data and time since the earthquake, are stored as columns in a pandas DataFrame. The DataFrame for each location is stored as an entry in a dictionary gauges_avg whose keys are the directory names containing the depth files.
The depth-averaged velocities are then detided and the results are stored in text files.
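As a quick illustrative look at the input (not needed for the averaging itself), the depth files for the HAI1123 gauge can be listed directly:
example_files = sorted(glob.glob(os.path.join(obs_dir, 'Observations',
                                              'HAI1123_Kahului_harbor', 'depth_*m.txt')))
print('\n'.join(os.path.basename(f) for f in example_files))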
# Set columns to use for depth DataFrame
cols = ['DATE', 'TIME','Speed', 'Dir']
# Data provided from current meters is localized to Pacific/Honolulu time zone.
def date_time(row):
ts = row['TIME']
nt = row['DATE'].replace(hour=ts.hour,minute=ts.minute,second=ts.second)
return nt.tz_localize('Pacific/Honolulu')
# Create list of directories containing depth files
dlist = os.listdir(os.path.join(obs_dir, 'Observations'))
depth_dirs = []
for d in dlist:
p = os.path.join(obs_dir,'Observations',d)
if (os.path.isdir(p)):
depth_dirs.append(d)
# Store data frames for each location here
gauges_avg = {}
for gdir in depth_dirs:
print("Reading data in directory {:s}".format(gdir))
# Read in depth file names
depth_filenames = glob.glob(os.path.join(obs_dir,'Observations',gdir,\
'depth_*m.txt'))
# ---------------------------------------------------------------------
# Read first depth file to get time and date info. This assumes all depth
# files use the same time values (?)
f0 = depth_filenames[0]
df_dates = pandas.read_csv(f0,parse_dates=['DATE','TIME'], \
sep='\s+', \
names=cols,comment='#')
# Combine date and time columns to get correct localize date/time stamp
df_dates['date_time'] = df_dates.apply(date_time,axis=1)
# Subtract time of initial quake to get a 'Timedelta'.
df_dates['time_since_quake'] = df_dates.apply(time_since_quake,axis=1)
# Convert time delta to numerical value
df_dates['hours_since_quake'] = df_dates.apply(hours_since_quake,axis=1)
if gdir == "HAI1123_Kahului_harbor":
# Correction mentioned in detide.py
# if gaugeno==1123:
# t2 = t2 - 1. # correct error in NGDC data for this gauge
df_dates['hours_since_quake'] -= 1
# ---------------------------------------------------------------------
# Create new DataFrame using only date/time info created above
df_dates = df_dates.drop(['DATE','TIME','Speed','Dir'],axis=1)
gf_avg = pandas.DataFrame(df_dates)
# Add velocity columns and initialize velocities to zero.
gf_avg[['u','v']] = gf_avg.apply(lambda row : [0,0],axis=1,result_type='expand')
# ---------------------------------------------------------------------
# Read all files and accumulate velocity averages.
for f in depth_filenames:
# print('Reading {:s}'.format(f))
df = pandas.read_csv(f, parse_dates=['DATE','TIME'], \
sep='\s+',names=cols, \
comment='#')
# Compute velocity (u,v) from speed and direction stored in depth file
df[['u','v']] = df.apply(set_velocity,axis=1,result_type='expand')
# Accumulate velocities to be averaged.
gf_avg['u'] += df['u']
gf_avg['v'] += df['v']
# Average velocities
n = len(depth_filenames)
gf_avg[['u','v']] = gf_avg.apply(average_vel, axis=1, result_type='expand', args=(n,))
# Store data frame in a dictionary
gauges_avg[gdir] = gf_avg
Extract the data frame from the dictionary and display the averaged velocities.
gauges_avg['HAI1123_Kahului_harbor']
The following routine was taken from TG_DART_tools.py in the archived data.
def fit_tide_poly(t,eta,degree):
"""
    Fit a polynomial of the specified degree to the data eta(t),
    ignoring any NaN values in eta.
    Returns the polynomial fit eta_fit evaluated at all times t.
"""
from numpy.linalg import lstsq, svd
# from pylab import find
if numpy.any(numpy.isnan(eta)):
eta2 = numpy.where(numpy.isnan(eta), 1e50, eta)
# j_nonan = find(eta2 < 1e40)
# t_nonan = t[j_nonan]
# eta_nonan = eta[j_nonan]
t_nonan = t[eta2 < 1e40]
eta_nonan = eta[eta2 < 1e40]
print("Ignoring %i NaN values" % (len(eta)-len(eta_nonan)))
else:
t_nonan = t
eta_nonan = eta
# Scale data so matrix better conditioned:
scale_factor = abs(t_nonan).max()
t_nonan = t_nonan/scale_factor
t = t/scale_factor
# Use Newton polynomial basis using these points:
tpts = numpy.linspace(t_nonan.min(),t_nonan.max(),degree+1)
# Form A matrix for least squares fit
A = numpy.ones((len(t_nonan),degree+1))
for j in range(1,degree+1):
A[:,j] = A[:,j-1] * (t_nonan - tpts[j])
ncols = A.shape[1]
# Perform least squares fit:
c = lstsq(A,eta_nonan,rcond=None)[0]
if numpy.any(numpy.isnan(eta)):
# Reconstruct A using all times for calculating predicted:
A = numpy.ones((len(t),degree+1))
for j in range(1,degree+1):
A[:,j] = A[:,j-1] * (t - tpts[j])
eta_fit = numpy.dot(A,c)
return eta_fit
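Before applying it to the observations, here is a minimal illustrative check of fit_tide_poly on synthetic data (t_test and eta_test are arbitrary, made-up values): a degree-15 polynomial is fit to a smooth, tide-like signal sampled over 48 hours, and the returned fit has the same length as the input times.
t_test = linspace(0., 48., 500)           # hours
eta_test = 50*cos(2*pi*t_test/12.42)      # smooth synthetic signal (cm)
eta_fit_test = fit_tide_poly(t_test, eta_test, degree=15)
print(len(eta_fit_test), abs(eta_test - eta_fit_test).max())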
Loop over all data frames and detide the averaged velocities.
for fname in gauges_avg.keys():
gf_avg = gauges_avg[fname]
thours = gf_avg['hours_since_quake']
u = gf_avg['u']
v = gf_avg['v']
degree = 15
u2 = fit_tide_poly(thours,u,degree)
v2 = fit_tide_poly(thours,v,degree)
# These values are essentially what is in "Observations/HAIXXXX/detided_poly.txt"
gf_avg['u_detided'] = gf_avg['u'] - u2
gf_avg['v_detided'] = gf_avg['v'] - v2
# Save detided velocities. These can be compared directly with gauge results from
# GeoClaw. Time is stored in hours, not seconds, so GeoClaw gauge times
# have to be scaled.
cols = ['hours_since_quake', 'u_detided', 'v_detided']
dname = '{:s}_detided.txt'.format(fname)
print("Writing detide file {:s}".format(dname))
gf_avg.to_csv(dname, columns = cols,
sep='\t', \
header=False, \
index=False)
The file detided_poly.txt in Observations/HAI1123_Kahului_harbor contains the detided data used in the Arcos and LeVeque paper. The first few lines of this file are:
%%bash
echo Archived:
head -5 rjleveque-tohoku2011-paper2-096e44c/Observations/HAI1123_Kahului_harbor/detided_poly.txt
The corresponding newly created values are:
gf = gauges_avg['HAI1123_Kahului_harbor'][['hours_since_quake','u_detided','v_detided']]
t = gf['hours_since_quake']
gf[0.82 < t][0:5]
You need to run the GeoClaw code in this directory before proceeding with the comparison of results. This can be done via these commands at the bash prompt:
make topo # downloads topo and dtopo files needed
make .output # compiles and runs the code
This uses the parameters specified in setrun.py, in particular the value of amr_level_max discussed above and gauges at the locations 1123 and 5680 for comparison with observations.
Running the code requires about 15 minutes of CPU time with 5 levels of refinement, or roughly 2 hours with 6 levels (wall time will depend on how many cores you are using), and should produce output in the _output directory.
geoclaw_outdir = '_output'
assert os.path.isdir(geoclaw_outdir), '*** Did you run GeoClaw?'
The plots below require the gauge output computed in the GeoClaw run.
figure(figsize=(10,8))
gf_avg = gauges_avg['HAI1123_Kahului_harbor']
thours = gf_avg['hours_since_quake']
subplot(211)
# ------------------------------------------
# Plot u-velocity data from observations
u = gf_avg['u_detided']
plot(thours,u,'k.-',markersize=5,label='Observation')
# ------------------------------------------
# Plot u-velocity gauge data from GeoClaw
f = os.path.join(geoclaw_outdir,'gauge01123.txt')
cols = ['level','t','q[1]','q[2]','q[3]','eta','aux']
df_gauge = pandas.read_csv(f, sep='\s+', comment='#', names=cols)
tshift = 10*60 # Shift by 10 minutes (mentioned in tohoku paper)
tg = (df_gauge['t'] + tshift)/3600
hg = df_gauge['q[1]']
ug = 100*df_gauge['q[2]']/hg # Convert to cm/sec
vg = 100*df_gauge['q[3]']/hg # Convert to cm/sec
plot(tg,ug,'r.-',markersize=1, label='GeoClaw')
# ------------------------------------------
# Fix up axes, add axes labels, title, etc.
xlabel('Hours')
ylabel('u-velocity (cm/sec)')
grid(True)
title('u-velocity (depth averaged)')
yticks([-200, -100, 0, 100, 200])
ylim([-250,250])
xlim([7.25,13])
legend();
subplot(212)
# ------------------------------------------
# Plot v-velocity data from observations
gf_avg = gauges_avg['HAI1123_Kahului_harbor']
thours = gf_avg['hours_since_quake']
v = gf_avg['v_detided']
# Plot observational data
plot(thours,v,'k.-',markersize=5,label='Observation')
# ------------------------------------------
# Plot v-velocity gauge from GeoClaw
plot(tg,vg,'r.-',markersize=1, label='GeoClaw')
# ------------------------------------------
# Fix axes limits, add title, axes labels etc
xlabel('Hours')
ylabel('v-velocity (cm/sec)')
grid(True)
title('v-velocity (depth averaged)')
yticks([-200, -100, 0, 100, 200])
ylim([-250,250])
xlim([7.5,13])
legend()
tight_layout()
These plots can be compared to Figure 10 in the paper (which was from a 6-level run). The new results with 6 levels give slightly larger peaks in the v-velocity than the original; note that computed velocities are extremely sensitive to perturbations, and changes made to GeoClaw since the paper was published may be responsible.
With only 5 levels, the computed velocities are smaller, but still over-estimate some of the observed peaks in $v$.
Image('figures/figure1123b.jpg', width=500) # from Figure 10 of paper
Plotting all the $(u,v)$ pairs for each observation time also makes the direction of flow easier to see.
figure(figsize=(6,6))
xlimits = (-300,300)
ylimits = (-300,300)
plot(xlimits, (0,0), 'k')
plot((0,0), ylimits, 'k')
plot(ug,vg,'r', label='GeoClaw')
plot(u,v,'k.', label='Observed')
axis('scaled')
legend()
title('Velocities in u-v plane')
xlim(xlimits)
ylim(ylimits);
For comparison, the figure from the paper is shown below. Note that with 5 levels the computed velocity is much more nearly aligned in the N-S direction than what is seen with 6 levels.
Image('figures/figure1123a.jpg', width=500) # from Figure 10 of paper
In the following, tide gauge data files are read and used to compare surface elevations with GeoClaw results.
Python scripts in the archive refer to files TG_XXXX_raw.csv in the directory Observations/TideGauges. However, the only files that were found are
1615680__2011-03-11_to_2011-03-13.csv
1617760__2011-03-11_to_2011-03-13.csv
These are loaded below, and the surface height data is compared with results from GeoClaw.
# Tide gauge data is localized in UTC coordinates and then converted.
def date_time_tides(row):
ts = row['TIME']
nt = row['DATE'].replace(hour=ts.hour,minute=ts.minute,second=ts.second)
return nt.tz_localize('UTC').tz_convert('Pacific/Honolulu')
tg_files = []
tg_files.append('1615680__2011-03-11_to_2011-03-13.csv')
tg_files.append('1617760__2011-03-11_to_2011-03-13.csv')
# Store tide gauge data in a dictionary.
tide_gauges = {}
for k in range(2):
csv_file = os.path.join(obs_dir,'TideGauges',tg_files[k])
print('Reading file {:s}'.format(csv_file.split('/')[-1]))
gf = pandas.read_csv(csv_file,parse_dates=['DATE','TIME'])
# ---------------------------------------------------------------------
# Add column with correct date and time (entry type : pandas.Timestamp)
gf['date_time'] = gf.apply(date_time_tides,axis=1)
# ---------------------------------------------------------------------
# Add column for time since quake (entry type pandas.Timedelta)
gf['time_since_quake'] = gf.apply(time_since_quake,axis=1)
# ---------------------------------------------------------------------
# Add column for hours since quake (entry type : float64)
gf['hours_since_quake'] = gf.apply(hours_since_quake,axis=1)
# Clean up values : Set '-' to NaN; convert numeric types to float
gf = gf.replace('-',nan)
cols = ['1MIN', '6MIN', 'ALTERNATE', 'RESIDUAL', 'PREDICTED']
convert = dict.fromkeys(cols, float)
gf = gf.astype(convert)
key = tg_files[k].split('__')[0]
tide_gauges[key] = gf
Display tide gauge data for a single tide gauge. We drop the DATE
and TIME
columns for display purposes.
tide_gauges['1615680'].drop(['DATE','TIME'],axis=1)
The harmonic fit routine below was taken from the TG_DART_tools.py module. The only minor change was to remove the use of the find function, which is no longer available in pylab.
def fit_tide_harmonic(t, eta, periods, t0=0, svd_tol=0.01):
from numpy.linalg import lstsq, svd, norm
#from pylab import find
if numpy.any(numpy.isnan(eta)):
eta2 = numpy.where(numpy.isnan(eta), 1e50, eta)
# j_nonan = find(eta2 < 1e40)
# t_nonan = t[j_nonan]
# eta_nonan = eta[j_nonan]
t_nonan = t[eta2 < 1e40]
eta_nonan = eta[eta2 < 1e40]
else:
t_nonan = t
eta_nonan = eta
A = numpy.ones(t_nonan.shape)
names = periods.keys()
for k in names:
s1 = numpy.sin(2*numpy.pi*t_nonan/periods[k])
c1 = numpy.cos(2*numpy.pi*t_nonan/periods[k])
A = numpy.vstack([A,s1,c1])
A = A.T
ncols = A.shape[1]
# c,res,rank,s = lstsq(A,eta) ## Does not work well!
# Using full least squares solution gives very large coefficients
# Instead use SVD based pseudo-inverse throwing away singular
# values below the specified tolerance:
U,S,V = svd(A, full_matrices=False)
#print "Singular values: ",S
c = numpy.zeros(ncols)
num_sv = 0
for k in range(ncols):
if S[k]/S[0] > svd_tol:
c = c + (numpy.dot(U[:,k],eta_nonan) / S[k]) * V[k,:]
num_sv += 1
print("Inverting using %s singular values out of %s" % (num_sv, ncols))
c_sin = c[1::2]
c_cos = c[2::2]
c_cos = numpy.where(abs(c_cos)<1e-10, 1e-10, c_cos)
phi = -numpy.arctan(c_sin / c_cos) * 180./numpy.pi
phi = numpy.where(c_cos < 0., phi+180, phi)
# Determine offset, amplitude, phase so that fit has the form
# eta_fit = offset + sum_k amplitude[k] * cos(2*pi*(t - t0) + phase[k])
# where the sum is over all harmonic constituents in periods.keys()
offset = c[0] # constant term in fit
phase = {}
amplitude = {}
for i,name in enumerate(names):
amplitude[name] = numpy.sqrt(c_sin[i]**2 + c_cos[i]**2)
phase[name] = phi[i] + 360.*t0/periods[name]
#print "c: ",c
#print "c_cos: ",c_cos
#print "c_sin: ",c_sin
#print "+++ offset, amplitude, phase: ",offset, amplitude, phase
if numpy.any(numpy.isnan(eta)):
# Reconstruct A using all times for calculating predicted:
A = numpy.ones(t.shape)
for k in names:
s1 = numpy.sin(2*numpy.pi*t/periods[k])
c1 = numpy.cos(2*numpy.pi*t/periods[k])
A = numpy.vstack([A,s1,c1])
A = A.T
eta_fit = numpy.dot(A,c)
# residual = eta - eta_fit
# print "Norm of residual: ",norm(residual,2) # might return NaN
# print "Norm of amplitudes: ",norm(c[1:],2)
return eta_fit, amplitude, phase, offset
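# ----------------------------------------------------------------------
# Illustrative check of fit_tide_harmonic on synthetic data (t_test, eta_test
# and p_test are made-up names/values, not archive data): fitting a pure M2
# signal with two constituents should give an M2 amplitude near 40 and a
# much smaller K1 amplitude.
t_test = linspace(0., 72., 1000)                  # hours
eta_test = 40*cos(2*pi*t_test/12.4206012)         # synthetic M2 tide (cm)
p_test = {'M2': 12.4206012, 'K1': 23.9344697}
eta_fit_test, amp_test, phase_test, offset_test = \
    fit_tide_harmonic(t_test, eta_test, p_test, svd_tol=1e-5)
print(amp_test)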
def get_periods():
"""
Returns dictionary of tidal harmonic constituent periods (in hours).
"""
periods = { \
'K1': 23.9344697,
'O1': 25.8193417,
'M2': 12.4206012,
'S2': 12.0000000,
'M3': 08.2804008,
'M4': 06.2103006,
'2MK5': 04.9308802,
'M6': 04.1402004,
'3MK7': 03.10515030,
'M8': 03.1051503,
'N2': 12.6583482,
'Q1': 26.8683567,
'MK3': 08.1771399,
'S4': 06.0000000,
'MN4': 06.2691739,
'NU2': 12.6260044,
'S6': 04.0000000,
'MU2': 12.8717576,
'2N2': 12.9053745,
'OO1': 22.3060742,
'LAM2': 12.2217742,
'S1': 24.0000000,
'M1': 24.8332484,
'J1': 23.0984768,
'MM': 661.3092049,
'SSA': 4382.9052087,
'SA': 8765.8210896,
'MSF': 354.3670522,
'MF': 327.8589689,
'RHO': 26.7230533,
'T2': 12.0164492,
'R2': 11.9835958,
'2Q1': 28.0062225,
'P1': 24.0658902,
'2SM2': 11.6069516,
'L2': 12.1916202,
'2MK3': 08.3863030,
'K2': 11.9672348,
'MS4': 06.1033393,
}
return periods
periods = get_periods()
constituents_hawaii = ['J1','K1','K2','M2','N2','O1','P1','Q1','S2','SA']
periods_hawaii = {k:periods[k] for k in constituents_hawaii}
p = periods_hawaii
for fname in tide_gauges.keys():
gf_detide = tide_gauges[fname]
thours = gf_detide['hours_since_quake']
eta = gf_detide['1MIN']
    eta_fit, eta_amplitude, eta_phase, eta_offset = \
        fit_tide_harmonic(thours, eta, periods=p, t0=0, svd_tol=1e-5)
# Store detided data in DataFrame
gf_detide['eta_detided'] = eta - eta_fit
    # Save detided surface elevations. These can be compared directly with gauge results from
# GeoClaw. Time is stored in hours, not seconds, so GeoClaw gauge times
# have to be scaled.
cols = ['hours_since_quake', 'eta_detided']
dname = '{:s}_detided.txt'.format(fname)
print("Writing detide file {:s}".format(dname))
gf_detide.to_csv(dname, columns = cols,
sep='\t', \
header=False, \
index=False,\
na_rep = 'nan')
Plot the surface height data from gauge 1615680 and compare to GeoClaw results.
figure(3)
clf()
gf = tide_gauges['1615680']
thours = gf['hours_since_quake']
# Plot sea level
plot(thours,0*thours,'k-',label='Sea Level', linewidth=0.5)
# Plot data from observations
plot(thours,gf['eta_detided'],'k.-',markersize=5,label='Observation')
# Plot gauge data from GeoClaw
f = os.path.join(geoclaw_outdir,'gauge05680.txt')
cols = ['level','t','q[1]','q[2]','q[3]','eta','aux']
df_gauge = pandas.read_csv(f, sep='\s+', comment='#', names=cols)
tshift = 10*60
tg = (df_gauge['t'] + tshift)/3600 # Shift by 10 minutes
etag = df_gauge['eta']
plot(tg,etag,'r.-',markersize=1,label='GeoClaw')
xlabel('Hours')
ylabel('Surface height')
title('Surface Height (1615680)')
#yticks(linspace(-2,2,9))
ylim(-3,3)
xlim([7.5,13])
grid(True)
legend();
For comparison, the corresponding plot from Figure 11 of the paper is shown below. Note that even with only 5 levels (refining to 10 arcseconds rather than 1/3 arcsecond) the surface elevation is well captured and does not change much when a sixth level is added.
Image('figures/TG_1615680_compare.jpg', width=500) # from Figure 11 of paper