Topography tool chain - description and upgrades for high-res
Note: see @Walter Hannah 's original page on a remap based topo generation tool chain: https://acme-climate.atlassian.net/wiki/spaces/IPD/pages/4449009672
Motivation
Upgrade workflow and datasets to support model resolutions finer than 3km
Upgrade serial codes to parallel versions that can support sub-km resolutions.
Separate out remap steps from SGH steps, to allow for more general SGH formulas
Summary of Recommendations
Upgrade MOAB TempestRemap (MBTR) to include a “cell_avg” algorithm
Upgrade VAR30 and VAR algorithms to use scalable cell_avg algorithm from MBTR
Upgrade PHI algorithms to compute topography directly from USGS datasets.
Implement all calculations in python, except those requiring specific algorithms in MBTR or homme_tool.
Notation
TERR_USGS: original ultra-high resolution terrain
TERR_3km: TERR_USGS mapped to the 3km ne3000pg1 grid
TERR_3km_t: TERR_3km mapped to the target grid
TERR_t: TERR_USGS mapped directly to the target grid
TERR_s: dycore smoothing applied to the target-grid terrain (TERR_3km_t in the current chain, TERR_t in the proposed chain)
SGH30: variance between TERR_USGS and TERR_3km; used by shoc, clubb, vertical_diffusion, TMS
SGH: variance between TERR_3km and TERR_s; used by gw_tend
cell_avg(): a simple cell average downsampling (fine grid to coarse grid) remap algorithm - uniform average over all grid point data within cell, without computing a common refinement grid. In bin_to_cube.F90, the cells are given by the standard scrip file for the ne3000pg1 grid. In our proposed upgrade, we suggest for simplicity and speed that the cells be replaced by disks with the area associated with the target point.
aavg(): The standard cell-integrated 'aave' downsampling remap algorithm. Uses area weighted averages requiring a common refinement grid.
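To make the distinction concrete, here is a minimal serial sketch of cell_avg() in python/numpy, assuming the proposed disk-shaped "cells" with area equal to the target cell area. All names here are illustrative, not part of bin_to_cube.F90 or MBTR, and a production version would need spatial indexing rather than this brute-force search:

```python
import numpy as np

def cell_avg(src_vals, src_xyz, tgt_xyz, tgt_area):
    """Simple cell-average downsampling: for each target point, uniformly
    average all source points falling within a disk whose area equals the
    area associated with that target point.
    src_xyz, tgt_xyz: unit-sphere Cartesian coordinates, shape (n, 3).
    tgt_area: area associated with each target point."""
    # disk radius r with pi*r^2 = cell area (flat-disk approximation,
    # adequate for small cells)
    radius = np.sqrt(tgt_area / np.pi)
    out = np.empty(len(tgt_xyz))
    for j, (xj, rj) in enumerate(zip(tgt_xyz, radius)):
        # chord distance approximates great-circle distance for small radii
        d = np.linalg.norm(src_xyz - xj, axis=1)
        mask = d <= rj
        # empty cells (possible on sub-km target grids) are flagged NaN
        out[j] = src_vals[mask].mean() if mask.any() else np.nan
    return out
```

Unlike aavg(), no common refinement grid is built and no source-cell overlap areas are computed; every source point inside the disk gets uniform weight.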
Datasets
Default dataset in current tool chain
30s (1km DEM mesh): 43.2K x 21.6K. file: 7GB
Provides TERR_USGS
GMTED2010_7.5_stitch_S5P_OPER_REF_DEM_15_NCL_24-3.r172800x86400.nc
New ultra-high res data set created by @Jishi Zhang , see 800m cubed topo generation from GMTED2010 15s DEM
7.5s (250m DEM mesh): 86.4K x 172.8K = 13.9G points = 111GB of memory after converting to real*8
Provides higher resolution version of TERR_USGS
USGS-topo-cube3000.nc
This file contains data (TERR_3km, SGH30_3km) on the 3km “ne3000pg1” grid used internally by the topo generation process.
Current tools
Mostly built using NCAR CESM tools described in: P.H. Lauritzen, J.T. Bacmeister, P.F. Callaghan, M. Taylor, NCAR_Topo (v1.0): NCAR global model topography generation software for unstructured grids, Geosci. Model Dev., 8, 3975-3986, 2015.
bin_to_cube.F90: used to compute TERR_3km and SGH30_3km, using cell_avg(). bin_to_cube.F90 is limited to lat/lon source grids and cubed-sphere target grids. The cell_avg() algorithm is simple and fast, but this serial utility currently requires over 512GB of memory to run (see 800m cubed topo generation from GMTED2010 15s DEM) and is estimated to take weeks to run.
cube_to_target.F90: used to compute TERR_t, SGH and SGH30. cube_to_target.F90 is limited to cubed-sphere source grids and arbitrary target grids. It uses an aave algorithm, making it extremely slow for high-resolution grids.
homme_tool: Used to apply dycore specific smoothing for the np4 topography, and the derived pg2 topography.
MBTR: MOAB TempestRemap: parallel remapping tool.
Recommendation: We propose to upgrade MBTR to use the cell_avg algorithm, including the ability to run this in parallel.
Output
The final output of the atmospheric topography tool chain is a topo file with the following variables:
PHIS_d = g*TERR_s (on the np4 spectral element GLL grid)
PHIS = PHIS_d mapped to pg2 (E3SM finite volume phys grid) via dycore specific algorithm (homme_tool)
SGH30 = sqrt(VAR30), on pg2 target grid
SGH = sqrt(VAR) on pg2 target grid
Some new data for the new gravity wave drag schemes on pg2 grid (need more information, cube_to_target)
Algorithms
TERR algorithms
The current algorithm first remaps TERR_USGS to the 3km grid using bin_to_cube.F90, then that result is mapped to the np4 target grid using cube_to_target.F90, and then finally dycore smoothing is applied via homme_tool. We write this algorithm as:
TERR_3km = cell_avg(TERR_USGS)
TERR_3km_t = aave(TERR_3km)
TERR_s = dycore_smoothing ( TERR_3km_t)
The problem with this algorithm is that it limits us to resolutions no finer than 3km. When using grids with finer resolution, the topography will be blocky, and this causes CFL/stability issues. One example with a 100m RRM grid is documented here. @Jishi Zhang was able to generate the topography on the model grid from a ne12000pg1 grid (800m resolution), but even this grid produces blocky artifacts in the 100m region of the RRM grid, due to its use of the ‘aave’ remap algorithm. To resolve these issues, we need source data with resolution close to 100m, or a better remap algorithm suitable for remapping from coarse to fine resolutions. In addition, the ‘aave’ algorithm is incredibly slow, requiring weeks of time to compute with serial tools.
For best results, we should simply use:
TERR_t = cell_avg(TERR_USGS)
TERR_s = dycore_smoothing ( TERR_t )
For this we need to update MBTR to support the cell_avg() algorithm. We can't use bin_to_cube.F90, since it only supports cubed-sphere output grids and does not run in parallel. This map will always be from a lat/lon grid to an np4 GLL grid.
VAR30 algorithm
VAR30 represents the variance between the original USGS grid and the 3km ne3000pg1 grid; this variance is then mapped to our target grid. This assumes the target grid is coarser than 3km. VAR30 is used by parameterizations that want to trigger effects based on topographical variance at 3km and finer scales, independent of how coarse our model grid is.
We first compute VAR30_3km on the 3km ne3000pg1 grid with “bin_to_cube.F90” via:
VAR30_3km = cell_avg[ ( TERR_USGS - TERR_3km )^2 ]
The integral represented by cell_avg() is computed by the following: For each cell "j" on the 3km grid, let i=1..N represent all the grid points on the USGS grid contained in cell "j". VAR30_3km is computed as the average over i of (TERR_USGS(i)-TERR_3km(j))^2.
Since cell_avg() is a linear operator, we have
VAR30_3km = cell_avg(TERR_USGS^2) - 2 cell_avg(TERR_USGS*TERR_3km) + cell_avg(TERR_3km^2)
Since TERR_3km is constant within the cell and TERR_3km = cell_avg(TERR_USGS), the last two terms reduce to -2*(TERR_3km)^2 and (TERR_3km)^2. The algorithm can then be computed directly on the ne3000pg1 grid in terms of remapped values of TERR and TERR^2:
TERR_SQ_3km = cell_avg(TERR_USGS^2)
TERR_3km = cell_avg(TERR_USGS)
VAR30_3km = TERR_SQ_3km - (TERR_3km)^2 # computed on 3km grid
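This is the standard identity that variance equals the mean of squares minus the square of the mean, and it can be sanity-checked on toy data. The snippet below is plain numpy with made-up values, not any existing tool:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy 'USGS' terrain values falling inside one 3km cell
terr_usgs = rng.normal(1000.0, 200.0, size=10000)

# expanded form used on the ne3000pg1 grid
terr_3km    = terr_usgs.mean()        # cell_avg(TERR_USGS)
terr_sq_3km = (terr_usgs**2).mean()   # cell_avg(TERR_USGS^2)
var30_expanded = terr_sq_3km - terr_3km**2

# direct definition: cell_avg[(TERR_USGS - TERR_3km)^2]
var30_direct = ((terr_usgs - terr_3km)**2).mean()

assert np.isclose(var30_expanded, var30_direct)
```

The practical payoff is that only two remapped fields (TERR and TERR^2) are needed per cell, rather than a second pass over the USGS data with TERR_3km in hand.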
Once VAR30_3km is computed on the 3km ne3000pg1 grid, it is remapped to our target physics grid. The physics grid is usually a FV pg2 grid, but can sometimes be an np4 GLL grid.
SGH30_t = aave(SGH30_3km), SGH30_3km = sqrt(VAR30_3km)
Recommendation: For a faster, more scalable version of this chain, we propose to keep the same algorithm but replace all the cell_avg and aave remaps with an MBTR version of cell_avg.
VAR algorithm
The VAR field gives a measure of the variance between the 3km grid and our target grid. VAR is used by parameterizations that want to trigger effects based on topographical variance up to 3km in scale (but not finer).
It is currently computed as a cell average by cube_to_target.F90, which uses an 'aave' map from the FV ne3000pg1 grid to our target physics grid (usually pg2, but sometimes np4).
TERR_SQ_3km_t = aave(TERR_3km^2)
VAR = aave[ ( TERR_3km - TERR_s )^2 ]
    = aave(TERR_3km^2) - 2*TERR_s*aave(TERR_3km) + TERR_s^2    (TERR_s is constant within each target cell)
    = TERR_SQ_3km_t - 2*TERR_s*TERR_3km_t + TERR_s^2
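The same expansion can be verified numerically for a single target cell. Here TERR_s is treated as a constant over the cell and the aave() within one cell reduces to a mean (equal weights assumed for illustration; the real map is area-weighted):

```python
import numpy as np

rng = np.random.default_rng(1)
# toy 3km terrain values falling inside one target cell
terr_3km = rng.normal(1000.0, 150.0, size=500)
terr_s = 980.0   # dycore-smoothed target value, constant over the cell

# the two remapped fields the tool chain carries
terr_3km_t    = terr_3km.mean()        # aave(TERR_3km)
terr_sq_3km_t = (terr_3km**2).mean()   # aave(TERR_3km^2)

var_expanded = terr_sq_3km_t - 2.0*terr_s*terr_3km_t + terr_s**2
var_direct   = ((terr_3km - terr_s)**2).mean()

assert np.isclose(var_expanded, var_direct)
```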
Recommendation: We propose to upgrade the scalability of this algorithm with the MBTR cell_avg algorithm.
Issues for high resolution grids
Consider an RRM grid with a region with resolution finer than 3km, and a region with resolution coarser than 3km. The formula for VAR above is probably correct on such a grid - it will be 0 in regions finer than 3km, and reproduce the original behavior in regions coarser than 3km.
One possible approach: For VAR30, it seems like in the regions finer than 3km we should compute the variance between TERR_USGS and TERR_t (let's call this VAR_t), while in the coarser regions we use the variance between TERR_USGS and TERR_3km. This can be achieved via:
TERR_SQ_t = cell_avg(TERR_USGS^2)
VAR_t = TERR_SQ_t - 2*TERR_s*TERR_t + TERR_s^2
SGH_t = sqrt(VAR_t)
SGH30 (RRM version) = min(SGH30,SGH_t)
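The min() blend can be sketched in numpy with hypothetical per-cell values; the point is that in sub-3km regions SGH_t goes toward zero and is selected, while in coarse regions the original SGH30 survives:

```python
import numpy as np

# hypothetical per-cell values on an RRM physics grid (meters)
sgh30 = np.array([120.0, 80.0, 45.0])  # variance relative to the 3km grid
sgh_t = np.array([150.0, 60.0,  0.0])  # variance relative to the target grid
                                       # (last cell: region finer than 3km)

# elementwise minimum gives the RRM version of SGH30
sgh30_rrm = np.minimum(sgh30, sgh_t)   # -> [120., 60., 0.]
```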
Torture test for MBTR:
Need a downsampling (area averaging) and monotone map. Strict conservation is not necessary, which is why we propose a simple neighbor averaging (all neighbors within the averaging cell area). The method should represent a true cell-area average, not an inverse-distance-weighted interpolation.
The source grid for the first two tests has 250m resolution (and much finer at the poles). The cell averages at the poles will probably be over hundreds of thousands of points. The “Altamont100m” grid has 100m resolution, so some of its cells won't actually contain any source grid points. I don't have a firm idea of what we should do in that case; nearest-neighbor or inverse-distance-weighted would probably both be ok.
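Of the two fallbacks mentioned, nearest-neighbor is the simpler; a minimal sketch, assuming empty cells were flagged NaN by the averaging step (function and variable names are illustrative only):

```python
import numpy as np

def fill_empty_cells(tgt_vals, tgt_xyz, src_vals, src_xyz):
    """For target cells whose averaging disk contained no source points
    (marked NaN), fall back to the nearest source point's value.
    All coordinates are unit-sphere Cartesian, shape (n, 3)."""
    out = tgt_vals.copy()
    for j in np.flatnonzero(np.isnan(tgt_vals)):
        # chord distance is monotone in great-circle distance,
        # so argmin gives the true nearest neighbor
        d = np.linalg.norm(src_xyz - tgt_xyz[j], axis=1)
        out[j] = src_vals[np.argmin(d)]
    return out
```

In production this search would use a spatial tree rather than a full scan, but only the handful of empty 100m cells need the fallback.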
Test 1: FVtoFV map: TERR and TERR^2, needed to compute SGH30 on the ne3000pg1 grid.
source grid: GMTED2010_7.5_stitch_S5P_OPER_REF_DEM_15_NCL_24-3.r172800x86400.nc (13.6G points, lat/lon grid)
Target grid: ne3000pg1.scrip.nc (54M grid points)
Test 2: FVtoSE map: TERR, TERR^2
source grid: GMTED2010_7.5_stitch_S5P_OPER_REF_DEM_15_NCL_24-3.r172800x86400.nc (13.6G points, lat/lon grid)
Target grid: CAne32x128_Altamont100m_v2.g (np4 GLL grid) ( 300K elements, 3M grid points)
Test 3: FVtoSE map: SGH30, TERR and TERR^2
source grid: ne3000pg1.scrip.nc
Target grid: CAne32x128_Altamont100m_v2pg2.scrip.nc (pg2 grid)
Files for the above tests on Perlmutter:
Data: (paths copied from: 800m cubed topo generation from GMTED2010 15s DEM )
250m, 7.5s resolution, 86.4K x 172.8K: /global/cfs/cdirs/e3sm/zhang73/grids2/topo7.5s/GMTED2010_7.5_stitch_S5P_OPER_REF_DEM_15_NCL_24-3.r172800x86400.nc
lower resolution versions that might be good for testing:
43Kx86K: /global/cfs/cdirs/e3sm/zhang73/grids2/topo15s/S5P_OPER_REF_DEM_15_00000000T000000_99999999T999999_20160111T104226.nc
21.6K x 43.2K: /global/cfs/cdirs/e3sm/taylorm/topo/usgs-rawdata.nc (original from NCAR: https://ftp.cgd.ucar.edu/cesm/inputdata/import/ )
Grids:
/global/cfs/cdirs/e3sm/taylorm/mapping/grids/TEMPEST_ne3000pg1.scrip.nc
/global/cfs/cdirs/e3sm/taylorm/mapping/grids/CAne32x128_Altamont100m_v2.g
We don't have grid description files for the lat/lon source data. For this data, I think the only thing needed is the coordinates, which are in the data file.