Topography tool chain - description and upgrades for high-res

Note: see @Walter Hannah's original page on a remap-based topo generation tool chain: https://acme-climate.atlassian.net/wiki/spaces/IPD/pages/4449009672

Motivation

  1. Upgrade workflow and datasets to support model resolutions finer than 3km

  2. Upgrade serial codes to parallel versions that can support sub-km resolutions.

  3. Separate out remap steps from SGH steps, to allow for more general SGH formulas

Summary of Recommendations

  1. Upgrade MOAB TempestRemap (MBTR) to include a “cell_avg” algorithm

  2. Upgrade VAR30 and VAR algorithms to use scalable cell_avg algorithm from MBTR

  3. Upgrade PHI algorithms to compute topography directly from USGS datasets.

  4. Implement all calculations in python, except those requiring specific algorithms in MBTR or homme_tool.

Notation

TERR_USGS: original ultra-high-resolution terrain
TERR_3km: TERR_USGS mapped to the 3km ne3000pg1 grid
TERR_3km_t: TERR_3km mapped to the target grid
TERR_t: TERR_USGS mapped to the target grid
TERR_s: dycore smoothing applied to TERR_t

SGH30: square root of the variance between TERR_USGS and TERR_3km. Used by shoc, clubb, vertical_diffusion, TMS
SGH: square root of the variance between TERR_3km and TERR_s. Used by gw_tend

cell_avg(): a simple cell-average downsampling (fine grid to coarse grid) remap algorithm - a uniform average over all grid-point data within each cell, without computing a common refinement grid. In bin_to_cube.F90, the cells are given by the standard SCRIP file for the ne3000pg1 grid. In our proposed upgrade, we suggest for simplicity and speed that the cells be replaced by disks with the area associated with the target point.


aave(): The standard cell-integrated 'aave' downsampling remap algorithm. Uses area-weighted averages, which require a common refinement grid.

Datasets

  • usgs-rawdata.nc

    • Default dataset in current tool chain

    • 30s (1km DEM mesh): 43.2K x 21.6K. File: 7GB

    • Provides TERR_USGS

  • GMTED2010_7.5_stitch_S5P_OPER_REF_DEM_15_NCL_24-3.r172800x86400.nc

    • New ultra-high-res dataset created by @Jishi Zhang, see 800m cubed topo generation from GMTED2010 15s DEM

    • 7.5s (250m DEM mesh): 86.4K x 172.8K ≈ 13.9G points; 111GB in memory after converting to real*8

    • Provides higher resolution version of TERR_USGS

  • USGS-topo-cube3000.nc

    • This file contains data (TERR_3km, SGH30_3km) on the 3km “ne3000pg1” grid used internally by the topo generation process.

Current tools

Output

The final output of the atmospheric topography tool chain is a topo file with the following variables:

  • PHIS_d = g*TERR_s (on the np4 spectral element GLL grid)

  • PHIS = PHIS_d mapped to pg2 (E3SM finite volume phys grid) via dycore specific algorithm (homme_tool)

  • SGH30 = sqrt(VAR30), on pg2 target grid

  • SGH = sqrt(VAR) on pg2 target grid

  • Some new fields for the new gravity wave drag schemes on the pg2 grid (need more information; see cube_to_target)

Algorithms

TERR algorithms

The current algorithm first remaps TERR_USGS to the 3km grid using bin_to_cube.F90, then that result is mapped to the np4 target grid using cube_to_target.F90, and then finally dycore smoothing is applied via homme_tool. We write this algorithm as:

TERR_3km = cell_avg(TERR_USGS)
TERR_3km_t = aave(TERR_3km)
TERR_s = dycore_smoothing(TERR_3km_t)

The problem with this algorithm is that it limits us to resolutions no finer than 3km. When using grids with finer resolution, the topography will be blocky, and this causes CFL/stability issues. One example with a 100m RRM grid is documented here. @Jishi Zhang was actually able to generate the topography on the model grid from a ne12000pg1 grid (800m resolution). But even this grid produces blocky artifacts in the 100m region of the RRM grid, due to its use of the 'aave' remap algorithm. To resolve these issues, we need source data with resolution close to 100m, or a better remap algorithm suitable for remapping from coarse to fine resolutions. In addition, the 'aave' algorithm is incredibly slow, requiring weeks of compute time with the serial tools.

For best results, we should simply use:

TERR_t = cell_avg(TERR_USGS)
TERR_s = dycore_smoothing(TERR_t)

For this we need to update MBTR to support the cell_avg() algorithm. We can't use bin_to_cube.F90 since it only supports cubed-sphere output grids and does not run in parallel. This map will always be from a lat/lon grid to an np4 GLL grid.

VAR30 algorithm

VAR30 represents the variance between the original USGS grid and the 3km ne3000pg1 grid; this variance is then mapped to our target grid. This assumes the target grid is coarser than 3km. VAR30 is used by parameterizations that want to trigger effects based on topographic variance at 3km and finer scales, independent of how coarse our model grid is.

For each cell on the 3km ne3000pg1 grid, we first compute VAR30_3km with bin_to_cube.F90 via:

VAR30_3km = cell_avg[ ( TERR_USGS - TERR_3km )^2 ]

The cell integral represented by cell_avg() is computed as follows: for each cell "j" on the 3km grid, let i=1..N index all the grid points on the USGS grid contained in cell "j". VAR30_3km(j) is then the average over i of (TERR_USGS(i) - TERR_3km(j))^2.

Since cell_avg() is a linear operator, we have

VAR30_3km = cell_avg(TERR_USGS^2) - 2 cell_avg(TERR_USGS*TERR_3km) + cell_avg(TERR_3km^2)

Since TERR_3km is constant within the cell and TERR_3km = cell_avg(TERR_USGS), the middle term reduces to 2*(TERR_3km)^2 and the last term to (TERR_3km)^2, so VAR30_3km = cell_avg(TERR_USGS^2) - (TERR_3km)^2. The algorithm can then be computed directly on the ne3000pg1 grid in terms of remapped values of TERR and TERR^2:

TERR_SQ_3km = cell_avg(TERR_USGS^2)
TERR_3km = cell_avg(TERR_USGS)
VAR30_3km = TERR_SQ_3km - (TERR_3km)^2    # computed on 3km grid

Once VAR30_3km is computed on the 3km ne3000pg1 grid, it is remapped to our target physics grid. The physics grid is usually an FV pg2 grid, but can sometimes be an np4 GLL grid.

SGH30_3km = sqrt(VAR30_3km)
SGH30_t = aave(SGH30_3km)

Recommendation: For a faster, more scalable version of this chain, we propose to keep the same algorithm but replace all the cell_avg and aave remaps with an MBTR version of cell_avg.

VAR algorithm

The VAR field gives a measure of the variance between the 3km grid and our target grid. VAR is used by parameterizations that want to trigger effects based on topographical variance up to 3km in scale (but not finer).

It is currently computed as a cell average by cube_to_target.F90, which uses an 'aave' map from the FV ne3000pg1 grid to our target physics grid (usually pg2, but sometimes np4).

TERR_SQ_3km_t = aave(TERR_3km^2)
VAR = aave[ (TERR_3km - TERR_s)^2 ]
    = aave(TERR_3km^2) - 2*TERR_s*aave(TERR_3km) + aave(TERR_s^2)
    = TERR_SQ_3km_t - 2*TERR_s*TERR_3km_t + TERR_s^2

(TERR_s is constant within each target cell, so aave(TERR_s^2) = TERR_s^2 and TERR_s can be pulled out of the middle term.)

Recommendation: We propose to upgrade the scalability of this algorithm with the MBTR cell_avg algorithm.

Issues for high resolution grids

Consider an RRM grid with a region with resolution finer than 3km, and a region with resolution coarser than 3km. The formula for VAR above is probably correct on such a grid - it will be 0 in regions finer than 3km, and reproduce the original behavior in regions coarser than 3km.

One possible approach: For VAR30, it seems like in the regions finer than 3km we should be computing the variance between TERR_USGS and TERR_t (let's call this VAR_t), while in the coarser regions we should use the variance between TERR_USGS and TERR_3km. This can be achieved via:

TERR_SQ_t = cell_avg(TERR_USGS^2)
VAR_t = TERR_SQ_t - 2*TERR_s*TERR_t + TERR_s^2
SGH_t = sqrt(VAR_t)
SGH30 (RRM version) = min(SGH30, SGH_t)

 

Torture test for MBTR:

Need a downsampling (area averaging) and monotone map. Strict conservation is not necessary, which is why we propose a simple neighbor averaging (all neighbors within the average cell area). The method should represent a true cell-area average, not an inverse-distance-weighted interpolation.

The source grid for the first two tests below has 250m resolution (and much finer at the poles). The cell averages at the poles will probably be over hundreds of thousands of points. The “Altamont100m” grid has 100m resolution, so some of its cells won't actually contain any source grid points. I'm not sure what we should do in that case - nearest neighbor or inverse-distance weighting would probably both be OK.

 

  1. Test 1: FVtoFV map: TERR and TERR^2, needed to compute SGH30 on the ne3000pg1 grid.

    1. source grid: GMTED2010_7.5_stitch_S5P_OPER_REF_DEM_15_NCL_24-3.r172800x86400.nc (13.6G points, lat/lon grid)

    2. Target grid: ne3000pg1.scrip.nc (54M grid points)

  2. Test 2: FVtoSE map: TERR, TERR^2

    1. source grid: GMTED2010_7.5_stitch_S5P_OPER_REF_DEM_15_NCL_24-3.r172800x86400.nc (13.6G points, lat/lon grid)

    2. Target grid: CAne32x128_Altamont100m_v2.g (np4 GLL grid) ( 300K elements, 3M grid points)

  3. Test 3: FVtoFV map: SGH30, TERR and TERR^2

    1. source grid: ne3000pg1.scrip.nc

    2. Target grid: CAne32x128_Altamont100m_v2pg2.scrip.nc (pg2 grid)

 

Files for the above tests on Perlmutter:

Data: (paths copied from: 800m cubed topo generation from GMTED2010 15s DEM )

  • 250m, 7.5s resolution, 86.4K x 172.8K: /global/cfs/cdirs/e3sm/zhang73/grids2/topo7.5s/GMTED2010_7.5_stitch_S5P_OPER_REF_DEM_15_NCL_24-3.r172800x86400.nc

  • lower resolution versions that might be good for testing:

Grids:

  • /global/cfs/cdirs/e3sm/taylorm/mapping/grids/TEMPEST_ne3000pg1.scrip.nc

  • /global/cfs/cdirs/e3sm/taylorm/mapping/grids/CAne32x128_Altamont100m_v2.g

  • We don't have grid description files for the lat/lon source data. For this data, I think the only thing needed is the coordinates, which are in the data file.
