Creating SSP370 and SSP585 compsets for E3SM

 

Metadata

Origin: May 2022
Documentation author(s): Jim Benedict (LANL)
E3SMv2 SSP* collaborators and contributors: Xingying Huang, @Hailong Wang, @Mingxuan Wu, @Alan Di Vittorio, @Gautam Bisht, @Qi Tang, @Wuyin Lin, @Chris Golaz, @Philip Cameron-Smith, @Michael J Prather

 

Overview

Disclaimer

The information that follows is an outline of the steps that were taken to develop the SSP370 and SSP585 component sets ("compsets") for E3SM version 2 (released September 2021).  These same steps may or may not work for previous or future versions of E3SM.  The instructions are accurate to the best of the author's knowledge, but it must be noted that the author is not an expert in the E3SM land model or the E3SM aerosol and atmospheric chemistry modules. If errors are found in this document, please notify @Jim Benedict or add a comment to this page. A high-level overview of compset creation (from 2016, some information may be obsolete!) can be found in How to create a new compset; the information outlined below is intended to supplement this workflow overview by providing greater detail and is specific to the SSP370 and SSP585 compsets for E3SMv2.

Creating a local workspace

The Energy Exascale Earth System Model (E3SM) is freely available via a GitHub repository.  For production simulations, it is recommended to clone the latest "maintenance" version of E3SM to a local directory, following instructions here. One example of cloning the latest E3SM 2.0 maintenance version (where YYYYMMDD represents the current date):

git clone -b maint-2.0 git@github.com:E3SM-Project/E3SM.git E3SMv2_maint2.0_YYYYMMDD

Leveraging existing SSP585 compset to create SSP370 compset

Let's define $E3SMROOT = /path/to/E3SMv2_maint2.0_YYYYMMDD as the cloned local model workspace.  (Note to user:  the full list of commonly used directory paths such as $E3SMROOT can be found in the Definitions section near the end of this manual.)  Important information about which files must be modified to create the new SSP370 compset can be obtained by recursively searching for instances of 'ssp5' in $E3SMROOT:

cd $E3SMROOT
grep -r -i ssp5 ./

This step provides a long list of files in which 'ssp5' appears.  Note:  It is strongly recommended to grep for the simple phrase 'ssp5' instead of anything more detailed, such as 'ssp585' or 'ssp5_85'.  Some of the returns may be irrelevant for the user's needs;  for example, there are many "BGC" versions of the SSP585 compset that could be ignored when creating the simple SSP370 compset.  The relevant entries returned from the grep search can guide the identification of files that must be modified for SSP370.  Below, in Creation of E3SM SSP370 input files, we focus on the key input files required for SSP370.  Later, in E3SM SSP370:  Configuration settings, we examine required modifications to the source code and namelist settings.
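For readers who prefer to post-process the search results in a script, the grep step above can be approximated in Python. This is a minimal sketch, not part of the original workflow: the function name grep_ssp5 is hypothetical, and dropping every line that contains the substring "BGC" is only a crude stand-in for the manual filtering described above.

```python
import os
import re


def grep_ssp5(root, pattern="ssp5", exclude=("BGC",)):
    """Return (relative_path, line_number, line) for case-insensitive matches.

    Lines containing any string in `exclude` are dropped. The default "BGC"
    filter is an assumption made for illustration; adjust it as needed.
    """
    regex = re.compile(re.escape(pattern), re.IGNORECASE)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git metadata and keep the traversal order deterministic.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="ignore") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if regex.search(line) and not any(x in line for x in exclude):
                            hits.append(
                                (os.path.relpath(path, root), lineno, line.rstrip("\n"))
                            )
            except OSError:
                # Unreadable files (broken symlinks, permissions) are skipped.
                continue
    return hits
```

Calling grep_ssp5 with the path stored in $E3SMROOT returns a list that can be grouped by file name to decide which files need an SSP370 counterpart.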

Creation of E3SM SSP370 input files

ELM land use files (SSP370)

ELM setup for SSP simulations primarily centers on the creation of two input files, called "fsurdat" and "fdyndat" in the ELM namelist, which represent the land cover properties (fsurdat = surfdata*.nc) and temporal land use changes (fdyndat = landuse.timeseries*.nc).  @Alan Di Vittorio has created a descriptive and highly informative summary of steps needed to complete the ELM setup.  Below are the key steps, with reference to Alan's page.  For all steps, do NOT have the E3SM unified software environment activated as this may interfere with the necessary environment setup.

  1. Check if the desired ELM input files might already exist in $INPUTDATA/lnd/clm2/surfdata_map.  Search for files of the form sspN_rcpN.N, for example ssp3_rcp7.0.

  2. If the desired ELM input files do not yet exist, certain tools/scripts exist in every E3SM download that can be used to create the two ELM input files.  One could use the copy of E3SM from within $E3SMROOT, but because these ELM tools generate large data files it is recommended instead that the user clone a second copy of the latest version of E3SM into a directory with sufficient disk space (a project directory is preferred, otherwise the scratch disk... avoid the home directory).  Example for E3SM master branch:

    git clone -b master --recursive https://github.com/E3SM-Project/E3SM.git E3SMv2-master-YYYYMMDD

    We will define $ELMTOOLSROOT = /path/to/E3SMv2-master-YYYYMMDD.

  3. Data files from the input4mips repository must be preprocessed and reformatted before they can be used by ELM.  Some of the preprocessing is undertaken by ELM developers to convert the data from "LUH2" format to "LUH1" format.  For SSP370, the following files were posted to $INPUTDATA/lnd/clm2/rawdata/LUT_input_files_current:  LUH2_SSP3_RCP70_LUH1f_c08292020.nc and LUH2_SSP3_RCP70_LUH1f_c08292020_harvest.nc.  The fields in these files are on a 0.5° grid (LON=720, LAT=360) with slightly different time steps due to an offset for the harvest data.  Further processing of these files is needed.  Use the Land Use Translator (LUT) to convert LUH1-formatted data to ELM plant functional types (PFTs) and harvest fractions.  See here for supplemental details.  For SSP370, do the following:

    cd $E3SMROOT/components/elm/tools/clm4_5/land_use_translator
    module load cray-netcdf
    make
    ./land_use_translator LUH2_SSP3_RCP70_LUH1f_c08292020.nc $INPUTDATA/lnd/clm2/rawdata/LUT_input_files_current/

     

  4. A series of output files are written to ./output.  If there are no problems creating the LUT* output files, they may be copied to, e.g., $INPUTDATA/lnd/clm2/rawdata/LUT_LUH2_SSP3_RCP70_LUH1f_MMDDYYYY (note the change in date format).  Note that these LUT* files are on a 0.5° lat-lon grid, so there is no E3SM grid dependence yet.

  5. A "file list" text file must be created that lists full paths to the yearly LUT* files that ELM will use during the simulation.  For example, if the simulation will span years 2015-2100, this "file list" file must include separate lines pointing first to the LUT* 2015 file, then the LUT* 2016 file, and so on to the LUT* 2100 file.  The year is included on the same line as the full file path.  Use $INPUTDATA/lnd/clm2/rawdata/LUT_LUH2_SSP3_RCP70_LUH1f_02262021/LUT_LUH2_SSP3_RCP70_LUH1f_list.txt as a template.  Note:  The year stamp within the "file list" file MUST be placed exactly at character 197.

  6. Use the script mksurfdata.pl to generate the two ELM input files "fsurdat" and "fdyndat".  First, load required modules and configure the environment (below are the instructions for NERSC-Cori; for Compy, see here):

    module load cray-netcdf
    module load cray-hdf5
    export LIB_NETCDF=$NETCDF_DIR/lib
    export INC_NETCDF=$NETCDF_DIR/include
    export USER_FC=ifort
    export USER_CC=icc
    export USER_LDFLAGS="-L$NETCDF_DIR/lib -lnetcdf -lnetcdff -lnetcdf_intel"
    export USER_LDFLAGS=$USER_LDFLAGS" -L$HDF5_DIR/lib -lhdf5 -lhdf5_fortran -lhdf5_cpp -lhdf5_fortran_intel -lhdf5_hl_intel -lhdf5hl_fortran_intel"


    Then, compile the mksurfdata code:

    cd $E3SMROOT/components/elm/tools/clm4_5/mksurfdata_map/src/
    gmake clean
    gmake


    Run mksurfdata.pl in "debug" mode (-d) simply to create a namelist file that can be modified as needed.  For SSP370 and the "ne30np4" horizontal grid on NERSC-Cori, this would be:

    ./mksurfdata.pl -res ne30np4.pg2 -years 2015 -rcp 3-7.0 -d -dinlc /global/cfs/cdirs/e3sm/inputdata -usr_mapdir /global/cfs/cdirs/e3sm/inputdata/lnd/clm2/mappingdata/maps/ne30np4pg2


    A description of mksurfdata.pl command options:
    -res ne30np4.pg2:  This will need to correspond exactly to a supported mapping file.  Let's use ne30np4pg2 as an example.  First, check that relevant mapping files exist in $INPUTDATA/lnd/clm2/mappingdata/maps/ne30np4pg2.  In this directory, the files map_0.5x0.5_*_to_ne30pg2*.nc are the relevant files and they do exist.  In $E3SMROOT/components/elm/bld/namelist_files/namelist_defaults.xml, a search of "map" and "to_hgrid" shows:

    <map frm_hgrid="0.5x0.5"    frm_lmask="AVHRR"  to_hgrid="ne30np4.pg2"   to_lmask="nomask">lnd/clm2/mappingdata/maps/ne30np4pg2/map_0.5x0.5_AVHRR_to_ne30pg2_nomask_aave_da_c201210.nc</map>
    <map frm_hgrid="0.5x0.5"    frm_lmask="MODIS"  to_hgrid="ne30np4.pg2"   to_lmask="nomask">lnd/clm2/mappingdata/maps/ne30np4pg2/map_0.5x0.5_MODIS_to_ne30pg2_nomask_aave_da_c201210.nc</map>
    <map frm_hgrid="0.5x0.5"    frm_lmask="nomask"  to_hgrid="ne30np4.pg2"   to_lmask="nomask">lnd/clm2/mappingdata/maps/ne30np4pg2/map_0.5x0.5_nomask_to_ne30pg2_nomask_aave_da_c201210.nc</map>

    What is entered for the -res option must match the desired to_hgrid setting;  therefore, for E3SMv2, we set -res ne30np4.pg2.

    -years 2015:  Recall that the two key ELM input files created in this exercise are fsurdat = surfdata*.nc and fdyndat = landuse.timeseries*.nc.  For standard configurations of E3SM/ELM, fsurdat should represent land cover properties for the first year of the simulation.  Therefore, for SSP* simulations that begin on 2015-01-01, -years 2015 should be used.


    -rcp 3-7.0:  This sets the RCP/SSP scenario.  For historical data omit the -rcp argument.

    -d:  Indicates "debug" mode, which produces only a pre-populated namelist file as noted above and does not produce any data files.

    -dinlc:  Should point to $INPUTDATA.

    -usr_mapdir:  Should point to the directory in which the relevant mapping files exist.


    This creates a pre-populated namelist file (based on the options supplied to mksurfdata.pl) called "namelist" in the current directory.  It is recommended to rename this file for better description and traceability, such as namelist_ssp370_YYYYMMDD.  Five key settings in the resulting namelist file should be checked and modified as needed:

    (1) mksrf_fvegtyp (input):  This should point to the LUT* land cover file corresponding to the first year of the simulation (in our SSP* example case, 2015, which is considered a "historical" year):  mksrf_fvegtyp  = '/global/cfs/cdirs/e3sm/inputdata/lnd/clm2/rawdata/LUT_LUH2_HIST_LUH1f_07082020/LUT_LUH2_historical_2015_c07082020.nc'
    (2) fsurdat (output):  The name of the created land cover file (a file containing a single year of land cover data in 12 monthly time steps) corresponding to the -years option and representing the first year of the simulation.  For the developed SSP370 compset for E3SMv2, the entry for fsurdat (<fsurdat>lnd/clm2/surfdata_map/surfdata_ne30np4.pg2_simyr1850_2015_c211105.nc </fsurdat>) represents land surface conditions for 1850 instead of the preferred 2015 conditions.  This is a known oversight, but for reasons described here this error has negligible impact on standard SSP370 simulations run with RUN_TYPE = 'hybrid'.
    (3) fsurlog (output):  A verbose logfile for fsurdat:  fsurlog        = 'surfdata_ne30np4.pg2_SSP3_RCP70_simyr2015_cYYMMDD.log'
    (4) mksrf_fdynuse (input):  This should point to the "filelist" file that lists LUT* files to be used in the E3SM simulation.  For an SSP* run, the listed files should span 2015-2100 inclusive:  mksrf_fdynuse  = '/global/cfs/cdirs/e3sm/inputdata/lnd/clm2/rawdata/LUT_LUH2_SSP3_RCP70_LUH1f_02262021/LUT_LUH2_SSP3_RCP70_LUH1f_list_JJB.txt'
    (5) fdyndat (output):  The name of the created land use file that will contain yearly data for each year of the E3SM simulation.  The yearly time span in the fdyndat filename should represent the span of years within mksrf_fdynuse:  fdyndat        = 'landuse.timeseries_ne30np4.pg2_SSP3_RCP70_simyr2015-2100_cYYMMDD.nc'

    Next, run the compiled mksurfdata_map executable directly (rather than rerunning mksurfdata.pl), supplying the edited namelist to produce the output files:

    cd $E3SMROOT/components/elm/tools/clm4_5/mksurfdata_map
    ./mksurfdata_map < namelist_ssp370_YYYYMMDD


    The output files are saved to the current directory ($E3SMROOT/components/elm/tools/clm4_5/mksurfdata_map).  If they will be used in E3SM production runs, and if the user is part of the e3sm UNIX group, the surfdata* and landuse* files may be copied to $INPUTDATA/lnd/clm2/surfdata_map for wider use.  See E3SM SSP370:  Configuration settings for recommended namelist settings.
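The "file list" described in step 5 is easy to get wrong by hand because of the fixed position of the year stamp. The following is a minimal sketch of one way to generate it, under stated assumptions: "character 197" is read as a 1-indexed column, the path is padded with spaces up to that column, and the "{year}" path template is hypothetical. Compare the output against the template file named in step 5 before using it.

```python
def format_filelist_line(path, year, year_column=197):
    """Left-justify `path` and place the 4-digit year at `year_column` (1-indexed).

    Assumption: the gap between the path and the year is filled with spaces.
    """
    pad = year_column - 1  # number of characters that precede the year
    if len(path) > pad - 1:
        # Leave at least one space between the path and the year stamp.
        raise ValueError(f"path is too long ({len(path)} characters): {path}")
    return f"{path:<{pad}}{year:04d}"


def build_filelist(path_template, first_year, last_year, year_column=197):
    """Return one line per year, inclusive, in ascending order.

    `path_template` must contain a "{year}" placeholder (hypothetical naming).
    """
    return [
        format_filelist_line(path_template.format(year=year), year, year_column)
        for year in range(first_year, last_year + 1)
    ]
```

For a 2015-2100 simulation, build_filelist(template, 2015, 2100) yields 86 lines that can be written to the text file referenced by mksrf_fdynuse.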

Further reading:  For additional details on creating the landuse* and surfdata* files on a new grid, see this site.
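Before running mksurfdata_map, the five key namelist settings listed in step 6 can be checked programmatically. The sketch below is an illustration only: it assumes each setting appears on its own line as a single-quoted string assignment, and the function names are hypothetical.

```python
import re

# The five settings named in the text above.
KEY_SETTINGS = ("mksrf_fvegtyp", "fsurdat", "fsurlog", "mksrf_fdynuse", "fdyndat")

# Matches lines such as:  fsurdat        = 'surfdata_...nc'
_ASSIGNMENT = re.compile(r"^\s*(\w+)\s*=\s*'([^']*)'")


def read_key_settings(namelist_text, keys=KEY_SETTINGS):
    """Return {setting: value} for the requested quoted-string settings."""
    found = {}
    for line in namelist_text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match and match.group(1) in keys:
            found[match.group(1)] = match.group(2)
    return found


def missing_settings(found, keys=KEY_SETTINGS):
    """Return the settings that are absent or set to an empty string."""
    return [key for key in keys if not found.get(key)]
```

Reading the renamed namelist file (for example namelist_ssp370_YYYYMMDD) into a string and passing it to read_key_settings gives a quick view of the input and output paths before the long-running step is started.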

EAM aerosol emissions files (SSP370)

A series of EAM input files representing the scenario-dependent time evolution of aerosol emissions are required.  Importantly, while the aerosol emissions files are scenario-dependent, they are independent of the spatial grid (they are interpolated to whatever spatial grid is used for the E3SM simulation).  At least for SSP* simulations in the E3SMv2 framework, the emissions of the various aerosol species are prescribed ("specified").  The aerosol files are grouped into two categories, denoted by "specifiers" in the EAM namelist, and their namelist entries for SSP370 (which uses MAM4) are listed below:

  1. ext_frc_specifier (elevated emissions, aerosol production away from the surface):  $INPUTDATA/atm/cam/chem/trop_mozart_aero/emis/CMIP6_SSP370_ne30/cmip6_ssp370_mam4_[species]_elev_2015-2100_c210216.nc, where [species] is so2, soag, bc_a4, num_a1, num_a2, num_a4, pom_a4, so4_a1, and so4_a2.

  2. srf_emis_specifier (surface emissions):  $INPUTDATA/atm/cam/chem/trop_mozart_aero/emis/CMIP6_SSP370_ne30/cmip6_ssp370_mam4_[species]_surf_2015-2100_c210216.nc, where [species] is so2, bc_a4, num_a1, num_a2, num_a4, pom_a4, so4_a1, and so4_a2.  NOTE:  Another aerosol emissions file for DMS ($INPUTDATA/atm/cam/chem/trop_mozart_aero/emis/DMSflux.1850-2100.1deg_latlon_conserv.POPmonthlyClimFromACES4BGC_c20160727.nc) is included as part of the srf_emis_specifier entry, but it is scenario independent and is found in a separate directory.
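The file naming pattern above lends itself to a quick existence check before a run is configured. The sketch below builds the expected SSP370 file paths from the species lists given in the two entries; the function names are hypothetical, and the scenario-independent DMS file is deliberately left out.

```python
import os

# Species lists copied from the two namelist entries described above.
ELEV_SPECIES = ["so2", "soag", "bc_a4", "num_a1", "num_a2", "num_a4",
                "pom_a4", "so4_a1", "so4_a2"]
SURF_SPECIES = ["so2", "bc_a4", "num_a1", "num_a2", "num_a4",
                "pom_a4", "so4_a1", "so4_a2"]
EMIS_SUBDIR = "atm/cam/chem/trop_mozart_aero/emis/CMIP6_SSP370_ne30"


def expected_emission_files(inputdata):
    """Return {'elev': [...], 'surf': [...]} of expected SSP370 file paths."""
    base = f"{inputdata.rstrip('/')}/{EMIS_SUBDIR}"

    def name(species, level):
        return f"{base}/cmip6_ssp370_mam4_{species}_{level}_2015-2100_c210216.nc"

    return {
        "elev": [name(species, "elev") for species in ELEV_SPECIES],
        "surf": [name(species, "surf") for species in SURF_SPECIES],
    }


def missing_files(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [path for path in paths if not os.path.isfile(path)]
```

Passing the value of $INPUTDATA to expected_emission_files and then each list to missing_files reports any file that still needs to be created or copied.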

For the E3SMv2 SSP370 compset, the aerosol input files were created and provided by @Hailong Wang.  However, Xingying Huang and @Jim Benedict independently reproduced as many of the aerosol input fields as possible (some inputs are simulation-derived and could not easily be reproduced) to learn and validate the process.  Instructions that identify required aerosol files and a template script to process them were provided by Hailong:

E3SM_aerosol_emissions.pdf: Instruction manual provided by @Hailong Wang that identifies input4mips raw data files needed to create E3SM aerosol emissions input files.
Create_emission_f19_MAM4_HIST_ORIG.m: Original MATLAB script provided by @Hailong Wang that processes historical input4mips emissions files only.

A modified version of Create_emission_f19_MAM4_HIST_ORIG.m specific to SSP370 is Create_emission_f19_MAM4_SSP370.m.

The general process is outlined below, with supplemental details:

  1. Download required input4mips aerosol data files.  The list of required files is included in Hailong's instruction manual.  An example download session is:

    1. Go to:  https://esgf-node.llnl.gov/search/input4mips/

    2. Enter in search box:  OC-em-anthro-openburning AND ScenarioMIP AND ssp370

    3. Results show four data sets ("data sets" are collections of files).  Click on "List files" to see which files are contained in each data set.

    4. For this search, I found file OC-em-anthro_input4MIPs_emissions_ScenarioMIP_IAMC-AIM-ssp370-1-1_gn_201501-210012.nc in the fourth data set.

    5. Click "Add to Data Cart"

    6. Once all data sets are added to cart, click on "My Data Cart" (top right)

    7. Click button for "Select All Datasets", then ABOVE this, near the top, click on "WGET script" following "Collective Services for All Selected Datasets".  DO NOT click on "WGET script" for each individual dataset.

    8. Save the wget script locally, scp it to the preferred supercomputing facility.

    9. Run wget script. An example for NERSC-Cori:

      cd /path/to/wget_script_directory
      bash wget-20210517125351.sh -s

       

  2. The downloaded input4mips files were interpolated to CESM's fv19 (nominal 1.9°x2.5° finite volume) grid, although this is not a strict requirement.  Remapping to a coarser grid expedites processing, and the fv19 grid has traditionally been used.  Recall that the aerosol files are interpolated to the model grid automatically during runtime.  Hailong's instructions include steps to do the remapping using NCL, and an alternate approach using ncremap via a bash script is provided here.

  3. A modified version of Hailong's MATLAB script (Create_emission_f19_MAM4_SSP370.m) was created to convert the remapped CMIP6 (input4mips) aerosol emissions data files to the required E3SM input file format for SSP370.  A description of what the MATLAB script does is included in Hailong's instruction manual, but essentially certain assumptions are made regarding aerosol vertical distribution and species categorization.

    1. In the script, modify inputDataRoot and outputDataRoot to point to the locations where the input files exist and where the output files should be written.

    2. Confirm that year, day, and year2 are appropriate for the desired simulation.  Note that year2 should have "buffer" years, offset by 1 year, before and after year.

    3. Hailong's original script, designed to process historical aerosol emissions files, was primarily modified by changing the instances of infile.

    4. Additional notes before running the MATLAB script:

      1. The raw input4mips emissions files represent monthly seasonal cycles for 2015, 2020, 2030, …, 2100, as represented by the year and date arrays.  Among other changes, the MATLAB script adds "buffer" years to the beginning and end of the time series by copying the 2015 seasonal cycle to "2014" and by copying the 2100 seasonal cycle to "2101", as represented by the year2 and date2 arrays.

      2. The script systematically processes the following species:  BC, BC_ELEV, POM, POM_ELEV, SO2, SO2_ELEV, SO4_a1, SO4_a1_ELEV, SO4_a2, SO4_a2_ELEV, NUM_a1 & NUM_a4 (includes num_a1_BC_*, num_a1_POM_*, num_a1_SO4_*;  some "a1" variables are actually written to "a4" file), NUM_a1_ELEV & NUM_a4_ELEV (see previous comment), NUM_a2, NUM_a2_ELEV, BIGALK, BIGENE, ISOPRENE, TERPENE, TOLUENE.

      3. Beginning with "BIGALK" in the species list above, there are files of the form folder4='/Volumes/disk3/CEDS/regrid/VOC04_anthro_185001-185012.nc' for which future-scenario analogs were not available.  Hailong's note on this:
        This section of the MATLAB script is to produce the SOAG emissions following the old AR5 way for CESM, in which the VOC emissions were based on the combination of some NMVOC datasets and an atmospheric chemistry model output if I remember it correctly. (I will confirm it with Yang Yang.)  For the E3SMv1 historical simulations, we decided to do something different for SOAG,  as I described in the emission document and also in the Wang et al. (2020) aerosol overview paper for more details. However, the rescaling of SOAG from OC emissions requires a separate simulation that has an explicit SOA treatment, which was based on the CESM1 model and is being implemented in the E3SMv3 as part of the NGD task. We did have such a CESM1 simulation for the historical SOAG emissions. For the SSP585 and SSP370 emissions that Yang Yang helped produce, we used the same historical CESM1 simulation for the rescaling, which is not the best practice. It won’t be used for future versions of the E3SM model, so I didn’t include the rescaling procedure and simulation output. For your SSP370 simulations, you may choose to use the SOAG we generated. To generate it from the MATLAB script (as for the historical ones), you would need to find ways to obtain the required VOC species that are not provided by the input4MIPS. I do expect that SOAG emissions are treated differently in all CMIP6 models.
        Therefore, the SOAG file Hailong provided (cmip6_ssp370_mam4_soag_elev_2015-2100_c210216.nc) cannot be easily verified independently.  Instead, effort was made to verify that the total mass emissions for the various species match those from past versions of the files.

  4. Validation of aerosol emissions files

    1. Various methods to compare the aerosol files provided by Hailong with those created "in-house" show that differences are generally within machine precision over all global areas except immediately along 0° longitude, where differences were larger but not substantial.  A test using esmf_regrid via NCL (instead of ncremap) shows that the 0° longitude differences reduce to within machine precision, suggesting that these differences are entirely due to the selected remapping scheme.  Using NCL's esmf_regrid produces results that are nearly identical to Hailong's data files.  Using ncremap produces generally very similar values, with the largest differences along longitude 0°E.  For E3SMv2 SSP370 aerosol files, the version created from esmf_regrid (i.e., those files provided by Hailong) was used.

    2. Simple comparison between input4mips files and final aerosol emissions files provided by Hailong:  Surface emissions were compared using this script, which converts the E3SM input file aerosol data back to the original input4mips format and computes global sums of the available species.  For all surface emissions (could not check SOAG as noted earlier), total global summed mass fluxes were within ~0.02% for all sectors, where sectors refers to the different types of emission sources including agriculture (AGR), energy (ENE), industry (IND), international shipping (SHP), residential and commercial (RCO), solvent production and application (SLV), transportation (TRA), and waste (WST) (see Gidden et al. 2019).

    3. A more detailed validation of surface aerosol emissions data between the E3SM aerosol input files and the input4mips files was also undertaken.  There is no straightforward way to evaluate elevated aerosol emissions since the original mass fluxes are all at the surface and the elevated emissions are simply scaled by the surface emissions.  It is recommended to run a test simulation and write out the surface and elevated emissions in mass fluxes to validate.  Three independent 1-year test simulations were conducted for years 2015 (RUN_STARTDATE="2015-01-01"), 2050 (RUN_STARTDATE="2050-01-01"), and 2100 (RUN_STARTDATE="2100-01-01").  For each run, the existing E3SMv2 SSP585 compset was used but the SSP370 aerosol files were substituted for the default SSP585 aerosol input files.  Also, the user should set history_aerosol = .true. and history_verbose = .true. to write out the required history fields for the comparisons.  The model output was compared to the original input4mips files using the following scripts (attached at the end of this subsection):

      1. validate_sfc_emissions_DRIVER.py:  Iteratively calls the NCL scripts and defines some variables

      2. validate_sfc_emissions_inputs.ncl:  Namelist for corresponding NCL script

      3. validate_sfc_emissions.ncl:  Does the primary analysis steps and plotting

      4. validate_elev_emissions_DRIVER.py:  Iteratively calls the NCL scripts and defines some variables

      5. validate_elev_emissions_inputs.ncl:  Namelist for corresponding NCL script

      6. validate_elev_emissions.ncl:  Does the primary analysis steps and plotting

    4. Based on item 3 above, each file below falls into one of two categories:  successfully validated, or could not be readily validated

      1. SURFACE emissions

        1. .../cmip6_ssp370_mam4_bc_a4_surf_2015-2100_c210216.nc:  AGR, ENE, IND, TRA, RCO, SLV, WST, SHP (time, lat, lon): Compare sum across sectors to model output variable:  SFbc_a4

        2. .../cmip6_ssp370_mam4_num_a1_surf_2015-2100_c210216.nc:  num_a1_SO4_AGR, num_a1_SO4_SHP, num_a1_SO4_SLV, num_a1_SO4_WST  (time, lat, lon)

          1. Compare sum across sectors to model output variable:  SFnum_a1

          2. Cannot easily check this and Hailong suggested that if the mass fluxes are close between the input and output files then so will be the number concentrations.  E3SM's SFnum_a1 contains SO4 but also dust, sea salt and marine organic aerosols.  However, it appears that there are no other variables in the output to account for number fluxes of sea salt and marine organics. A rough estimate of the magnitude could probably be obtained from the respective mass concentration SFncl_a1 and SFmom_a1, but then we'd have to estimate sizes and this is not worth the trouble.

        3. .../cmip6_ssp370_mam4_num_a2_surf_2015-2100_c210216.nc:  num_a2_SO4_RCO, num_a2_SO4_TRA (time, lat, lon): Compare sum across sectors to model output variable:  SFnum_a2. Cannot easily check this, see note for SFnum_a1.

        4. .../cmip6_ssp370_mam4_num_a4_surf_2015-2100_c210216.nc:  num_a1_BC_AGR, num_a1_BC_ENE, num_a1_BC_IND, num_a1_BC_RCO, num_a1_BC_SHP, num_a1_BC_SLV, num_a1_BC_TRA, num_a1_BC_WST, num_a1_POM_AGR, num_a1_POM_ENE, num_a1_POM_IND, num_a1_POM_RCO, num_a1_POM_SHP, num_a1_POM_SLV, num_a1_POM_TRA, num_a1_POM_WST

          1. Compare sum across sectors to model output variable:  SFnum_a4

          2. Note 1:  per Hailong's suggestion, sum the species and sectors in the input file and convert units from "(particles/cm2/s) * 6.022e26" to "1/m2/s" by multiplying the input file data by (1./6.022E26)*(100**2) -- that is, mfactor in validate_sfc_emissions_DRIVER.py.

          3. Note 2:  Data from the input files is consistently 0.7% lower than in the model output, so it's possible that there are species contained in SFnum_a4 that are not included in the input file… but the difference is small enough (and temporally consistent) that this raises no alarms.

        5. .../cmip6_ssp370_mam4_pom_a4_surf_2015-2100_c210216.nc:  AGR, ENE, IND, TRA, RCO, SLV, WST, SHP (time, lat, lon): Compare sum across sectors to model output variable:  SFpom_a4

        6. .../cmip6_ssp370_mam4_so2_surf_2015-2100_c210216.nc:  AGR, TRA, RCO, SLV, WST, SHP (time, lat, lon): Compare sum across sectors to model output variable:  SFSO2

        7. .../cmip6_ssp370_mam4_so4_a1_surf_2015-2100_c210216.nc:  AGR, SLV, WST, SHP (time, lat, lon): Compare sum across sectors to model output variable:  SFso4_a1

        8. .../cmip6_ssp370_mam4_so4_a2_surf_2015-2100_c210216.nc:  RCO, TRA (time, lat, lon): Compare sum across sectors to model output variable:  SFso4_a2

        9. NOTE:  Model output fields (e.g., SFbc_a1, SFbc_a3, SFpom_a1, SFpom_a3, and SFso4_a3) that do not have an analog in the input4mips files should be zero in the model output.

      2. ELEVATED emissions

        1. .../cmip6_ssp370_mam4_bc_a4_elev_2015-2100_c210216.nc:  BB (time, altitude=13, lat, lon) in units "molecules/cm3/s"

          1. Compare sum across sectors to model output variable:  bc_a4_CLXF

          2. Per Hailong's suggestion:  Compare vertical integral of BB to bc_a4_CLXF, acknowledging that resulting spatial map differences may arise due to known remapping issues/limitations.  This applies to all "elevated" emissions fluxes.

        2. .../cmip6_ssp370_mam4_num_a1_elev_2015-2100_c210216.nc:  num_a1_SO4_ELEV_BB, num_a1_SO4_ELEV_ENE, num_a1_SO4_ELEV_IND, num_a1_SO4_ELEV_contvolc  (time, altitude=13, lat, lon) in units "(particles/cm3/s) * 6.022e26"

          1. Compare sum across sectors to model output variable:  num_a1_CLXF

          2. Per Hailong's suggestion:  Compare the vertical integral of the sum of num_a1_SO4_ELEV_* from input file to num_a1_CLXF from model output.  Note that there are no natural contributions within the elevated number emissions fluxes, as there were for surface num_a[1,2], so the issue of "extra" sources is avoided here.  No need to look at spatial map differences because of known remapping issues/limitations.

        3. .../cmip6_ssp370_mam4_num_a2_elev_2015-2100_c210216.nc:  num_a2_SO4_ELEV_contvolc (time, altitude=13, lat, lon) in units "(particles/cm3/s) * 6.022e26"

          1. Compare sum across sectors to model output variable:  num_a2_CLXF

          2. Per Hailong's suggestion:  Compare the vertical integral of the sum of num_a2_SO4_ELEV_* from input file to num_a2_CLXF from model output.  Note that there are no natural contributions within the elevated number emissions fluxes, as there were for surface num_a[1,2], so the issue of "extra" sources is avoided here.  No need to look at spatial map differences because of known remapping issues/limitations.

        4. .../cmip6_ssp370_mam4_num_a4_elev_2015-2100_c210216.nc:  num_a1_BC_ELEV_BB, num_a1_POM_ELEV_BB  (time, altitude=13, lat, lon) in units "(particles/cm3/s) * 6.022e26". Compare sum across sectors to model output variable:  num_a4_CLXF

        5. .../cmip6_ssp370_mam4_pom_a4_elev_2015-2100_c210216.nc:  BB (time, altitude=13, lat, lon) in units "molecules/cm3/s". Compare sum across sectors to model output variable:  pom_a4_CLXF

        6. .../cmip6_ssp370_mam4_so2_elev_2015-2100_c210216.nc:  BB, ENE_ELEV, IND_ELEV, contvolc (time, altitude=13, lat, lon) in units "molecules/cm3/s". Compare sum across sectors to model output variable:  SO2_CLXF

        7. .../cmip6_ssp370_mam4_so4_a1_elev_2015-2100_c210216.nc:  BB, ENE_ELEV, IND_ELEV, contvolc  (time, altitude=13, lat, lon) in units "molecules/cm3/s". Compare sum across sectors to model output variable:  so4_a1_CLXF

        8. .../cmip6_ssp370_mam4_so4_a2_elev_2015-2100_c210216.nc:  contvolc  (time, altitude=13, lat, lon) in units "molecules/cm3/s". Compare sum across sectors to model output variable:  so4_a2_CLXF

        9. .../cmip6_ssp370_mam4_soag_elev_2015-2100_c210216.nc:  SOAbb_src, SOAbg_src, SOAff_src  (time, altitude=12, lat, lon) in units "molecules/cm3/s". Compare sum across sectors to model output variable:  SOAG_CLXF
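The arithmetic behind these comparisons can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the contents of the validation scripts listed above: the unit factor is the one quoted for SFnum_a4 (mfactor), while the function names, the per-sector dictionary, and the layer thicknesses passed to column_integral are hypothetical inputs that the reader would extract from the files.

```python
# Factor quoted in the text for converting input-file number fluxes from
# "(particles/cm2/s) * 6.022e26" to "1/m2/s" (mfactor in the driver script).
MFACTOR = (1.0 / 6.022e26) * (100 ** 2)


def input_number_flux_to_model_units(value):
    """Convert one input-file number flux value to 1/m2/s."""
    return value * MFACTOR


def sum_sectors(sector_totals):
    """Sum per-sector global totals, e.g. {'AGR': ..., 'ENE': ...}."""
    return sum(sector_totals.values())


def percent_difference(input_total, model_total):
    """Percent difference of the input-file total relative to model output."""
    if model_total == 0:
        raise ValueError("model_total must be non-zero")
    return 100.0 * (input_total - model_total) / model_total


def column_integral(layer_values, layer_thickness_cm):
    """Vertically integrate a per-volume rate (per cm3) over layers (cm).

    Assumption: the caller supplies one thickness per altitude layer.
    """
    if len(layer_values) != len(layer_thickness_cm):
        raise ValueError("one thickness is required per layer")
    return sum(v * dz for v, dz in zip(layer_values, layer_thickness_cm))
```

With these helpers, the surface check amounts to comparing sum_sectors of the input file with the corresponding SF* model field through percent_difference, and the elevated check compares column_integral of the input profile with the corresponding *_CLXF field.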

 

EAM greenhouse gas/ozone/oxidation files (SSP370)

Several radiative forcing files specific to a future SSP must be created.  For E3SMv2, these files are associated with the following EAM namelist entries:

  1. chlorine_loading_file

  2. linoz_data_file

  3. bndtvghg

  4. tracer_cnst_file

One additional namelist entry, linoz_data_path, defines the path to linoz_data_file.  Below are instructions on how to convert raw input4mips data files to input files for E3SM.  These instructions assume the user has access to IDL and Fortran compilers, and the workflow was invoked on NERSC-Cori.

chlorine_loading_file and linoz_data_file

For both chlorine_loading_file and linoz_data_file, the UCI chemistry box model (we'll denote this BOXMODEL) provided by @Michael J Prather and @Philip Cameron-Smith was used to process the raw input4mips files into a form usable by E3SMv2.  It is recommended that the user download the BOXMODEL directory from the weblink using either http downloads or wget (see, e.g., this note on wget). We will first create the chlorine_loading_file following instructions in $BOXMODEL/Linoz_Input/CMIP6_derived_files/README.txt.

  • NOTE:  The steps outlined below have already been completed as part of the process to create the SSP370 compset.  The user may wish to repeat the steps independently, or leverage what already exists in the $BOXMODEL workspace and customize existing files as needed.

  • Step 1: Create a combined GHG concentration file.  The instructions point to two directories:
    (a) ../../GHG_concentrations/CMIP6_DECK_GHG_annual-means_v1.2.0/input4MIPS_data/full_set_of_historical_GHG_files/
    and
    (b) ../../GHG_concentrations/CMIP6_DECK_GHG_annual-means_v1.2.0/input4MIPS_data/full_set_of_SSP585_GHG_files/

    • For (a):  No action is required for historical data.  In the (a) path, there is a .csh script and many subdirectories, each a path to a single netCDF file containing one GHG species specific to historical conditions.  The .csh script reads all the individual .nc files and combines them into a single file.  The output, which already exists as part of the provided $BOXMODEL workspace, is located here: $BOXMODEL/GHG_concentrations/CMIP6_DECK_GHG_annual-means_v1.2.0/input4MIPS_data/full_set_of_historical_GHG_files/combined_GHG_concentrations_CMIP6.nc.

    • For (b):  In the (b) path, there is a .csh script and many subdirectories, each a path to a single netCDF file containing one GHG species specific to SSP585.  The .csh script reads all the individual .nc files and combines them into a single file:  $BOXMODEL/GHG_concentrations/CMIP6_DECK_GHG_annual-means_v1.2.0/input4MIPS_data/full_set_of_SSP585_GHG_files/combined_GHG_concentrations_SSP585_2015-2500.nc.

    • For SSP370, two steps must be taken:

      • (1) Raw input4mips SSP370 data files must be downloaded – see the list of files in $BOXMODEL/GHG_concentrations/CMIP6_DECK_GHG_annual-means_v1.2.0/input4MIPS_data/full_set_of_SSP370_GHG_files.

      • (2) An SSP370 analog to $BOXMODEL/GHG_concentrations/CMIP6_DECK_GHG_annual-means_v1.2.0/input4MIPS_data/full_set_of_SSP585_GHG_files/combine_GHG_concentrations.csh must be created.  Note that in the SSP370 example version of $BOXMODEL/GHG_concentrations/CMIP6_DECK_GHG_annual-means_v1.2.0/input4MIPS_data/full_set_of_SSP370_GHG_files/combine_GHG_concentrations.csh, the variable file_list differs slightly from the SSP585 version in order to accommodate the different input4mips file organization in SSP370.

  • Step 1b (optional): For convenience, create a symbolic link (symlink) from this directory to the combined GHG file.

    • Create a symlink to the "combined" SSP370 file:

      cd $BOXMODEL/Linoz_Input/CMIP6_derived_files
      ln -s ../../GHG_concentrations/CMIP6_DECK_GHG_annual-means_v1.2.0/input4MIPS_data/full_set_of_SSP370_GHG_files/combined_GHG_concentrations_SSP370_2015-2500.nc combined_GHG_concentrations_SSP370_2015-2500.nc
    • A symlink to the historical "combined" file already exists, so no action is required.

  • Step 2: Generate input concentration file for PRATMO

    • From $BOXMODEL/Linoz_Input/CMIP6_derived_files:

      module load idl
      idl Extract_GHG_for_PRATMO.pro

      Output to: ./LINOZ_E3SM/data/CMIP6_ghg.dat

    • Copy Extract_SSP585_GHG_for_PRATMO.pro to an SSP370 version and modify the copy:

      cp Extract_SSP585_GHG_for_PRATMO.pro Extract_SSP370_GHG_for_PRATMO.pro

      (In Extract_SSP370_GHG_for_PRATMO.pro, change "SSP585" to "SSP370" in several instances.)

    • Run the SSP370 version of the script:

      idl Extract_SSP370_GHG_for_PRATMO.pro

      Output to: ./LINOZ_E3SM/data/CMIP6_ghg_SSP370.dat
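
The copy-and-edit step above can be sketched as a small shell helper. This is only an illustration under stated assumptions: make_scenario_copy is a hypothetical function name (it is not part of BOXMODEL), a POSIX-compatible shell is assumed, and the helper replaces every occurrence of the tag, whereas the instructions only say "several instances". The printed diff should therefore be reviewed by hand before running the new script.

```shell
# Hypothetical helper (not part of BOXMODEL): derive a scenario-specific copy
# of a file by replacing every occurrence of one scenario tag with another.
# Usage: make_scenario_copy <source_file> <old_tag> <new_tag>
make_scenario_copy() {
    src="$1"; old="$2"; new="$3"
    # The output file name is the source name with the tag swapped.
    dst=$(printf '%s' "$src" | sed "s/$old/$new/g")
    if [ "$dst" = "$src" ]; then
        echo "file name does not contain $old: $src" >&2
        return 1
    fi
    # Assumption: a blanket replacement is acceptable; check the diff below.
    sed "s/$old/$new/g" "$src" > "$dst"
    # Show what changed so the substitutions can be reviewed by hand.
    diff "$src" "$dst" || true
}

# Example (run from $BOXMODEL/Linoz_Input/CMIP6_derived_files):
# make_scenario_copy Extract_SSP585_GHG_for_PRATMO.pro SSP585 SSP370
```

The replacement is case-sensitive, so any lowercase occurrences of the tag in the source file would need to be handled separately.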

  • Step 3: Run PRATMO first for the historical period and then for SSP370.  Detailed instructions for running PRATMO (step 3 only) can be found in $BOXMODEL/Linoz_Input/CMIP6_derived_files/LINOZ_E3SM/README.txt.  Note that this is a separate file from the README.txt mentioned above.

    • For part (1) in $BOXMODEL/Linoz_Input/CMIP6_derived_files/LINOZ_E3SM/README.txt, no action is needed, assuming that the listed Fortran executables have been properly set.

    • For part (2), it is recommended to create multiple copies of bctmx.f, one for each configuration.  For example, the user could create bctmx.f.Historical containing settings for the historical GHG configuration, bctmx.f.SSP370 containing settings for the SSP370 configuration, and so on.  Before each PRATMO run, copy the desired configuration file to bctmx.f, which is the only version that PRATMO will use.  Note: The same procedure is recommended for batmo.f (see part (3) below).  First, check bctmx.f.Historical near line 376 (L376) to ensure that the following line is being used:

      open(101,file='data/CMIP6_ghg.dat')

      Then copy the "Historical" version of the file to the "active" bctmx.f version to be compiled:

      cp bctmx.f.Historical bctmx.f
    • For part (3), make copies of batmo.f (see note in part 2 above):  batmo.f.Historical, batmo.f.SSP370, etc.  Check batmo.f.Historical to ensure that:

      • L6:  character*23 FNAME

      • L29:  iyear spans 1845,2015,5

      • L35:  fname='init_fspecies_0000'

      • L168:  CHARACTER*22 FNAME

      • L182:  fname='linoz0000_2010jpl'

      Then copy batmo.f.Historical to batmo.f, the version of the file to be compiled.
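
The per-configuration copies described in parts (2) and (3) can be switched with a small shell helper, sketched below. The function name use_pratmo_config is hypothetical, and the sketch assumes a POSIX-compatible shell and that the bctmx.f.<config> and batmo.f.<config> files already exist in the current directory. It is a convenience only and does not replace checking the listed lines by hand.

```shell
# Hypothetical helper (not part of BOXMODEL): activate one PRATMO
# configuration by copying the per-configuration source files over the
# active ones that will be compiled.
# Usage: use_pratmo_config <config>    e.g. Historical or SSP370
use_pratmo_config() {
    config="$1"
    # Refuse to touch anything unless both configuration files exist.
    for f in bctmx.f batmo.f; do
        if [ ! -f "$f.$config" ]; then
            echo "missing $f.$config" >&2
            return 1
        fi
    done
    for f in bctmx.f batmo.f; do
        cp "$f.$config" "$f"
    done
    # Print the open(101,...) line of the active bctmx.f as a sanity check.
    grep -n "open(101" bctmx.f || true
}

# Example (run from $BOXMODEL/Linoz_Input/CMIP6_derived_files/LINOZ_E3SM):
# use_pratmo_config Historical
```

Checking both files before copying either one avoids ending up with a bctmx.f and a batmo.f that belong to different configurations.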

    • For part (4), compile the executable:

      ifort batmo.f bchem.f bctmx.f bdiel.f bjval.f bpath.f bread.f butil.f -o linoz_file_generator.exe

      You may get a few warnings, but these should not be fatal.  Then run the executable.  NOTE:  The run takes ~3.5 hours to finish, so one option is to run it on a login node inside a tmux session; tmux is available at most supercomputing centers and is otherwise freely available:

      tmux
      ./linoz_file_generator.exe > stdout.txt

      Output is:  $BOXMODEL/Linoz_Input/CMIP6_derived_files/LINOZ_E3SM/... 
      ...init_fspecies_1845 to init_fspecies_2015  (linoz v2 table)
      ...linoz1845_2010jpl to linoz2015_2010jpl  (output abundances of long-lived species in mole/mole as a function of lat, mon, and z)
      See $BOXMODEL/Linoz_Input/CMIP6_derived_files/LINOZ_E3SM/README.txt for a description of outputs.
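
After the long run finishes, it may be worth confirming that the full set of output files was written. The sketch below is a hypothetical helper, check_linoz_outputs, that assumes one init_fspecies_YYYY file and one linozYYYY_2010jpl file per iyear value (1845 to 2015 in steps of 5 for the historical configuration, per the batmo.f settings above) and a POSIX-compatible shell. It only tests that each file exists and is non-empty; it does not validate the contents.

```shell
# Hypothetical helper (not part of BOXMODEL): report which expected PRATMO
# output files are missing or empty in the current directory.
# Usage: check_linoz_outputs <first_year> <last_year> <step>
check_linoz_outputs() {
    first="$1"; last="$2"; step="$3"
    missing=0
    year="$first"
    while [ "$year" -le "$last" ]; do
        # Assumption: one file of each kind is written per year value.
        for f in "init_fspecies_$year" "linoz${year}_2010jpl"; do
            if [ ! -s "$f" ]; then
                echo "missing or empty: $f"
                missing=$((missing + 1))
            fi
        done
        year=$((year + step))
    done
    echo "$missing file(s) missing"
    # Return success only when every expected file was found.
    [ "$missing" -eq 0 ]
}

# Example for the historical run
# (run from $BOXMODEL/Linoz_Input/CMIP6_derived_files/LINOZ_E3SM):
# check_linoz_outputs 1845 2015 5
```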

    • Repeat parts (2) through (4), this time for SSP370:

      • SSP370 part (2): Check bctmx.f.SSP370 near L376 to ensure that the following line is used:

        open(101,file='data/CMIP6_ghg_SSP370.dat')

        Copy bctmx.f.SSP370 to bctmx.f.