Stages#
A “stage” is a logical grouping of operations. Aside from the setup stage,
all stages apply a series of operations to a list of variables. Outputs from one stage
may be used as inputs to following stages if desired by using the keyword reuse: stage_name.
A stage has a few options:
Variables#
A list of variables to process in this stage. The variable name should correspond to a CanESM variable.
mystage:
  variables:
    - GT
    - ST
Applying Operations#
Note
Operations are applied after all other parts of the stage have completed. For example, in the following:
monthly:
  variables:
    GT:
      shift: 273.15
First, monthly averaging is performed; then the values are shifted by 273.15. Applying a linear operation
after averaging is generally more efficient and gives the same result, but for non-linear operations the
order changes the result. If an operation should be performed before the monthly calculation, put it in a previous stage.
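The effect of ordering can be checked with a few numbers. The sketch below uses only the standard library and made-up daily values; `shift` and `square` are hypothetical stand-ins for a linear and a non-linear operation, not functions of the tool itself.

```python
from statistics import mean

# Hypothetical daily values (degrees Celsius) within one month.
daily = [10.0, 12.0, 17.0]


def shift(x, offset=273.15):
    """Linear operation: add a constant offset."""
    return x + offset


def square(x):
    """Non-linear operation: square the value."""
    return x * x


# Linear: shifting the monthly mean equals the mean of the shifted values.
shift_after = shift(mean(daily))
shift_before = mean([shift(x) for x in daily])

# Non-linear: squaring the mean differs from the mean of the squares.
square_after = square(mean(daily))
square_before = mean([square(x) for x in daily])
```

Here `shift_after` and `shift_before` agree up to floating-point rounding, while `square_after` and `square_before` do not, which is why a non-linear operation that must see the unaveraged data belongs in an earlier stage.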
DAG Format#
Operations can be applied to a variable using the DAG format.
mystage:
  variables:
    - GT:
        dag:
          dag:
            - name: renamed
              function: xr.self.rename
              args: [GT]
              kwargs:
                GT: ST
            - name: initial_time
              function: xr.self.isel
              args: [renamed]
              kwargs:
                time: 0
            - name: global_mean
              function: xr.self.mean
              args: [initial_time]
          output: global_mean
This format provides the most flexibility for creating complex DAGs, but is often overkill for simple function calls.
For example, since self is specified in the function names, we know what the first argument is.
Similarly, lists are ordered, so we know the order in which the functions are called, and we assume the output of the final operation is wanted.
Therefore, a simplified version is also supported; the following is equivalent to the above:
mystage:
  variables:
    - GT:
        dag:
          - function: xr.self.rename
            kwargs:
              GT: ST
          - function: xr.self.isel
            kwargs:
              time: 0
          - function: xr.self.mean
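One way to picture the relationship between the two forms is as a small expansion step: each entry of the ordered list becomes a named node whose first argument is the previous node's output. The sketch below is an illustration under that assumption, not the tool's actual implementation; the function `expand_simplified_dag` and the generated names `step_0`, `step_1`, ... are hypothetical.

```python
def expand_simplified_dag(variable, steps):
    """Expand an ordered list of steps into explicitly named, chained nodes."""
    nodes = []
    previous = variable  # the first step takes the variable itself as input
    for index, step in enumerate(steps):
        name = f"step_{index}"  # hypothetical auto-generated node name
        node = {"name": name, "function": step["function"], "args": [previous]}
        if "kwargs" in step:
            node["kwargs"] = dict(step["kwargs"])
        nodes.append(node)
        previous = name  # the next step consumes this node's output
    # The output of the final operation is taken as the result.
    return {"dag": nodes, "output": previous}


simplified = [
    {"function": "xr.self.rename", "kwargs": {"GT": "ST"}},
    {"function": "xr.self.isel", "kwargs": {"time": 0}},
    {"function": "xr.self.mean"},
]
full = expand_simplified_dag("GT", simplified)
```

The resulting `full` dictionary has the same shape as the explicit example: three chained nodes and an `output` key naming the last one.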
Shortcuts#
Common operations can be applied using keyword shortcuts. These are expanded internally to their DAG representation, so the two forms are equivalent.
mystage:
  variables:
    - GT:
        # convert to fahrenheit and rename to "ST"
        rename: ST
        scale: 1.8
        shift: 32
xarray Dataset and DataArray operations can also be applied directly with keyword arguments provided as a dictionary:
mystage:
  variables:
    - GT:
        # get the first value of every month and
        rename: ST
        groupby: {group: "time.month"}
        first: {}  # if no keyword arguments are needed provide an empty dictionary
        area_mean: {method: sum}
Spatial Averaging#
We can compute the averages of variables over a specified region or using specified weights using the area_mean keyword. For example, we can use the area_mean keyword for all variables in a given stage.
mystage:
  area_mean: True
  variables:
    - FLND
    - GT
In this example, each variable in the stage is spatially averaged over the global grid (default). To specify averaging over a particular region or to use custom weighting, the region and weights keywords can be used.
mystage:
  variables:
    - FLND
    - GT
    - GT_tropics:
        branch: GT
        area_mean:
          region:
            lat: [-10, 10]
            lon: [-100, 30]
    - GT_ilnd:
        branch: GT
        area_mean:
          weights: FLND
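To make the region and weights options concrete, here is a minimal pure-Python sketch of a weighted spatial mean on a regular latitude/longitude grid. It assumes each cell is weighted by the cosine of its latitude, which is a common choice but is not stated in this document; the function `area_mean` below and the sample grid are hypothetical and only mirror the keywords used in the configuration.

```python
import math


def area_mean(field, lats, lons, region=None, weights=None):
    """Weighted mean of field[i][j] on a regular lat/lon grid.

    Assumption: each cell is weighted by cos(latitude), optionally
    multiplied by a second weight field (e.g. a land fraction).
    """
    region = region or {}
    lat_min, lat_max = region.get("lat", (-90.0, 90.0))
    lon_min, lon_max = region.get("lon", (-180.0, 360.0))
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not lat_min <= lat <= lat_max:
            continue  # row lies outside the requested region
        for j, lon in enumerate(lons):
            if not lon_min <= lon <= lon_max:
                continue  # column lies outside the requested region
            w = math.cos(math.radians(lat))
            if weights is not None:
                w *= weights[i][j]
            total += w * field[i][j]
            weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("no cells with non-zero weight in the region")
    return total / weight_sum


# Made-up 3 x 2 grid for illustration.
lats = [-60.0, 0.0, 60.0]
lons = [-90.0, 0.0]
gt = [[0.0, 0.0], [10.0, 20.0], [0.0, 0.0]]
flnd = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]

gt_global = area_mean(gt, lats, lons)
gt_tropics = area_mean(gt, lats, lons, region={"lat": [-10, 10], "lon": [-100, 30]})
gt_land = area_mean(gt, lats, lons, weights=flnd)
```

The three calls correspond to the global default, the `region` form, and the `weights` form in the configuration above.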
Computed Values#
It is common to combine multiple CanESM variables into an output variable. As a shorthand these can be provided as a formula. For example, if we wanted to take the difference between a few monthly averaged fields we could write:
monthly:
  variables:
    - OLR
    - FSR
    - FSO
    - BALT: "FSO-FSR-OLR"
Formula parsing is based on Python's ast module, so most arithmetic syntax supported by Python can be used.
For example, BALT: "2.4 * (FSO + FSR) - ((OLR - FSR) / (OLR + FSR))" would be a valid (if meaningless) formula.
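As a rough illustration of ast-based formula evaluation, the sketch below parses a formula with Python's ast module, walks the tree, and looks variable names up in a dictionary. It is not the tool's own parser: the function `evaluate`, the set of supported operators, and the scalar field values are assumptions made for the example.

```python
import ast
import operator

# Arithmetic operators this sketch accepts; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate(formula, variables):
    """Evaluate an arithmetic formula, looking names up in `variables`."""

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](visit(node.operand))
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return visit(ast.parse(formula, mode="eval"))


# Made-up scalar values standing in for the monthly fields.
fields = {"FSO": 340.0, "FSR": 100.0, "OLR": 238.0}
balt = evaluate("FSO-FSR-OLR", fields)
```

Because only whitelisted node types are evaluated, anything that is not plain arithmetic (a function call, for instance) raises `ValueError` instead of being executed.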
If additional operations need to be applied to a computed variable, the formula can be given with the compute keyword:
- BALT:
    compute: "FSO-FSR-OLR"
    destination: null
Note
As with other operations, computations are performed after the input variables have been transformed in the stage. So in the example above, BALT is computed using the monthly average values of FSO, FSR and OLR.
Masking Values#
We can create and apply masks using the mask keyword. For example, let's say we want the monthly
average of cloud tops for deep convection (TCD). First, we need to mask the native data to the locations that
have deep convection, CDCB > 0, and then perform a monthly resampling of that masked data. To accomplish
this we use two stages: in the first stage we apply a mask to TCD, and in the second we take the monthly average
of the masked data.
setup:
  stages:
    - transforms
    - monthly
transforms:
  variables:
    - CDCB
    - TCD:
        rename: CI
        mask: CDCB > 0
monthly:
  reuse: transforms
  variables:
    - TCD
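The reason for masking in an earlier stage can be seen with a handful of numbers. The sketch below assumes that masked samples are treated as missing (NaN) and skipped by the average; the helper names and the sample values are hypothetical.

```python
import math
from statistics import mean


def apply_mask(values, condition):
    """Keep values where the condition holds; replace the rest with NaN."""
    return [v if keep else math.nan for v, keep in zip(values, condition)]


def nan_mean(values):
    """Mean over the non-NaN entries, or NaN if nothing is left."""
    valid = [v for v in values if not math.isnan(v)]
    return mean(valid) if valid else math.nan


# Made-up sub-monthly samples at one grid point.
cdcb = [0.0, 0.3, 0.0, 0.5]         # deep convection indicator
tcd = [900.0, 200.0, 850.0, 300.0]  # cloud-top values

masked = apply_mask(tcd, [c > 0 for c in cdcb])
monthly_masked = nan_mean(masked)   # average over convective samples only
monthly_unmasked = mean(tcd)        # average over every sample
```

Masking first restricts the monthly average to the convective samples; averaging TCD and CDCB separately and masking afterwards would mix in the non-convective samples.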
Branching from a Variable#
Sometimes it can be useful to branch a variable (think of this as a git branch) where we are spinning off a copy at a known point. This can be useful if we want to keep both the original and a new version of the variable around for later modifications. As an example, in CMIP we need to save the same variable twice, but with a different name. One way to accomplish that is through branching.
transforms:
  variables:
    - RH:
        rename: relative_humidity
monthly:
  reuse: transforms
  variables:
    - RH
    - RH_clear_sky:
        branch: RH
        rename: relative_humidity_clear_sky
Setting Output Filenames#
Filenames can be changed using the destination keyword.
transforms:
  variables:
    - RH:
        rename: relative_humidity
        destination: rh_no_mask.nc
    - BALT:
        compute: FSO - FSR - OLR
        destination: "top_of_atmosphere_flux.nc"
Saving files can also be turned off by setting destination to null. This can be useful
for intermediate stages such as creating masks.
transforms:
  variables:
    - RH:
        rename: relative_humidity
        destination: rh_no_mask.nc
    - BALT:
        compute: FSO - FSR - OLR
        destination: null
Configuration Constants#
Sometimes it is useful to define a constant that can be reused across files. Constants
are defined in the setup stage, so any value defined there can be reused in other
stages by accessing it via ${group.name}.
As an example, dask performance is highly dependent on chunk size, but optimizing this parameter also depends on the machine in use. To allow chunks to be adjusted in a single place, we can set up a constant:
setup:
  chunks:
    canam:
      gem_3d: {time: 8, lat: -1, lon: -1, level2: -1, level3: -1, level1: -1}
      gem_2d: {time: 96, lat: -1, lon: -1}
monthly:
  reuse: daily
  variables:
    - CLD:
        rename: cls
        chunks: ${chunks.canam.gem_3d}
    - CICT:
        rename: clivi
        chunks: ${chunks.canam.gem_2d}
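The ${...} lookup can be thought of as a dotted-path search into the setup section. The following sketch shows one way such references could be resolved once the configuration has been loaded into Python dictionaries; the functions `lookup` and `resolve` are hypothetical, and only whole-string references are handled.

```python
import re

# Matches a string that consists solely of one ${dotted.path} reference.
_REFERENCE = re.compile(r"^\$\{([A-Za-z0-9_.]+)\}$")


def lookup(constants, dotted_path):
    """Follow a dotted path such as "chunks.canam.gem_3d" into nested dicts."""
    node = constants
    for key in dotted_path.split("."):
        node = node[key]
    return node


def resolve(value, constants):
    """Recursively replace "${a.b.c}" strings with constants["a"]["b"]["c"]."""
    if isinstance(value, dict):
        return {k: resolve(v, constants) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, constants) for v in value]
    if isinstance(value, str):
        match = _REFERENCE.match(value)
        if match:
            return lookup(constants, match.group(1))
    return value


setup = {
    "chunks": {
        "canam": {
            "gem_3d": {"time": 8, "lat": -1, "lon": -1},
            "gem_2d": {"time": 96, "lat": -1, "lon": -1},
        }
    }
}
stage = {
    "variables": [
        {"CLD": {"rename": "cls", "chunks": "${chunks.canam.gem_3d}"}},
        {"CICT": {"rename": "clivi", "chunks": "${chunks.canam.gem_2d}"}},
    ]
}
resolved = resolve(stage, setup)
```

With this approach, changing a chunk size in the setup section changes it for every variable that references it.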