Extending canesm-processor#

The simplest way to use your own code in canesm-processor is to simply pass functions when setting up a DAG. For examples, lets say I have a function fancy_analysis in my package awesome_stats that I want to call as part of a pipeline. Then I can simply write

from canproc import DAGProcess
from awesome_stats import fancy_analysis
dag = DAG(
    dag=[
         DAGProcess(name='temperature', function='xr.open_mfdataset', args=['path/to/files/*/nc']),
         DAGProcess(name='output', function=fancy_analysis, args=['temperature']),
         DAGProcess(name='output_test', function=fancy_analysis, args=['temperature'], kwargs={'myoption': True})
        ],
    output=['output', 'output_test']
)

This works, but if we want to write DAG templates or save a DAG for later, fancy_analysis isn’t serializable. To handle this case, we need to tell canesm-processor where your code is and what you want to call it. To do this we use the register_module function:

from canproc import register_module
import awesome_stats
register_module(awesome_stats, prefix='awst')

Now we can write DAGs in a serializable format so your function can be serialized or called externally.

from canproc import DAGProcess
from awesome_stats import fancy_analysis
dag = DAG(
    dag=[
         DAGProcess(name='temperature', function='xr.open_mfdataset', args=['path/to/files/*.nc']),
         DAGProcess(name='output', function='awst.fancy_analysis', args=['temperature']),
         DAGProcess(name='output_test', function='awst.fancy_analysis', args=['temperature'], kwargs={'myoption': True})
        ],
    output=['output', 'output_test']
)