Serializing and Composing DAGs

Contents

Serializing and Composing DAGs#

If we just wanted to write functions that could be ran in a DAG, we wouldn’t need canesm-processor. One of the goals of this work is to allow for a DAG to be serializable so they can be easily stored, templated and composed into larger transforms without needing to worry about execution details.

JSON Format#

DAGs can be represented as JSON blocks

compute_mean.json#

{
    "dag": [
        {
            "name": "arr1",
            "function": "np.arange",
            "args": [8],
        },
        {
            "name": "arr1",
            "function": "np.arange",
            "args": [0, 4],
        },
                    {
            "name": "concat",
            "function": "np.concatenate",
            "args": [["arr1", "arr2"]],
        },
        {
            "name": "mean",
            "function": "np.mean",
            "args": ["concat"],
        }
    ],
    "output": "mean"
}

import numpy as np
from canproc import DAGProcess, DAG
dag = DAG(dag=[
                DAGProcess(name="arr1", function=np.arange, args=[8]),
                DAGProcess(name="arr2", function=np.arange, args=[0, 4]),
                DAGProcess(name="concat", function=np.concatenate, args=[["arr1", "arr2"]]),
                DAGProcess(name="mean", function=np.mean, args=["concat"])
            ],
        output="mean"
)