Overview
Conductor is a simple and elegant tool that helps orchestrate your research computing. Conductor automates your research computing pipeline, all the way from experiments to figures in your paper.
Installing
Conductor requires Python 3.8+ and is currently only supported on macOS and Linux machines. It has been tested on macOS 10.14 and Ubuntu 20.04.
Conductor is available on PyPI and so
it can be installed using pip.
pip install conductor-cli
After installation, the cond executable should be available in your shell.
cond --help
Getting Started
A quick way to get started is to look at Conductor's example projects. Below is a quick overview of a few important Conductor concepts.
Project Root
When using Conductor with your project, you first need to add a
cond_config.toml file to your project's root directory. This file tells
Conductor where your project files are located and is important because all
task identifiers (defined below) are relative to your project root.
Tasks
Conductor works with "tasks", which are jobs (arbitrary shell commands or
scripts) that it should run. You define tasks in COND files using Python
syntax. All tasks are of a predefined "type" (e.g., run_experiment()), which
are listed in the task types reference documentation.
Conductor's tasks are very similar to (and inspired by) Bazel's and Buck's build rules.
Task Identifiers
A task is identified using the path to the COND file where it is defined
(relative to your project's root directory), followed by its name. For example,
a task named run_benchmark defined in a COND file located in
experiments/COND would have the task identifier //experiments:run_benchmark.
To have Conductor run the task, you run cond run //experiments:run_benchmark in your shell.
Dependencies
Tasks can be dependent on other tasks. To specify a dependency, you use the
deps keyword argument when defining a task. When running a task that has
dependencies, Conductor will ensure that all of its dependencies are executed
first before the task is executed. This allows you to build a dependency graph
of tasks, which can be used to automate your entire research computing pipeline.
Task Outputs
Tasks usually (but not always) will need to produce output file(s) (e.g.,
measurements, figures). When Conductor runs a task, it will set the
COND_OUT environment variable to a path where the task should write its
outputs. See the example projects for an example of how this is used. All
task outputs will be stored under the cond-out directory.
Similarly, Conductor will also set the COND_DEPS environment variable to a
colon (:) separated list of paths to the task's dependencies' outputs. If
the task has no dependencies, the COND_DEPS environment variable will be
set to an empty string.
It's important to write task outputs to the path specified by COND_OUT.
This ensures other tasks can find the current task's outputs, and also allows
Conductor to archive your tasks' outputs.