Workflow Operations¶

Pausing and Resuming Workflow Execution¶

An execution of a Mistral workflow can be paused by running st2 execution pause <execution-id>. An execution must be in a running state in order for pause to be successful. The execution will initially go into a pausing state, and will go into a paused state when no more tasks are in an active state such as running, pausing, or canceling. When a workflow execution is paused, it can be resumed by running st2 execution resume <execution-id>.

The pause and resume operation will cascade down to subworkflows, whether it’s another workflow defined in a workbook or it’s another StackStorm action that is a Mistral workflow or Action Chain. If the pause operation is performed from a subworkflow or subchain, then the pause will cascade up to the parent workflow or parent chain. However, if the resume operation is performed from a subworkflow or subchain, the resume will not cascade up to the parent workflow or parent chain. This allows users to resume and troubleshoot branches individually.

Canceling Workflow Execution¶

An execution of a Mistral workflow can be cancelled by running st2 execution cancel <execution-id>. Workflow tasks that are still running will not be canceled and will run to completion. No new tasks for the workflow will be scheduled.

Re-running Workflow Execution¶

An execution of a Mistral workflow can be re-run on error. The execution either can be re-run from the beginning or from the task(s) that failed. The latter is useful for long running workflows with temporary service or network outages. Re-running the workflow execution from the beginning is exactly like re-running any StackStorm execution with the command st2 execution re-run <execution-id>.

The re-run is a completely separate execution with a new execution ID in both StackStorm and Mistral. Re-running the workflow from where it errored is slightly different. To retain context, the original workflow execution is reused in Mistral but a new StackStorm execution will be created to stay consistent in StackStorm. The re-run command has a new --tasks option that takes a list of task names to re-run.

For example, given a workflow that fails at task3 and task4 on separate parallel branches, the command st2 execution re-run <execution-id> --tasks task3 task4 will resume the Mistral workflow execution and re-run both task3 and task4 using original inputs. Both the workflow and task execution in Mistral have to be in an errored state for re-run.

If using a Mistral workbook, tasks of subworkflows can also be re-run. For example, if the main workflow has a task1 that calls subflow1, then to re-run subtask1 of subflow1, the syntax for the st2 execution re-run command would be st2 execution re-run <execution-id> --tasks task1.subtask1.

If the task to re-run is a “with-items” task, there is an option to re-run only failed iterations. For example, task1 is a with-items task with 5 items. Let’s say 2 of the items failed. By specifying the st2 execution re-run --tasks task1 task2 --no-reset task1 option, task1 will only re-run the 2 items that failed. If the --no-reset option is not provided, then all 5 items will be re-run.

Note

Re-running workflow execution from the task(s) that failed is currently an experimental feature and subject to bug(s) and change(s). Please also note that re-running a subtask nested in another StackStorm action is not currently supported.

Task Timeout vs Action Timeout¶

Mistral supports a task timeout: parameter. This sets the maximum amount of time Mistral will wait before marking a task as failed. However, StackStorm actions implement their own timeouts. The default value for each action timeout depends upon the action runner used. Typically this is 60s for SSH-based actions, and 600s for Python actions. The default can be changed on a per-action basis, and can be over-ridden for each execution.

This can cause confusion when you need to extend the timeout for some tasks. Setting a longer Mistral timeout does not extend the underlying action timeout. For example, if you have a long-running command, this will not achieve the desired result:

version: '2.0'

examples.task-timeout:
    type: direct
    input:
        - command
    tasks:
        task1:
            timeout: 120
            action: core.local
            input:
                cmd: <% $.command %>

Instead, set the timeout on the underlying action. Note the indentation here:

version: '2.0'

examples.action-timeout:
    type: direct
    input:
        - command
    tasks:
        task1:
            action: core.local
            input:
                cmd: <% $.command %>
                timeout: 120