Exposing Workflow base metrics to Prometheus

SonataFlow generates metrics that can be consumed by Prometheus and visualized by dashboard tools, such as OpenShift, Dashbuilder, and Grafana.

This document describes how you can enable and expose the generated metrics to Prometheus.

Enabling metrics in SonataFlow

You can enable the metrics in your workflow application.

Prerequisites
Procedure
  1. To add the metrics to your workflow application, add the org.kie:kie-addons-quarkus-monitoring-prometheus dependency to the pom.xml file of your project:

    Dependency to be added to the pom.xml file to enable metrics
    <dependency>
        <groupId>org.kie</groupId>
        <artifactId>kie-addons-quarkus-monitoring-prometheus</artifactId>
    </dependency>
  2. Rebuild your workflow application.

    The metrics are available at the /q/metrics endpoint.
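
To confirm that the endpoint responds after the rebuild, a small script can fetch it and list the declared metric names. The following is a minimal sketch, not part of SonataFlow: the base URL http://localhost:8080 is an assumption (it matches the target used in the Prometheus example in this document), and the helper names are hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def metrics_url(base_url):
    """Build the metrics URL for an application base URL."""
    return urljoin(base_url.rstrip("/") + "/", "q/metrics")


def metric_names(exposition_text):
    """Return the metric names declared in '# TYPE' lines, sorted."""
    names = set()
    for line in exposition_text.splitlines():
        parts = line.split()
        # A declaration looks like: # TYPE <metric_name> <metric_type>
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            names.add(parts[2])
    return sorted(names)


def fetch_metric_names(base_url="http://localhost:8080"):
    """Fetch /q/metrics from a running application (base_url is an assumption)."""
    with urlopen(metrics_url(base_url), timeout=5) as response:
        return metric_names(response.read().decode("utf-8"))
```

Calling fetch_metric_names() against a running workflow application returns the names found in the response. The workflow metrics described later in this document are expected to appear once workflow instances have been started.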

Metrics consumption in SonataFlow

After you enable metrics in SonataFlow, the generated metrics can be consumed from OpenShift, Kubernetes, and Prometheus and visualized in different dashboard tools.

Consuming metrics from OpenShift

If your workflow application is running on OpenShift, you can use the OpenShift monitoring stack to consume the metrics of the application.

Prerequisites
Procedure
  1. To consume metrics from OpenShift, enable monitoring for user-defined projects.

    For more information, see Enabling monitoring for user-defined projects in the OpenShift documentation.

    When you enable monitoring for user-defined projects, the Prometheus Operator is installed automatically.

  2. Create a service monitor as shown in the following configuration:

    Example configuration in service-monitor.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: prometheus-app-monitor
      name: prometheus-app-monitor
      namespace: my-project
    spec:
      endpoints:
      - interval: 30s
        targetPort: 8080
        path: /q/metrics
        scheme: http
      selector:
        matchLabels:
          app-with-metrics: 'serverless-workflow-app'
  3. Run the following command to apply the service monitor:

    Apply service monitor
    oc apply -f service-monitor.yaml

The previous procedure creates a service monitor named prometheus-app-monitor, which selects services that carry the label app-with-metrics: serverless-workflow-app. Ensure that the service of your workflow application has the same label.
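
The matchLabels selector is satisfied only when every key/value pair it lists is present on the target. The following sketch illustrates that rule in Python; the function name and the two label sets are hypothetical and serve only as an illustration.

```python
def selector_matches(match_labels, service_labels):
    """Return True if every key/value pair in match_labels is present in service_labels."""
    return all(service_labels.get(key) == value for key, value in match_labels.items())


# Selector taken from the service-monitor.yaml example.
monitor_selector = {"app-with-metrics": "serverless-workflow-app"}

# Hypothetical label sets for two services.
labeled_service = {"app": "my-workflow", "app-with-metrics": "serverless-workflow-app"}
unlabeled_service = {"app": "my-workflow"}

labeled_is_selected = selector_matches(monitor_selector, labeled_service)
unlabeled_is_selected = selector_matches(monitor_selector, unlabeled_service)
```

Only the first service is selected. Extra labels on a service do not prevent a match, but a missing or different app-with-metrics value does.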

After that, Prometheus sends a request every 30 seconds to the /q/metrics endpoint of each service that is labeled with app-with-metrics: serverless-workflow-app. For more information about monitoring a Quarkus application on OpenShift using Micrometer and Prometheus, see Quarkus - Micrometer Metrics.

Consuming metrics from Kubernetes is similar to OpenShift. However, you need to install the Prometheus Operator project manually.

For more information about installing the Prometheus Operator, see the Prometheus Operator website.

Consuming metrics from Prometheus

If you run a standalone Prometheus server, you can configure it to consume metrics directly from the workflow application and visualize them in different dashboard tools.

Prerequisites
Procedure
  1. Use the following configuration to enable Prometheus to scrape metrics directly from the workflow application:

    Example Prometheus configuration
    - job_name: 'Serverless Workflow App'
      scrape_interval: 2s
      metrics_path: /q/metrics
      static_configs:
        - targets: ['localhost:8080']
  2. Replace the values of job_name and scrape_interval in the previous configuration with your own values.

  3. Ensure that the targets value under static_configs in the Prometheus configuration matches the location of your workflow application.

    For more information about configuring Prometheus, see Configure Prometheus to monitor the sample targets in the Prometheus Getting Started document.
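
Once Prometheus scrapes the application, the collected values can be read back through the instant-query endpoint of the Prometheus HTTP API (/api/v1/query). The following is a minimal sketch under assumptions: the Prometheus address http://localhost:9090 (the default port) is not stated in this document, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(prometheus_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return prometheus_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def parse_instant_vector(payload):
    """Extract (labels, value) pairs from an instant-query JSON response."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error"))
    # Each result holds a label map and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in payload["data"]["result"]]


def query(prometheus_url, promql):
    """Run an instant query against a Prometheus server (the URL is an assumption)."""
    with urlopen(query_url(prometheus_url, promql), timeout=5) as response:
        return parse_instant_vector(json.load(response))
```

For example, query("http://localhost:9090", "kogito_process_instance_completed_total") returns one pair per label combination that Prometheus has stored for that metric.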

Metrics in SonataFlow

Overview

In SonataFlow, you can check the following metrics:

  • kogito_process_instance_started_total: Number of started workflows.

  • kogito_process_instance_running_total: Number of running workflows.

  • kogito_process_instance_completed_total: Number of completed workflows.

  • kogito_process_instance_error: Number of workflows that report an error.

  • kogito_process_instance_duration_seconds: Duration of a workflow instance in seconds.

  • kogito_node_instance_duration_milliseconds: Duration of relevant nodes in milliseconds.

  • sonataflow_input_parameters_counter_total: Records input parameters, the occurrences of <"param_name","param_value"> per processId.

Internally, workflows are referred to as processes. Therefore, processId and processName are the workflow id and name, respectively.

Each of the metrics mentioned previously contains a label for a specific workflow id. For example, the kogito_process_instance_completed_total metric below contains the labels for the callbackstatetimeouts workflow:

Example kogito_process_instance_completed_total metric
# HELP kogito_process_instance_completed_total Completed Process Instances
# TYPE kogito_process_instance_completed_total counter
kogito_process_instance_completed_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",process_state="Completed",version="1.0.0-SNAPSHOT",} 3.0
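
A sample line such as the one above consists of a metric name, a set of labels in braces, and a value. The following sketch splits a line into those three parts. It is a simplified illustration, not a full parser for the Prometheus text format: it does not unescape label values or handle timestamps, and the function name is hypothetical.

```python
import re

# name{label="value",...} value  (a trailing comma inside the braces is tolerated)
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one exposition line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a metric sample: %r" % line)
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))
```

Applied to the sample above, parse_sample returns the name kogito_process_instance_completed_total, a dictionary in which process_id is callbackstatetimeouts and process_state is Completed, and the value 3.0.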

Internally, SonataFlow uses the Quarkus Micrometer extension, which also exposes built-in metrics. You can disable the Micrometer metrics in SonataFlow. For more information, see Quarkus - Micrometer Metrics.

Metrics Description

kogito_process_instance_started_total

Counts the number of started workflow instances.

# HELP kogito_process_instance_started_total Started Process Instances
# TYPE kogito_process_instance_started_total counter
kogito_process_instance_started_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 7.0

kogito_process_instance_running_total

Records the number of running workflow instances.

This includes workflow instances that are in the Error state, because the Error state is not a terminal state. Workflow instances that have reached a terminal status, that is, Completed or Aborted, are not present in this metric.

# HELP kogito_process_instance_running_total Running Process Instances
# TYPE kogito_process_instance_running_total gauge
kogito_process_instance_running_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 4.0

kogito_process_instance_completed_total

Counts the workflow instances that have reached a terminal status, Aborted or Completed, and are therefore considered completed.

These are the only two terminal statuses. The Error state is not terminal. Additionally, the metric carries a process_state label, set to Completed or Aborted, that records which of the two terminal statuses was reached.

# HELP kogito_process_instance_completed_total Completed Process Instances
# TYPE kogito_process_instance_completed_total counter
kogito_process_instance_completed_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",process_state="Completed",version="1.0.0-SNAPSHOT",} 3.0
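
Taken together, the descriptions of the started, running, and completed metrics suggest a simple sanity check: within a single application lifetime, every started instance is either still running (including the Error state) or has reached a terminal status. This relation is an inference from the descriptions, not a documented guarantee, and it does not hold across application restarts, when counters start again from zero. The following sketch, with a hypothetical function name, applies the check to the sample values in this document.

```python
def unaccounted_instances(started_total, running_total, completed_by_state):
    """Return started - running - terminal; 0 means the three metrics agree."""
    terminal = sum(completed_by_state.values())
    return started_total - running_total - terminal


# Values from the sample outputs in this document.
# ASSUMPTION: no Aborted sample is shown, so 0 aborted instances are assumed.
completed = {"Completed": 3.0, "Aborted": 0.0}
difference = unaccounted_instances(7.0, 4.0, completed)
```

With 7 started, 4 running, and 3 completed instances, the difference is 0. A non-zero result would point to a restart or to samples taken at different times.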

kogito_process_instance_error

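Records the number of errors that have occurred per processId and error message. The metric is exposed as kogito_process_instance_error_total, and the error message is carried in the error_message label.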

# HELP kogito_process_instance_error_total Number of errors that has occurred
# TYPE kogito_process_instance_error_total counter
kogito_process_instance_error_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",error_message="java.net.ConnectException - Connection refused",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 1.0

kogito_process_instance_duration_seconds

Calculates the duration of a workflow instance that has reached a terminal state, that is, Aborted or Completed. This metric is registered when the workflow reaches the terminal state.

# HELP kogito_process_instance_duration_seconds_max Process Instances Duration
# TYPE kogito_process_instance_duration_seconds_max gauge
kogito_process_instance_duration_seconds_max{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 30.0

# HELP kogito_process_instance_duration_seconds Process Instances Duration
# TYPE kogito_process_instance_duration_seconds summary
kogito_process_instance_duration_seconds_count{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 3.0
kogito_process_instance_duration_seconds_sum{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 90.0
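
The summary exposes a _count and a _sum sample, so the average duration of finished workflow instances is the sum divided by the count. The following sketch, with a hypothetical function name, computes it from the sample values above.

```python
def mean_duration(duration_sum, duration_count):
    """Average duration from a summary's _sum and _count samples."""
    if duration_count == 0:
        return None  # no instance has reached a terminal state yet
    return duration_sum / duration_count


# _sum and _count values from the sample output above.
average_seconds = mean_duration(90.0, 3.0)
```

For the sample values, the average is 30.0 seconds, which here equals the reported _max value. The same calculation applies to the _sum and _count samples of kogito_node_instance_duration_milliseconds, with the result in milliseconds.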

kogito_node_instance_duration_milliseconds

Records the duration of the execution for nodes relevant to the workflows. The metric is calculated when a given node has finished executing.

# HELP kogito_node_instance_duration_milliseconds_max Relevant nodes duration in milliseconds
# TYPE kogito_node_instance_duration_milliseconds_max gauge
kogito_node_instance_duration_milliseconds_max{artifactId="serverless-workflow-project",node_name="CallbackState",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 30014.0


# HELP kogito_node_instance_duration_milliseconds Relevant nodes duration in milliseconds
# TYPE kogito_node_instance_duration_milliseconds summary
kogito_node_instance_duration_milliseconds_count{artifactId="serverless-workflow-project",node_name="CallbackState",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 3.0
kogito_node_instance_duration_milliseconds_sum{artifactId="serverless-workflow-project",node_name="CallbackState",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 90128.0

sonataflow_input_parameters_counter_total

Records the occurrences of <"param_name", "param_value"> per processId.

Parameters that are JSON objects or arrays are flattened.

# HELP sonataflow_input_parameters_counter_total Input parameters
# TYPE sonataflow_input_parameters_counter_total counter
sonataflow_input_parameters_counter_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",param_name="name",param_value="John",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 1.0
sonataflow_input_parameters_counter_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",param_name="surname.sur1",param_value="Lennon",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 1.0
sonataflow_input_parameters_counter_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",param_name="name",param_value="Paul",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 5.0
sonataflow_input_parameters_counter_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",param_name="surname.sur1",param_value="McCartney",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 5.0
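
The surname.sur1 name in the samples above indicates that keys of nested JSON objects are joined with a dot. The following sketch reproduces that naming for nested objects. It is an illustration, not the SonataFlow implementation: the function name is hypothetical, and because the samples show no array, the index-based name used for arrays is an assumption.

```python
def flatten_parameters(value, prefix=""):
    """Flatten nested input into a list of (param_name, param_value) pairs."""
    if isinstance(value, dict):
        pairs = []
        for key, child in value.items():
            # Nested object keys are joined with a dot, as in surname.sur1.
            name = prefix + "." + key if prefix else key
            pairs.extend(flatten_parameters(child, name))
        return pairs
    if isinstance(value, list):
        pairs = []
        for index, child in enumerate(value):
            # ASSUMPTION: this index-based name is hypothetical, not taken from the source.
            pairs.extend(flatten_parameters(child, "%s[%d]" % (prefix, index)))
        return pairs
    return [(prefix, str(value))]


# Hypothetical workflow input matching the first two sample lines.
workflow_input = {"name": "John", "surname": {"sur1": "Lennon"}}
pairs = flatten_parameters(workflow_input)
```

For this input, the result contains the pairs (name, John) and (surname.sur1, Lennon), which correspond to the param_name and param_value labels in the first two sample lines.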
