API Reference - Telemetry

Core Components

EventMetrics

class EventMetrics

Metrics for a single event execution.

Attributes:

Parameters:
  • event_name – Name of the event

  • task_id – Unique task identifier

  • start_time – Event start timestamp

  • end_time – Event end timestamp

  • status – Execution status (pending/completed/failed)

  • error – Error message if failed

  • retry_count – Number of retry attempts

  • process_id – Process ID executing the event

Methods:

duration() float

Calculate execution duration in seconds.

NetworkMetrics

class NetworkMetrics

Metrics for a network operation.

Attributes:

Parameters:
  • task_id – Unique task identifier

  • host – Remote host address

  • port – Remote port number

  • start_time – Operation start timestamp

  • end_time – Operation end timestamp

  • bytes_sent – Number of bytes sent

  • bytes_received – Number of bytes received

  • error – Error message if failed

  • operation – Operation type identifier

Methods:

latency() float

Calculate operation latency in seconds.

Telemetry Collection

TelemetryLogger

class TelemetryLogger

Thread-safe telemetry logger for event pipeline monitoring.

Methods:

start_event(event_name: str, task_id: str, process_id: int | None = None)

Record the start of an event execution.

end_event(task_id: str, error: str | None = None)

Record the end of an event execution.

record_retry(task_id: str)

Record a retry attempt for an event.

get_metrics(task_id: str) EventMetrics | None

Get metrics for a specific task.

NetworkTelemetry

class NetworkTelemetry

Thread-safe telemetry for network operations.

Methods:

start_operation(task_id: str, host: str, port: int, operation: str = 'remote_call')

Record the start of a network operation.

end_operation(task_id: str, bytes_sent: int = 0, bytes_received: int = 0, error: str | None = None)

Record the end of a network operation.

get_metrics(task_id: str) NetworkMetrics | None

Get metrics for a specific operation.

get_failed_operations() Dict[str, NetworkMetrics]

Get metrics for all failed operations.

get_slow_operations(threshold_seconds: float = 1.0) Dict[str, NetworkMetrics]

Get metrics for operations that took longer than threshold.

Metrics Publishing

MetricsPublisher

class MetricsPublisher

Base interface for metrics publishing adapters.

Methods:

publish_event_metrics(metrics: EventMetrics)

Publish event metrics to the backend system.

publish_network_metrics(metrics: dict)

Publish network metrics to the backend system.

ElasticsearchPublisher

class ElasticsearchPublisher

Publishes metrics to Elasticsearch.

Parameters:

Parameters:
  • hosts – List of Elasticsearch hosts

  • index_prefix – Prefix for metrics indices

  • **kwargs

    Additional Elasticsearch client options

PrometheusPublisher

class PrometheusPublisher

Exposes metrics for Prometheus scraping.

Parameters:

Parameters:
  • port – Port to expose metrics on

  • addr – Interface address to bind to

  • registry – Custom Prometheus registry

GrafanaCloudPublisher

class GrafanaCloudPublisher

Publishes metrics to Grafana Cloud.

Parameters:

Parameters:
  • api_key – Grafana Cloud API key

  • org_slug – Organization identifier

  • endpoint – Custom API endpoint

CompositePublisher

class CompositePublisher

Publishes metrics to multiple backends simultaneously.

Parameters:

Parameters:

publishers – List of MetricsPublisher instances

Utility Functions

monitor_events(publishers: List[MetricsPublisher] | None = None)

Start monitoring pipeline events and collecting metrics.

get_metrics() str

Get all collected metrics as JSON string.

get_failed_events() list

Get metrics for all failed events.

get_slow_events(threshold_seconds: float = 1.0) List[Dict[str, Any]]

Get metrics for events that took longer than threshold.

get_retry_stats() Dict[str, Any]

Get retry statistics.

get_failed_network_ops() Dict[str, Any]

Get metrics for failed network operations.

get_slow_network_ops(threshold_seconds: float = 1.0) Dict[str, Any]

Get metrics for slow network operations.