API Reference - Telemetry¶
Core Components¶
EventMetrics¶
- class EventMetrics¶
Metrics for a single event execution.
Attributes:
event_name – Name of the event
task_id – Unique task identifier
start_time – Event start timestamp
end_time – Event end timestamp
status – Execution status (pending/completed/failed)
error – Error message if failed
retry_count – Number of retry attempts
process_id – Process ID executing the event
Methods:
- duration() → float¶
Calculate execution duration in seconds.
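The relationship between the attributes above and duration() can be pictured with a small stand-in. This is a minimal sketch, not the library's implementation: the class name EventMetricsSketch is hypothetical, timestamps are assumed to be floats in seconds, and returning 0.0 for an unfinished event is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventMetricsSketch:
    """Hypothetical stand-in mirroring the documented attributes."""

    event_name: str
    task_id: str
    start_time: float                 # assumed: seconds, e.g. from time.time()
    end_time: Optional[float] = None  # assumed: unset until the event ends
    status: str = "pending"           # pending / completed / failed
    error: Optional[str] = None
    retry_count: int = 0
    process_id: Optional[int] = None

    def duration(self) -> float:
        # Assumption: an event that has not ended reports 0.0 instead of raising.
        if self.end_time is None:
            return 0.0
        return self.end_time - self.start_time


finished = EventMetricsSketch(
    "resize_image", "task-1", start_time=100.0, end_time=102.5, status="completed"
)
print(finished.duration())  # 2.5
```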
NetworkMetrics¶
- class NetworkMetrics¶
Metrics for a network operation.
Attributes:
task_id – Unique task identifier
host – Remote host address
port – Remote port number
start_time – Operation start timestamp
end_time – Operation end timestamp
bytes_sent – Number of bytes sent
bytes_received – Number of bytes received
error – Error message if failed
operation – Operation type identifier
Methods:
- latency() → float¶
Calculate operation latency in seconds.
Telemetry Collection¶
TelemetryLogger¶
- class TelemetryLogger¶
Thread-safe telemetry logger for event pipeline monitoring.
Methods:
- start_event(event_name: str, task_id: str, process_id: int | None = None)¶
Record the start of an event execution.
- end_event(task_id: str, error: str | None = None)¶
Record the end of an event execution.
- record_retry(task_id: str)¶
Record a retry attempt for an event.
- get_metrics(task_id: str) → EventMetrics | None¶
Get metrics for a specific task.
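The four methods above suggest a lock-guarded dictionary keyed by task_id. The following is a minimal sketch of that idea under stated assumptions: TelemetryLoggerSketch and _EventRecord are hypothetical names, a single threading.Lock is assumed to provide the thread safety, and calls for unknown task IDs are assumed to be ignored. The real class may behave differently.

```python
import threading
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class _EventRecord:
    """Hypothetical per-task record with the documented fields."""

    event_name: str
    task_id: str
    start_time: float
    end_time: Optional[float] = None
    status: str = "pending"
    error: Optional[str] = None
    retry_count: int = 0
    process_id: Optional[int] = None


class TelemetryLoggerSketch:
    """Hypothetical stand-in; every access to the dict holds one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._records: Dict[str, _EventRecord] = {}

    def start_event(self, event_name: str, task_id: str,
                    process_id: Optional[int] = None) -> None:
        with self._lock:
            self._records[task_id] = _EventRecord(
                event_name, task_id, time.time(), process_id=process_id
            )

    def end_event(self, task_id: str, error: Optional[str] = None) -> None:
        with self._lock:
            record = self._records.get(task_id)
            if record is None:
                return  # assumption: unknown task IDs are silently ignored
            record.end_time = time.time()
            record.error = error
            record.status = "failed" if error else "completed"

    def record_retry(self, task_id: str) -> None:
        with self._lock:
            record = self._records.get(task_id)
            if record is not None:
                record.retry_count += 1

    def get_metrics(self, task_id: str) -> Optional[_EventRecord]:
        with self._lock:
            return self._records.get(task_id)


telemetry = TelemetryLoggerSketch()


def worker(i: int) -> None:
    task_id = f"task-{i}"
    telemetry.start_event("demo_event", task_id)
    telemetry.record_retry(task_id)
    telemetry.end_event(task_id, error="boom" if i == 0 else None)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(telemetry.get_metrics("task-0").status)  # failed
print(telemetry.get_metrics("task-1").status)  # completed
```

Each worker touches only its own task_id, so the lock here protects the shared dictionary rather than individual records.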
NetworkTelemetry¶
- class NetworkTelemetry¶
Thread-safe telemetry for network operations.
Methods:
- start_operation(task_id: str, host: str, port: int, operation: str = 'remote_call')¶
Record the start of a network operation.
- end_operation(task_id: str, bytes_sent: int = 0, bytes_received: int = 0, error: str | None = None)¶
Record the end of a network operation.
- get_metrics(task_id: str) → NetworkMetrics | None¶
Get metrics for a specific operation.
- get_failed_operations() → Dict[str, NetworkMetrics]¶
Get metrics for all failed operations.
- get_slow_operations(threshold_seconds: float = 1.0) → Dict[str, NetworkMetrics]¶
Get metrics for operations that took longer than threshold_seconds.
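The two query methods can be read as filters over a mapping from task_id to NetworkMetrics. The sketch below illustrates that reading with hypothetical names (NetworkMetricsSketch, failed_operations, slow_operations). It assumes that "failed" means the error field is set and that "slow" means latency strictly greater than the threshold; neither is confirmed by this reference.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NetworkMetricsSketch:
    """Hypothetical stand-in mirroring the documented NetworkMetrics fields."""

    task_id: str
    host: str
    port: int
    start_time: float
    end_time: Optional[float] = None
    bytes_sent: int = 0
    bytes_received: int = 0
    error: Optional[str] = None
    operation: str = "remote_call"

    def latency(self) -> float:
        # Assumption: an operation that has not ended reports 0.0.
        if self.end_time is None:
            return 0.0
        return self.end_time - self.start_time


def failed_operations(
    ops: Dict[str, NetworkMetricsSketch],
) -> Dict[str, NetworkMetricsSketch]:
    # Assumption: an operation counts as failed when its error field is set.
    return {tid: m for tid, m in ops.items() if m.error is not None}


def slow_operations(
    ops: Dict[str, NetworkMetricsSketch], threshold_seconds: float = 1.0
) -> Dict[str, NetworkMetricsSketch]:
    # Assumption: strictly greater than the threshold counts as slow.
    return {tid: m for tid, m in ops.items() if m.latency() > threshold_seconds}


ops = {
    "a": NetworkMetricsSketch("a", "10.0.0.5", 8080, 0.0, 0.25, bytes_sent=512),
    "b": NetworkMetricsSketch("b", "10.0.0.5", 8080, 0.0, 3.0, error="timeout"),
    "c": NetworkMetricsSketch("c", "10.0.0.6", 9090, 0.0, 1.5, bytes_received=2048),
}

print(sorted(failed_operations(ops)))  # ['b']
print(sorted(slow_operations(ops)))    # ['b', 'c']
```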
Metrics Publishing¶
MetricsPublisher¶
- class MetricsPublisher¶
Base interface for metrics publishing adapters.
Methods:
- publish_event_metrics(metrics: EventMetrics)¶
Publish event metrics to the backend system.
- publish_network_metrics(metrics: dict)¶
Publish network metrics to the backend system.
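A custom adapter would implement both publish methods. The sketch below is an illustration only: MetricsPublisherSketch and InMemoryPublisher are hypothetical names, and modelling the base interface as an abc.ABC is an assumption about how the real class is defined.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class MetricsPublisherSketch(ABC):
    """Hypothetical stand-in for the documented base interface."""

    @abstractmethod
    def publish_event_metrics(self, metrics: Any) -> None:
        """Send one event's metrics to the backend."""

    @abstractmethod
    def publish_network_metrics(self, metrics: Dict[str, Any]) -> None:
        """Send one network operation's metrics to the backend."""


class InMemoryPublisher(MetricsPublisherSketch):
    """Hypothetical adapter that keeps metrics in lists, e.g. for tests."""

    def __init__(self) -> None:
        self.events: List[Any] = []
        self.network: List[Dict[str, Any]] = []

    def publish_event_metrics(self, metrics: Any) -> None:
        self.events.append(metrics)

    def publish_network_metrics(self, metrics: Dict[str, Any]) -> None:
        self.network.append(metrics)


publisher = InMemoryPublisher()
publisher.publish_event_metrics({"task_id": "task-1", "status": "completed"})
publisher.publish_network_metrics({"task_id": "task-1", "host": "10.0.0.5", "port": 8080})
print(len(publisher.events), len(publisher.network))  # 1 1
```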
ElasticsearchPublisher¶
- class ElasticsearchPublisher¶
Publishes metrics to Elasticsearch.
Parameters:
hosts – List of Elasticsearch hosts
index_prefix – Prefix for metrics indices
**kwargs – Additional Elasticsearch client options
PrometheusPublisher¶
- class PrometheusPublisher¶
Exposes metrics for Prometheus scraping.
Parameters:
port – Port to expose metrics on
addr – Interface address to bind to
registry – Custom Prometheus registry
GrafanaCloudPublisher¶
- class GrafanaCloudPublisher¶
Publishes metrics to Grafana Cloud.
Parameters:
api_key – Grafana Cloud API key
org_slug – Organization identifier
endpoint – Custom API endpoint
CompositePublisher¶
- class CompositePublisher¶
Publishes metrics to multiple backends simultaneously.
Parameters:
publishers – List of MetricsPublisher instances
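Fanning one metric out to several backends can be sketched as a loop over the wrapped publishers. The names below (CompositePublisherSketch, RecordingPublisher, BrokenPublisher) are hypothetical. The sketch forwards sequentially and assumes that a failure in one backend should be logged rather than block the others; the real class may forward concurrently or handle errors differently.

```python
import logging
from typing import Any, Iterable, List

log = logging.getLogger("telemetry.sketch")


class CompositePublisherSketch:
    """Hypothetical stand-in that forwards each call to every wrapped publisher."""

    def __init__(self, publishers: Iterable[Any]) -> None:
        self.publishers: List[Any] = list(publishers)

    def _fan_out(self, method_name: str, metrics: Any) -> None:
        for publisher in self.publishers:
            try:
                getattr(publisher, method_name)(metrics)
            except Exception:
                # Assumption: one failing backend must not stop the others.
                log.exception("publisher %r failed in %s", publisher, method_name)

    def publish_event_metrics(self, metrics: Any) -> None:
        self._fan_out("publish_event_metrics", metrics)

    def publish_network_metrics(self, metrics: dict) -> None:
        self._fan_out("publish_network_metrics", metrics)


class RecordingPublisher:
    """Hypothetical backend that remembers what it received."""

    def __init__(self) -> None:
        self.received: List[Any] = []

    def publish_event_metrics(self, metrics: Any) -> None:
        self.received.append(metrics)

    def publish_network_metrics(self, metrics: dict) -> None:
        self.received.append(metrics)


class BrokenPublisher(RecordingPublisher):
    """Hypothetical backend that always fails on event metrics."""

    def publish_event_metrics(self, metrics: Any) -> None:
        raise ConnectionError("backend unreachable")


first, second = RecordingPublisher(), RecordingPublisher()
composite = CompositePublisherSketch([first, BrokenPublisher(), second])
composite.publish_event_metrics({"task_id": "task-1"})
print(len(first.received), len(second.received))  # 1 1
```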
Utility Functions¶
- monitor_events(publishers: List[MetricsPublisher] | None = None)¶
Start monitoring pipeline events and collecting metrics.
- get_metrics() → str¶
Get all collected metrics as a JSON string.
- get_failed_events() → list¶
Get metrics for all failed events.
- get_slow_events(threshold_seconds: float = 1.0) → List[Dict[str, Any]]¶
Get metrics for events that took longer than threshold_seconds.
- get_retry_stats() → Dict[str, Any]¶
Get retry statistics.
- get_failed_network_ops() → Dict[str, Any]¶
Get metrics for failed network operations.
- get_slow_network_ops(threshold_seconds: float = 1.0) → Dict[str, Any]¶
Get metrics for slow network operations.
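Because get_metrics() returns a JSON string, callers would typically decode it before inspecting individual tasks. The sketch below shows one plausible way such a string could be produced and consumed. The record type, the metrics_to_json helper, and the top-level layout keyed by task_id are all assumptions; this reference does not specify the actual JSON schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class _EventRecord:
    """Hypothetical record carrying a subset of the documented event fields."""

    event_name: str
    task_id: str
    start_time: float
    end_time: Optional[float] = None
    status: str = "pending"
    retry_count: int = 0


def metrics_to_json(records: Dict[str, _EventRecord]) -> str:
    # Assumed layout: one JSON object keyed by task_id.
    return json.dumps(
        {task_id: asdict(record) for task_id, record in records.items()},
        indent=2,
        sort_keys=True,
    )


records = {
    "task-1": _EventRecord("resize_image", "task-1", 100.0, 102.5, "completed"),
    "task-2": _EventRecord("upload", "task-2", 101.0, None, "pending", retry_count=2),
}

payload = metrics_to_json(records)
decoded = json.loads(payload)
print(decoded["task-2"]["retry_count"])  # 2
print(decoded["task-1"]["status"])       # completed
```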