๐Ÿ“ฆ yoshinorin / grafana-observability-stack

โ˜… 0 stars โ‘‚ 0 forks ๐Ÿ‘ 0 watching
๐Ÿ“ฅ Clone https://github.com/yoshinorin/grafana-observability-stack.git
HTTPS git clone https://github.com/yoshinorin/grafana-observability-stack.git
SSH git clone git@github.com:yoshinorin/grafana-observability-stack.git
CLI gh repo clone yoshinorin/grafana-observability-stack
yoshinorin yoshinorin Update README.md 48adf57 5 months ago ๐Ÿ“ History
๐Ÿ“‚ 48adf5767ec60d7042d013e7280c61aa629afee5 View all commits โ†’
๐Ÿ“ config
๐Ÿ“„ .gitignore
๐Ÿ“„ docker-compose.yml
๐Ÿ“„ README.md
๐Ÿ“„ test-traces.py
๐Ÿ“„ README.md

Grafana Observability Stack

Local observability stack equivalent to Grafana Cloud Free tier.

[!WARNING]
Please use docker-otel-lgtm instead.

Architecture

Component Overview

This stack provides a comprehensive observability solution with the following components:

Grafana (port: 3001)

  • Purpose: Unified dashboard and visualization platform
  • What it does:
  • Creates dashboards combining metrics, logs, and traces
  • Provides alerting capabilities
  • Correlates data across different observability signals
  • Why needed: Central place to visualize and analyze all observability data

Prometheus (port: 9091)

  • Purpose: Metrics collection, storage, and alerting
  • What it does:
  • Scrapes metrics from applications and infrastructure
  • Stores time-series data efficiently
  • Provides powerful query language (PromQL)
  • Handles alerting rules
  • Why needed: Essential for monitoring application performance, resource usage, and SLA metrics

Tempo (port: 3200)

  • Purpose: Distributed tracing backend
  • What it does:
  • Collects and stores distributed traces
  • Tracks requests across microservices
  • Identifies bottlenecks and errors in request flows
  • Integrates seamlessly with Grafana for trace visualization
  • Why needed: Critical for understanding request flows, debugging performance issues, and identifying dependencies

OpenTelemetry Collector (ports: 4317/4318)

  • Purpose: Unified telemetry data collection and processing
  • What it does:
  • Receives traces, metrics, and logs from applications
  • Processes, filters, and transforms telemetry data
  • Routes data to appropriate backends (Prometheus, Tempo, Loki)
  • Provides vendor-neutral instrumentation
  • Why needed: Standardizes telemetry collection and reduces vendor lock-in

Loki (port: 3100)

  • Purpose: Log aggregation and storage system
  • What it does:
  • Collects logs from applications and infrastructure
  • Stores logs efficiently with label-based indexing
  • Provides LogQL for querying logs
  • Integrates with Grafana for log visualization
  • Why needed: Centralized logging for debugging, audit trails, and correlation with metrics/traces

Promtail

  • Purpose: Log shipping agent for Loki
  • What it does:
  • Discovers and collects logs from various sources
  • Adds labels and metadata to log entries
  • Ships logs to Loki for storage
  • Why needed: Efficient log collection from Docker containers and system logs

Node Exporter (port: 9100)

  • Purpose: System and hardware metrics collection
  • What it does:
  • Exposes system metrics (CPU, memory, disk, network)
  • Provides hardware information
  • Monitors system health
  • Why needed: Infrastructure monitoring to understand resource utilization and system health

Data Flow

Applications
    โ†“ (OpenTelemetry SDK)
OpenTelemetry Collector
    โ†“
โ”œโ”€โ”€ Prometheus (metrics) โ”€โ”€โ”
โ”œโ”€โ”€ Tempo (traces) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ””โ”€โ”€ Loki (logs) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
    โ†“                      โ†“
Grafana (unified visualization)

Quick Start

Prerequisites

  • Docker
  • Docker Compose

Launch

# Or using docker-compose directly
docker compose up -d

Access URLs

ServiceURLCredentialsPurpose
Grafanahttp://localhost:3001admin/adminDashboards and visualization
Prometheushttp://localhost:9091-Metrics and alerting
Tempohttp://localhost:3200-Distributed tracing
Node Exporterhttp://localhost:9100-Distributed tracing

Usage

Commands

# Lifecycle management
docker compose up -d           # Start stack
docker compose down            # Stop stack
docker compose restart         # Restart stack

# Monitoring
docker compose ps              # Check service status
docker compose logs            # View all logs
docker compose logs grafana
docker compose logs otel
docker compose logs prometheus
docker compose logs tempo
docker compose logs grafana

# Maintenance
docker compose down -v         # Complete cleanup (including volumes)

Testing with Sample Traces

To verify the observability stack is working correctly, you can generate sample traces:

# Useing Docker
docker build -f Dockerfile.trace-generator -t observability-trace-generator .
docker run --rm --network observability-stack_observability observability-trace-generator

# Using python
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc

# Run trace generator
python test-traces.py

What the test generates:

  • User request traces: Simulates API calls with authentication, database queries, and external API calls
  • Order processing traces: Complex workflows with validation, inventory checks, payments, and notifications
  • Error scenarios: Includes occasional failures to test error handling
  • Realistic timing: Variable response times to simulate real-world conditions

Viewing the traces:

  • Open Grafana: http://localhost:3001
  • Go to Explore โ†’ Tempo
  • Search for traces
  • Explore the trace timeline and spans

Configuration Files

  • config/prometheus.yml - Prometheus scraping configuration
  • config/tempo.yaml - Tempo tracing backend configuration
  • config/otel-collector-config.yaml - OpenTelemetry Collector pipeline configuration
  • config/grafana/provisioning/ - Grafana datasources and dashboards
  • config/promtail-config.yml - Promtail log collection configuration

Monitoring Targets

Infrastructure

  • Host Metrics: CPU, memory, disk usage, network traffic
  • Container Metrics: Resource usage per container
  • System Logs: Docker container logs, system events