Monitoring Features

Beyond basic up/down monitoring, these features cover health scores, tail latency detection, regional performance, and dependency-aware alerting.

Health Scores

Every monitor gets a composite health score from 0 to 100, computed from four weighted components:

Component	Weight	What It Measures
Uptime	40 points	30-day uptime percentage. 100% = 40 pts, 99.9% = 38, 99% = 32, <90% = 0
Response Time	30 points	Speed (average ms) + stability (coefficient of variation). Fast and consistent = max points
Incidents	20 points	30-day incident frequency. 0 incidents = 20, 1 = 16, 3+ = 12, 10+ = 0
SSL	10 points	Certificate health. Valid cert = 10, expired/invalid = 0. Non-SSL monitors get full points

Labels: Excellent (90+), Good (75+), Fair (50+), Poor (25+), Critical (<25)
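
As a rough illustration of how the four components combine, the sketch below sums per-component points and maps the total to a label. Only the weights and label thresholds come from the table above; the function names and the clamping of each component to its maximum are assumptions made for this example, not the product's actual implementation.

```python
# Maximum points per component, taken from the table above.
MAX_POINTS = {"uptime": 40, "response_time": 30, "incidents": 20, "ssl": 10}


def composite_score(points):
    """Sum component points, clamping each to its documented range.

    `points` maps a component name to the points it earned. Missing
    components count as 0 (an assumption for this sketch).
    """
    total = 0.0
    for name, cap in MAX_POINTS.items():
        value = points.get(name, 0.0)
        total += min(max(value, 0.0), cap)
    return total


def health_label(score):
    """Map a 0-100 score to the documented label."""
    if score >= 90:
        return "Excellent"
    if score >= 75:
        return "Good"
    if score >= 50:
        return "Fair"
    if score >= 25:
        return "Poor"
    return "Critical"
```

For example, a monitor with 99.9% uptime (38), full response-time points (30), one incident (16), and a valid certificate (10) would total 94 under this sketch, which falls in the Excellent band.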

The monitor detail page shows a 30-day trend line so you can see if a service is getting more or less reliable over time.

Percentile Alerting (p95/p99)

Average response time can hide problems. A service might average 200ms but have a 2-second p99, meaning 1% of requests take at least 2 seconds, 10x slower than the average.

Set p95 or p99 thresholds on any monitor:

p95 threshold — alert when 5% of requests exceed this response time
p99 threshold — alert when 1% of requests exceed this response time

Percentiles are computed from the last 100 checks using PostgreSQL's PERCENTILE_CONT. A 30-minute cooldown per monitor prevents alert spam.
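
To make the calculation concrete, here is a minimal Python sketch of a linearly interpolated percentile, which is how PERCENTILE_CONT behaves, combined with the 100-check window and 30-minute cooldown described above. The function names and the way the last alert time is passed in are hypothetical; the real check runs in the database, not in application code like this.

```python
import math
from datetime import datetime, timedelta

WINDOW = 100                      # last 100 checks, per the text above
COOLDOWN = timedelta(minutes=30)  # per-monitor cooldown, per the text above


def percentile_cont(values, fraction):
    """Percentile with linear interpolation between adjacent samples."""
    if not values:
        raise ValueError("at least one sample is required")
    ordered = sorted(values)
    rank = fraction * (len(ordered) - 1)  # zero-based fractional position
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def should_alert(response_times_ms, threshold_ms, fraction, last_alert_at, now):
    """Return True if the percentile breaches the threshold outside the cooldown."""
    if last_alert_at is not None and now - last_alert_at < COOLDOWN:
        return False
    recent = response_times_ms[-WINDOW:]
    return percentile_cont(recent, fraction) > threshold_ms
```

With 98 checks at 200ms and 2 checks at 2000ms, the p99 in this sketch is 2000ms, so a 1000ms p99 threshold would trigger an alert even though the average is only 236ms.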

Configure these in Monitor Detail → Settings → Response Time Alert Thresholds.

Per-Region Latency

When multi-region monitoring is active, the monitor detail page shows a Regional Performance section with per-region stats:

Average response time per region
p95 response time per region
Min/Max response times
Check count per region (24h)

Regions with response times more than 2x the overall average are flagged with a "Slow" badge, making geographic degradation immediately visible.
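
The "Slow" rule can be sketched as follows. This example assumes "overall average" means the mean over all checks from all regions, which the text does not spell out; the function name and the region names in the usage note are illustrative only.

```python
from statistics import mean

SLOW_FACTOR = 2.0  # "more than 2x the overall average", per the text above


def slow_regions(samples_by_region):
    """Return the set of regions whose mean response time exceeds 2x the overall mean.

    `samples_by_region` maps a region name to a list of response times in ms.
    """
    all_samples = [ms for samples in samples_by_region.values() for ms in samples]
    if not all_samples:
        return set()
    overall = mean(all_samples)
    return {
        region
        for region, samples in samples_by_region.items()
        if samples and mean(samples) > SLOW_FACTOR * overall
    }
```

Given two regions averaging 100ms and a third averaging 900ms (two checks each), the overall mean is about 367ms, so only the third region would be flagged.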

Monitor Dependencies

Set up parent-child relationships between monitors to prevent alert cascades. If your database goes down, you don't need separate alerts for every service that depends on it.

When a parent monitor is down, alerts for child monitors are automatically suppressed. The dependency graph at Dashboard → Dependencies visualizes these relationships with color-coded nodes and suppression edges.
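
A minimal sketch of the suppression rule, assuming only direct parents are consulted (the text does not say whether suppression follows the chain transitively). The data shapes and the function name are hypothetical.

```python
def should_suppress(monitor_id, parents, status):
    """Return True if any direct parent of the monitor is currently down.

    `parents` maps a monitor id to the ids it depends on.
    `status` maps a monitor id to "up" or "down".
    """
    return any(
        status.get(parent_id) == "down"
        for parent_id in parents.get(monitor_id, ())
    )
```

With `parents = {"api": ["db"]}` and the database down, an alert for `api` would be suppressed while the alert for `db` itself still fires.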

Configure dependencies in Monitor Detail → Settings → Depends On.

Monitor Tags

Organize monitors with free-form tags like production, api, database, staging, us-east. Tags enable multi-dimensional filtering on the dashboard: click any tag to filter, or combine multiple tags with AND logic.

Add tags when creating a monitor or from the monitor detail page. Tags are lowercased and deduplicated automatically.
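
The normalization and AND filtering can be pictured roughly like this. The helper names are hypothetical, and trimming surrounding whitespace is an extra assumption not stated above.

```python
def normalize_tags(raw_tags):
    """Lowercase tags and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for tag in raw_tags:
        cleaned = tag.strip().lower()  # whitespace trimming is an assumption
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


def filter_by_tags(monitors, selected):
    """Return names of monitors that carry every selected tag (AND logic).

    `monitors` maps a monitor name to its list of normalized tags.
    """
    wanted = set(normalize_tags(selected))
    return [name for name, tags in monitors.items() if wanted <= set(tags)]
```

Selecting both production and api would match only monitors tagged with both, not monitors tagged with just one of them.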

Runbooks

Attach response instructions to any monitor. When an alert fires, the runbook content is included in the notification across Slack, Discord, Telegram, PagerDuty, OpsGenie, and webhook payloads.

Write your runbook in the Monitor Detail → Settings → Runbook textarea. Keep it concise — content is truncated to 2KB in notifications to respect channel limits.
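
To gauge how much of a runbook survives truncation, a sketch like the following can help. It assumes "2KB" means 2048 bytes of UTF-8 and that a character split at the boundary is dropped; neither detail is confirmed above, and the function name is hypothetical.

```python
RUNBOOK_LIMIT_BYTES = 2048  # assumption: 2KB = 2048 bytes of UTF-8


def truncate_runbook(text, limit=RUNBOOK_LIMIT_BYTES):
    """Cut text to at most `limit` bytes without leaving a broken character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # errors="ignore" discards a multi-byte character cut in half at the boundary.
    return encoded[:limit].decode("utf-8", errors="ignore")
```

Note that non-ASCII text uses more than one byte per character, so a runbook written with accented or non-Latin characters reaches the limit sooner than its character count suggests.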

Custom Alert Templates

Customize what your alert messages say per notification channel. Use variables to include dynamic data:

[{{status}}] {{monitor_name}} — {{message}} | Response: {{response_ms}}ms

Available variables: {{monitor_name}}, {{monitor_url}}, {{status}}, {{message}}, {{response_ms}}, {{http_status}}, {{timestamp}}, {{runbook}}
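
Variable substitution of this kind can be sketched in a few lines. The renderer below is an illustration rather than the product's actual implementation; in particular, leaving unknown variables untouched is an assumption.

```python
import re

# Matches {{name}} where name is letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_template(template, values):
    """Replace each {{name}} with values[name]; unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return PLACEHOLDER.sub(substitute, template)
```

For instance, rendering `[{{status}}] {{monitor_name}} | Response: {{response_ms}}ms` with status DOWN, monitor name API, and a response time of 5021 yields `[DOWN] API | Response: 5021ms`.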

Configure templates in Settings → Alert Channels when creating or editing a channel.

SLA Error Budget Tracking

Set an SLA target per monitor (e.g., 99.9%) and track your error budget in real time. The monitor detail page shows a gauge with:

Allowed downtime — total minutes of downtime your SLA permits this period
Used downtime — minutes consumed so far
Remaining budget — minutes you can still afford before breaching
Status — On Track (<70%), At Risk (70-90%), Critical (90-100%), Breached (>100%)

Configure SLA targets in Monitor Detail → Settings. Supports monthly and quarterly periods.
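
The arithmetic behind the gauge is straightforward. The sketch below assumes a monthly period of 30 days (a quarter could be passed as 90), reads the status percentages as the share of budget consumed, and treats exactly 90% as Critical; these are assumptions, as are the function and key names.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget(sla_target_pct, used_downtime_min, period_days=30):
    """Compute allowed, used, and remaining downtime plus a status label."""
    allowed = period_days * MINUTES_PER_DAY * (1 - sla_target_pct / 100)
    remaining = allowed - used_downtime_min
    if allowed > 0:
        used_pct = used_downtime_min / allowed * 100
    else:
        # A 100% target leaves no budget: any downtime is a breach.
        used_pct = 0.0 if used_downtime_min == 0 else float("inf")

    if used_pct < 70:
        status = "On Track"
    elif used_pct < 90:
        status = "At Risk"
    elif used_pct <= 100:
        status = "Critical"
    else:
        status = "Breached"

    return {
        "allowed_min": allowed,
        "used_min": used_downtime_min,
        "remaining_min": remaining,
        "used_pct": used_pct,
        "status": status,
    }
```

A 99.9% target over 30 days allows about 43.2 minutes of downtime (43,200 minutes × 0.001), so 21.6 minutes of downtime consumes half the budget and remains On Track.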

Multi-Environment Views

Filter your dashboard by environment using env: tags. Tag monitors with env:production, env:staging, env:canary, and so on, and an environment toggle appears at the top of the dashboard.

The selection persists across sessions. All monitors, groups, and tags are filtered by the selected environment.
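
Conceptually, the environment toggle is derived from the env: tags already present on your monitors. A small sketch, with hypothetical helper names and data shapes:

```python
ENV_PREFIX = "env:"


def environments(monitors):
    """Collect the distinct environment names found in env: tags, sorted."""
    return sorted({
        tag[len(ENV_PREFIX):]
        for tags in monitors.values()
        for tag in tags
        if tag.startswith(ENV_PREFIX)
    })


def in_environment(monitors, env):
    """Return names of monitors tagged with the given environment."""
    wanted = ENV_PREFIX + env
    return [name for name, tags in monitors.items() if wanted in tags]
```

In this sketch, monitors without any env: tag do not appear under any environment; how the dashboard treats such monitors is not stated above.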

NOC Dashboard / TV Mode

Full-screen status display at /dashboard/tv — designed for wall-mounted screens in operations centers. Shows all monitors in an auto-scaling grid with real-time updates. No sidebar, no navigation, just status.