Databricks Monitoring & Alerting Suite
Complete monitoring and observability setup with 8 SQL dashboards, alert definitions, webhook templates, and 50+ system table queries.
Version: 1.0.0
Author: [Datanest Digital](https://datanest.dev)
Price: $59
Category: Databricks
---
Overview
A comprehensive, production-ready monitoring and alerting toolkit for Databricks workspaces. This suite provides 8 SQL dashboards, configurable alert definitions, 50+ pre-built system table queries, webhook integrations, report templates, and operational runbooks -- everything you need to achieve full observability over your Databricks environment.
What's Included
Dashboards (8)
| Dashboard | Description |
|-----------|-------------|
| Pipeline Health | Success rates, failure trends, duration tracking across all pipelines |
| Cluster Utilization | CPU, memory, idle time, and cost attribution per cluster |
| Job Failure Analysis | Error categorization, root cause patterns, failure heatmaps |
| Cost Trends | Daily/weekly/monthly DBU spend broken down by team, workspace, SKU |
| User Activity | Active users, notebook execution patterns, query frequency analysis |
| Data Freshness | Table update timestamps vs SLA targets with breach detection |
| Query Performance | SQL warehouse query latency, throughput, and optimization signals |
| Capacity Planning | Growth projections, resource demand forecasting, headroom analysis |
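As an illustration of the breach detection behind the Data Freshness dashboard, here is a minimal Python sketch. It is not taken from the suite: the function name, the table names, and the 24-hour default SLA are all hypothetical, and it assumes you have already collected each table's last update timestamp (for example from a query result).

```python
from datetime import datetime, timedelta, timezone


def find_sla_breaches(last_updated, sla_hours, now=None):
    """Return {table: hours_overdue} for tables updated later than their SLA allows.

    last_updated: dict of table name -> timezone-aware datetime of the last update
    sla_hours:    dict of table name -> maximum allowed age in hours
    """
    now = now or datetime.now(timezone.utc)
    breaches = {}
    for table, updated_at in last_updated.items():
        # Assumption: tables without an explicit SLA get a 24-hour target.
        allowed = timedelta(hours=sla_hours.get(table, 24))
        age = now - updated_at
        if age > allowed:
            overdue_hours = (age - allowed).total_seconds() / 3600
            breaches[table] = round(overdue_hours, 2)
    return breaches


if __name__ == "__main__":
    # Hypothetical table names and timestamps, for illustration only.
    check_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    updates = {
        "sales.orders": datetime(2024, 1, 2, 10, 0, tzinfo=timezone.utc),
        "sales.returns": datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    }
    targets = {"sales.orders": 4, "sales.returns": 24}
    print(find_sla_breaches(updates, targets, now=check_time))
```

Passing the current time in explicitly keeps the check deterministic, which makes it easier to test thresholds before wiring them into an alert.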
Alerts
- Alert Definitions -- SQL-based alert rules for job failures, cost spikes, idle clusters, SLA breaches, and more
- Webhook Templates -- Ready-to-use payload templates for Slack, Microsoft Teams, PagerDuty, and email
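To make the alert flow concrete, the sketch below pairs a simple cost spike rule with a Slack message payload. It is an assumption-laden illustration, not the suite's own rule or template: the function names and the 1.5x threshold are hypothetical. The only external fact relied on is that Slack incoming webhooks accept a JSON body with a text field.

```python
import json
from statistics import mean


def is_cost_spike(daily_dbus, threshold=1.5):
    """Return True if the latest day's usage exceeds threshold x the mean of earlier days.

    daily_dbus: list of daily DBU totals, oldest first, latest last.
    """
    if len(daily_dbus) < 2:
        return False  # not enough history to form a baseline
    *history, latest = daily_dbus
    baseline = mean(history)
    return baseline > 0 and latest > threshold * baseline


def slack_payload(alert_name, message):
    """Build a JSON body for a Slack incoming webhook (uses the plain text field)."""
    return json.dumps({"text": f"*{alert_name}*\n{message}"})


if __name__ == "__main__":
    usage = [100, 110, 90, 200]  # hypothetical daily DBU totals
    if is_cost_spike(usage):
        body = slack_payload("Cost spike", f"Latest daily usage: {usage[-1]} DBUs")
        # Sending is left out on purpose; POST this body to your own webhook URL.
        print(body)
```

Comparing against a trailing mean is only one possible rule. A median or a percentile baseline is less sensitive to a single earlier outlier.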
Queries
- System Table Library -- 50+ pre-built queries against system.billing, system.access, system.compute, and related tables, covering billing analysis, access auditing, compute profiling, and operational diagnostics
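For a sense of what a system table query can look like, here is a hedged Python sketch that builds a usage-by-SKU query against system.billing.usage. It is not one of the suite's queries. The helper name is hypothetical, and the column names (usage_date, sku_name, usage_unit, usage_quantity) are assumed from the billing usage schema, so check them against your workspace before relying on the output.

```python
def daily_usage_by_sku_query(days=30):
    """Return SQL text that totals usage per day, SKU, and unit over the last `days` days."""
    # Validate the only interpolated value so the query text stays safe.
    if isinstance(days, bool) or not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    return f"""
SELECT usage_date,
       sku_name,
       usage_unit,
       SUM(usage_quantity) AS total_usage
FROM system.billing.usage
WHERE usage_date >= date_sub(current_date(), {days})
GROUP BY usage_date, sku_name, usage_unit
ORDER BY usage_date, sku_name
""".strip()


if __name__ == "__main__":
    # In a Databricks notebook the text could be passed to spark.sql(...);
    # here it is only printed.
    print(daily_usage_by_sku_query(7))
```

Grouping by usage_unit avoids summing quantities that are measured in different units.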
Templates & Runbooks
- Monthly Health Report -- Markdown template with embedded query references for generating executive-level monthly reports
- Common Operations Runbook -- Step-by-step procedures for incident response, cost optimization, capacity management, and access review
Prerequisites
- Databricks workspace with Unity Catalog enabled
- Access to [system tables](https://docs.databricks.com/en/administration-guide/system-tables/index.html) (system.billing, system.access, system.compute)
- SQL warehouse (Serverless or Pro recommended)
- Databricks SQL Alerts & Dashboards feature enabled
Quick Start
1. Import Dashboards -- Open each .sql file in dashboards/ and create a new Databricks SQL dashboard with the queries
2. Configure Alerts -- Run alerts/alert_definitions.sql to create alert rules, then configure destinations using the webhook templates in alerts/webhook_templates.json
3. Explore Queries -- Use queries/system_table_library.sql as a reference library; copy individual queries into notebooks or dashboards as needed
4. Schedule Reports -- Adapt templates/monthly_health_report.md to your organization and schedule the underlying queries
5. Adopt Runbooks -- Customize runbooks/common_operations.md with your team's escalation paths and thresholds