Databricks operates a unified Data Intelligence Platform that processes and analyzes data for over 15,000 organizations globally, including more than 60% of the Fortune 500. The platform combines data engineering, ETL pipelines, ML model training, and generative AI infrastructure, running on AWS, Azure, and GCP. Founded in 2013 by the engineers who created Apache Spark at UC Berkeley, the company built its architecture around lakehouse design, which merges data warehouse and data lake capabilities into a single system that handles both structured analytics and unstructured ML workloads.
The security surface spans distributed data processing at enterprise scale, multi-cloud deployments, and data governance across heterogeneous environments. Unity Catalog provides centralized access control and audit logging for lakehouse assets, while the platform must secure data pipelines that move between cloud storage, compute clusters, and external integrations. Databricks maintains three major open-source projects: Delta Lake for transactional storage, MLflow for ML lifecycle management, and Unity Catalog for governance. Each introduces distinct attack vectors and compliance requirements that security teams need to monitor across production deployments.
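The governance pattern described above, a single policy store consulted on every access with each decision audit-logged, can be sketched in a few lines of Python. This is an illustrative model only: the `CatalogACL` class and its `grant`/`check` methods are assumptions for the sketch, not Unity Catalog's actual API.

```python
import datetime
from dataclasses import dataclass, field

# Hypothetical sketch of centralized access control with audit logging,
# the pattern Unity Catalog implements for lakehouse assets. Names here
# are illustrative, not the Unity Catalog API.

@dataclass
class CatalogACL:
    # (principal, securable) -> set of privileges, e.g. {"SELECT"}
    grants: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def grant(self, principal: str, securable: str, privilege: str) -> None:
        """Record a privilege grant in the central policy store."""
        self.grants.setdefault((principal, securable), set()).add(privilege)

    def check(self, principal: str, securable: str, privilege: str) -> bool:
        """Decide an access request; every decision is audit-logged."""
        allowed = privilege in self.grants.get((principal, securable), set())
        self.audit_log.append({
            "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "principal": principal,
            "securable": securable,
            "privilege": privilege,
            "allowed": allowed,  # denied attempts are logged too
        })
        return allowed

acl = CatalogACL()
acl.grant("analyst@corp.com", "main.sales.orders", "SELECT")
print(acl.check("analyst@corp.com", "main.sales.orders", "SELECT"))  # True
print(acl.check("analyst@corp.com", "main.sales.orders", "MODIFY"))  # False
print(len(acl.audit_log))  # 2
```

In the platform itself this is expressed declaratively in SQL, e.g. ``GRANT SELECT ON TABLE main.sales.orders TO `analyst@corp.com`;`` against Unity Catalog, with the platform emitting the corresponding audit events.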
The threat model includes data exfiltration risks during ETL operations, privilege escalation in shared compute environments, and supply chain vulnerabilities in open-source dependencies. Teams work with Python, Scala, and Java codebases, securing Spark clusters that process sensitive data alongside ML models that require both training data protection and inference endpoint hardening. Security engineering at this scale means defending distributed systems where data moves constantly between storage layers, compute resources, and API endpoints across multiple cloud providers.