
Get that Databricks Out of My Face!

You need not loathe Databricks outright, but it is perfectly defensible if you do, particularly when your principal objective is classical data warehousing: structured BI reporting, dependable SQL analytics, a governed single source of truth for business metrics, semantic clarity, and predictable costs for read-heavy workloads.

There are solid, well-trodden reasons for that position. Many experienced data warehousing practitioners, including some of the field’s foundational figures, see Databricks as an awkward or even risky primary platform for traditional warehousing. The concern is not ideological; it is architectural. Used as a warehouse, Databricks often reproduces exactly the pathologies long criticised in enterprise data programs: unnecessary complexity, misdirected effort, and the perennial executive question, “Which number should I trust?”

Looking ahead to 2026, these are the principal reasons to avoid relying on Databricks as the backbone of a conventional data warehouse.

A lakehouse is not a warehouse

Databricks presents itself as a “unified analytics platform that can behave like a data warehouse”, and that formulation deserves careful scrutiny. Traditional warehouses (Teradata in its day; Snowflake, BigQuery, and Redshift more recently) are designed around schema-on-write, strong governance, and high-concurrency SQL workloads for business users. Their purpose is deliberately narrow and, by design, dull.

Databricks, by contrast, emerges from the Spark and data-lake tradition. Its strengths lie in flexibility: schema-on-read, heterogeneous workloads, machine learning, streaming, and unstructured data. These are powerful capabilities, but they carry well-understood risks. Governance must be relentless to avoid data swamps. Semantic consistency is rarely “out of the box.” Teams often find themselves constructing a warehouse-like abstraction atop lake-like foundations. The result can resemble a carefully curated museum of data models that few people actually use.
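
To make the distinction concrete, here is a minimal PySpark sketch contrasting the two postures. The paths, column names, and schema contract are hypothetical illustrations, not taken from any real pipeline.

```python
# A minimal sketch of schema-on-read vs schema-on-write.
# All paths and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-postures").getOrCreate()

# Lake posture (schema-on-read): structure is inferred at query time,
# so upstream drift silently changes what analysts see downstream.
events = spark.read.json("/landing/events/")

# Warehouse posture (schema-on-write): structure is enforced on ingest,
# and nonconforming records fail fast instead of polluting reports.
contract = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=False),
])
orders = (spark.read.schema(contract)
          .option("mode", "FAILFAST")
          .json("/landing/orders/"))
```

The lake defers the question of structure; the warehouse answers it once, at the door.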

Gaps in the unglamorous essentials

Despite rapid progress, Databricks SQL still trails mature warehouses on several of the unexciting but mission-critical features enterprises rely upon. Multi-statement ACID transactions remain limited. Stored procedures and certain UDF patterns are less robust. Operational conveniences such as cloning, replication, and transactional orchestration often require workarounds. Under heavy, mixed BI workloads, performance and concurrency can behave less predictably than marketing material suggests.
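
To illustrate the transaction gap, here is a hedged sketch of the usual workaround, assuming the open-source delta-spark Python API, an active SparkSession, and hypothetical table names: multi-statement logic gets collapsed into a single atomic MERGE per table.

```python
# Sketch: Delta guarantees atomicity per table, so upsert logic is collapsed
# into one MERGE rather than a BEGIN ... COMMIT block spanning statements.
# Assumes an active SparkSession (`spark`) and a DataFrame of changed rows
# (`updates_df`); table and column names are hypothetical.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "finance.dim_customer")
(target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Invariants spanning tables (say, dim_customer and fact_orders together)
# still need orchestration outside the engine's transaction scope.
```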

None of this is catastrophic. But it matters when the workload consists of hundreds of analysts running routine queries all day, every day. In such settings, platforms like Snowflake tend to “just work”, which is precisely the point.

Cost predictability is not a footnote

Databricks’ compute-centric pricing model (DBUs attached to clusters or serverless endpoints) fits elastic engineering workloads well. It is less congenial to steady, always-on BI usage. Many teams discover that costs fluctuate sharply around reporting cycles, or that clusters are over-provisioned to avoid performance complaints. When the primary goal is to answer a small number of critical questions reliably, the overhead of a lakehouse can feel indulgent rather than enabling.
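
A back-of-envelope calculation shows the shape of the problem. Every rate and consumption figure below is an illustrative assumption, not a Databricks list price.

```python
# Hypothetical figures only: rates and endpoint sizes vary by cloud and tier.
DBU_RATE = 0.70            # $ per DBU, assumed SQL-warehouse rate
DBUS_PER_HOUR = 24         # assumed consumption of a mid-size endpoint
HOURS_ALWAYS_ON = 24 * 30  # always-on BI endpoint, one month
HOURS_BUSINESS = 10 * 22   # 10 hours/day across 22 working days

always_on = DBU_RATE * DBUS_PER_HOUR * HOURS_ALWAYS_ON   # ~$12,096
business = DBU_RATE * DBUS_PER_HOUR * HOURS_BUSINESS     # ~$3,696

print(f"always-on:      ${always_on:,.0f}/month")
print(f"business hours: ${business:,.0f}/month")
```

The difference between the two lines is, in effect, the premium paid to avoid cold starts and performance complaints.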

Complexity as an organisational tax

Running Databricks well demands skills beyond SQL: understanding Spark internals, working in notebook-based workflows, making architectural choices such as medallion layers, and configuring Unity Catalog with care. For data engineers this is stimulating work; for business analysts, it can widen the distance between question and answer. The risk is a return to the cathedral-building era of enterprise data: impressive structures, limited congregation.
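
The sketch below shows the medallion pattern in miniature, with hypothetical paths and table names; every layer is another artefact to build, monitor, and explain before an analyst sees a number.

```python
# Bronze -> silver -> gold in miniature. Assumes an active SparkSession
# (`spark`) and existing bronze/silver/gold schemas; all names hypothetical.

# Bronze: data as it landed, warts and all.
raw = spark.read.json("/landing/orders/")
raw.write.format("delta").mode("append").saveAsTable("bronze.orders")

# Silver: deduplicated and cleaned.
silver = (spark.table("bronze.orders")
          .dropDuplicates(["order_id"])
          .filter("amount IS NOT NULL"))
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Gold: aggregated, BI-ready.
gold = silver.groupBy("region").sum("amount")
gold.write.format("delta").mode("overwrite").saveAsTable("gold.revenue_by_region")
```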

Strategic alignment matters

If an organisation values stability and predictability in its core warehouse, Databricks is rarely the most conservative choice. Its centre of gravity is clearly shifting toward AI, machine learning, and unified analytics. That is a virtue when those are strategic priorities; it is a distraction when they are not. Bill Inmon and others have long argued that calling a system “warehouse-like” does not confer the negotiated semantics, governance, and trust that true warehousing requires.

When Databricks makes sense

None of this is to deny Databricks’ strengths. It can be an excellent choice where workloads genuinely combine warehousing with machine learning, streaming data, unstructured sources, or a tightly unified engineering-analytics team; open formats such as Delta Lake are also attractive where vendor lock-in is a concern.
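
The lock-in point is easy to demonstrate. The sketch below assumes the independent delta-rs Python bindings (the `deltalake` package) and a hypothetical table path; no Databricks runtime is involved.

```python
# Reading a Delta table with an engine-agnostic, non-Spark implementation.
# The table path is hypothetical.
from deltalake import DeltaTable

dt = DeltaTable("s3://analytics/gold/revenue_by_region")
df = dt.to_pandas()   # a plain pandas DataFrame, usable anywhere
print(df.head())
```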

The difficulty arises when these capabilities are treated as warehousing requirements rather than adjacent ambitions.

A sober conclusion

If your priority is classic data warehousing, trust established quickly, semantic alignment, and low-drama BI, free of the dysfunctions warehousing theory has critiqued for decades, then scepticism toward Databricks is not reactionary. It is rational. Dedicated platforms such as Snowflake continue to perform strongly in 2025–26 benchmarks, excelling in simplicity, concurrency, and cost predictability for pure analytics.

Databricks is a powerful and interesting data and AI platform. But forcing it into the role of a traditional warehouse risks recreating the confusion and misalignment it promises to transcend. Choose platforms according to the work you actually need to do, not the breadth of the vendor’s narrative.

