Ethical Data Use (EDU) in 2026: What Data Engineers Actually Need to Get Right

For most of the last decade, “ethical data use” was something that happened in a different building. The lawyers wrote the privacy policy, the data protection officer ran the impact assessment, and the engineers built whatever the ticket said. The ethics lived in a PDF, and the pipeline lived in the warehouse, and the two rarely met. In 2026 that separation has quietly collapsed. The reason is not that engineers suddenly became more principled - it is that the decisions which determine whether data is used ethically are now made at the schema, the table, and the access-control layer, and those are the engineer’s decisions. Consent, deletion, minimisation, provenance, bias: every one of them is now something you either build into the pipeline or fail to. This is a practical look at what that means. ...

June 4, 2026 · 17 min · James M
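The Ethical Data Use post argues that consent, deletion and minimisation now live in the pipeline rather than in a policy PDF. Here is a minimal sketch of what that can look like in code. It assumes a hypothetical `Record` shape carrying each user's consented purposes, plus a hypothetical set of user ids with pending erasure requests. The function drops rows on deletion and consent grounds, then projects each remaining row down to an allow-list of columns. `Record` and `prepare_for_purpose` are illustrative names, not part of any library.

```python
from dataclasses import dataclass


# Hypothetical record shape (illustration only): a user id, the purposes the
# user has consented to, and the raw attributes collected about them.
@dataclass
class Record:
    user_id: str
    consented_purposes: frozenset[str]
    attributes: dict[str, str]


def prepare_for_purpose(records, purpose, allowed_columns, deleted_ids):
    """Return only rows usable for `purpose`, projected to `allowed_columns`."""
    out = []
    for r in records:
        # Deletion: skip anyone with an outstanding erasure request.
        if r.user_id in deleted_ids:
            continue
        # Consent: skip anyone who has not agreed to this purpose.
        if purpose not in r.consented_purposes:
            continue
        # Minimisation: keep only the columns this purpose actually needs.
        kept = {k: v for k, v in r.attributes.items() if k in allowed_columns}
        out.append({"user_id": r.user_id, **kept})
    return out


if __name__ == "__main__":
    records = [
        Record("u1", frozenset({"analytics"}), {"country": "DE", "email": "a@example.com"}),
        Record("u2", frozenset({"marketing"}), {"country": "FR", "email": "b@example.com"}),
        Record("u3", frozenset({"analytics"}), {"country": "US", "email": "c@example.com"}),
    ]
    # u2 lacks analytics consent, u3 has a pending erasure request.
    print(prepare_for_purpose(records, "analytics", {"country"}, {"u3"}))
    # → [{'user_id': 'u1', 'country': 'DE'}]
```

In a production warehouse this would more likely be a join against consent and erasure tables than a Python loop. The point of the sketch is only that each principle becomes an explicit, testable step in the pipeline.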

The Catalog Layer Is the New Battleground - Unity, Polaris, Gravitino, Nessie

TL;DR: With the open table format wars largely settled, the strategic fight in 2026 has moved up to the catalog layer - the system that manages tables, namespaces, governance, and access. Four credible open or open-ish catalogs are now in serious play: Unity Catalog (Databricks), Polaris (Snowflake), Apache Gravitino (Datastrato/community), and Project Nessie (Dremio/community). All four implement the Iceberg REST catalog spec to varying degrees, which means clients can talk to them through a common protocol. The differentiation has moved to governance, multi-tenancy, lineage, federation, and developer experience. Unity is the most production-mature and the most coupled to Databricks. Polaris is the cleanest open implementation of the REST spec. Gravitino is the most ambitious in scope - aiming to catalog non-table assets too. Nessie is the most opinionated about Git-style branching for data. The winning catalog will probably not be a single project. It will be the protocol (Iceberg REST) plus multiple compliant implementations plus federation between them. That is the picture 2026 ends with.

Why the Catalog Layer Matters Now

A table format defines how data is laid out on disk. A catalog defines: ...

May 2, 2026 · 8 min · James M
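The Catalog Layer post leans on the fact that all four catalogs speak the Iceberg REST catalog spec. To make "a common protocol" concrete, here is a small sketch that builds the list-tables path from that spec, `GET /v1/{prefix}/namespaces/{namespace}/tables`. Multi-level namespace parts are joined with the ASCII unit separator (0x1F) and then percent-encoded. The base URL and prefix below are placeholders. A real client would also authenticate, and would usually learn the prefix from the server's `GET /v1/config` response rather than hard-coding it.

```python
from urllib.parse import quote


def namespace_path(parts):
    # Iceberg REST: multi-level namespace parts are joined with the unit
    # separator byte (0x1F), then percent-encoded, e.g. ["sales", "eu"] -> "sales%1Feu".
    return quote("\x1f".join(parts), safe="")


def list_tables_url(base_url, namespace, prefix=None):
    # Build the path for GET /v1/{prefix}/namespaces/{namespace}/tables.
    # The {prefix} segment is optional and server-specific.
    base = base_url.rstrip("/") + "/v1"
    if prefix:
        base += "/" + quote(prefix, safe="")
    return f"{base}/namespaces/{namespace_path(namespace)}/tables"


if __name__ == "__main__":
    # Placeholder host and prefix, for illustration only.
    print(list_tables_url("https://catalog.example.com", ["sales", "eu"], prefix="prod"))
    # → https://catalog.example.com/v1/prod/namespaces/sales%1Feu/tables
```

Because the path shape is fixed by the spec, the same client code can target Unity, Polaris, Gravitino, or Nessie by swapping the base URL and prefix. Differences show up in authentication, supported endpoints, and the governance features layered on top.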

Unity Catalog in Practice: Lessons From the Field

The views in this post are my own personal reflections on industry patterns, written in my own time. They are not about any specific employer, team, or colleague, past or present, and do not draw on any non-public information. Unity Catalog sounds straightforward: "one governance layer for all your data and AI assets." In theory, it's elegant. In practice, you'll run into gotchas that the docs don't prepare you for. This post collects generic patterns that come up repeatedly in public talks, vendor docs, community write-ups, and open discussions of UC adoption in 2026. For where Unity sits in the broader picture of catalogs, table formats, and engines, see "The modern lakehouse stack". ...

April 3, 2026 · 10 min · James M