Apache Iceberg in 2026

Apache Iceberg in 2026: The Open Table Format That Won

In 2023, the question was “which open table format will survive - Iceberg, Delta, or Hudi?” In 2026, that debate is over. Apache Iceberg won, and it won for reasons that have almost nothing to do with its raw performance. It won because it is the only format that both Snowflake and Databricks now treat as a first-class citizen, because the vendors picked sides on catalogs rather than table formats, and because enterprise buyers decided that multi-engine portability was worth more than a small performance edge. ...

April 22, 2026 · 11 min · James M
Following the Money in Data

Following the Money: Databricks vs Snowflake vs the Open-Source Alternative

In 2026, the technical gap between Databricks and Snowflake has narrowed to a sliver. Both offer world-class serverless compute, both support Iceberg/Delta as first-class citizens, and both have integrated AI agents that can write SQL better than your average intern. If you want to understand which one is right for your organization, stop looking at the feature list and start following the money.
The Economic Moat: Lock-in as a Service
For a long time, the narrative was simple: Snowflake was the “Easy” button (Data Warehouse) and Databricks was the “Power” button (Data Lake). ...

April 8, 2026 · 4 min · James M
Modern Data Engineering on Databricks

Modern Data Engineering on Databricks (2026 Guide)

The 2026 Databricks Baseline
Databricks in 2026 looks much more opinionated than it did just a few years ago. For most new data engineering work, the default stack is now clear: Unity Catalog for governance; managed tables where possible; serverless compute for notebooks, SQL, pipelines, and jobs; Lakeflow Declarative Pipelines for batch and streaming data products; and liquid clustering instead of old-style partition design for many workloads. That shift matters because the platform has moved beyond “bring your own clusters and tune everything manually.” The modern Databricks approach is increasingly declarative, governed, and automated. ...
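To make the liquid clustering point concrete, here is a minimal Python sketch that renders the two styles of Databricks SQL DDL side by side. The helper `create_table_ddl` and the table name `main.sales.orders` are hypothetical, made up for illustration; the `CLUSTER BY (...)` and `PARTITIONED BY (...)` clauses are the Databricks SQL syntax for liquid clustering and Hive-style partitioning. The helper only builds strings; actually executing the DDL assumes a Databricks workspace.

```python
# Hypothetical helper (not a Databricks API): builds CREATE TABLE statements
# so liquid clustering and old-style partitioning can be compared side by side.


def create_table_ddl(table, columns, cluster_by=None, partition_by=None):
    """Return a Databricks SQL CREATE TABLE statement as a string.

    columns:      list of (column_name, sql_type) pairs
    cluster_by:   clustering keys for liquid clustering (CLUSTER BY)
    partition_by: partition columns for Hive-style partitioning (PARTITIONED BY)
    """
    # Liquid clustering is not compatible with partitioning, so refuse both.
    if cluster_by and partition_by:
        raise ValueError("liquid clustering and partitioning are mutually exclusive")

    col_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    ddl = f"CREATE TABLE {table} ({col_sql})"
    if cluster_by:
        ddl += f" CLUSTER BY ({', '.join(cluster_by)})"
    elif partition_by:
        ddl += f" PARTITIONED BY ({', '.join(partition_by)})"
    return ddl + ";"


if __name__ == "__main__":
    print(create_table_ddl(
        "main.sales.orders",  # hypothetical catalog.schema.table name
        [("order_id", "BIGINT"), ("customer_id", "BIGINT"), ("order_ts", "TIMESTAMP")],
        cluster_by=["customer_id", "order_ts"],
    ))
    # CREATE TABLE main.sales.orders (order_id BIGINT, customer_id BIGINT, order_ts TIMESTAMP) CLUSTER BY (customer_id, order_ts);
```

With liquid clustering you pick keys by query pattern rather than by directory layout, and you can later redefine them with `ALTER TABLE ... CLUSTER BY (...)` without rewriting existing data, which is much harder to undo with a partition scheme.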

April 6, 2026 · 7 min · James M
Data Engineering Blogs

Modern Data Stack & Engineering

Core Blogs & Publications
Start Data Engineering - Practical guides, tutorials, and real-world projects for building scalable data platforms from scratch.
Seattle Data Guy - Balance of business strategy and technical implementation in modern data engineering.
Eclectic Data - Deep technical analysis of data infrastructure, distributed systems, and architectural patterns.
Benn Stancil’s Blog - Strategic insights and industry commentary on analytics, data culture, and organizational challenges.

Platform & Tool Blogs
Airbyte Blog - Data integration, ELT approaches, and best practices for data movement at scale.
Databricks Blog - Comprehensive coverage of Apache Spark, Delta Lake, and Lakehouse architectural patterns.
LakeFS Blog - Data versioning, governance, and data lakes as code principles.
dbt Blog - Analytics engineering workflows, SQL best practices, and modern data transformation.
Apache Airflow Blog - Workflow orchestration patterns, DAG design, and production deployment strategies.
Kafka Blog - Stream processing, real-time data architectures, and event-driven systems.
Redpanda Blog - Kafka ecosystem evolution, streaming data pipelines, and cost optimization.

Podcasts & Multimedia
The Data Engineering Podcast - Interviews and deep dives into data tools, techniques, and industry practitioners.
DataFramed Podcast - Conversations on data careers, best practices, and emerging technologies.

Data Warehousing & Analytics
Snowflake Blog - Cloud data warehouse innovations, performance optimization, and enterprise data strategies.
Google Cloud Data Analytics Blog - BigQuery best practices, modern data stack integration, and Google Cloud data solutions.
Restack Blog - Data infrastructure comparisons, architecture patterns, and cost optimization strategies.

Communities & Learning

Online Communities
DataTalks.Club - Free community-driven courses, job board, and peer-to-peer learning for data professionals.
r/dataengineering - Active community discussions, career advice, and industry insights.
dbt Community - Slack workspace, forums, and networking for analytics engineers and data teams.

Learning Resources
Data Engineering Fundamentals - Comprehensive guide covering data architecture, ETL/ELT, and system design.
Engineer Codehouse - Practical tutorials and guides for modern data stack technologies.

Industry News & Trends
The Data Stack News - Weekly roundup of news, funding announcements, and updates across the data ecosystem.
KDnuggets - News, tutorials, and discussions on data science, machine learning, and data engineering.
Data Engineering Weekly - Curated newsletter featuring tools, articles, and thought leadership in data engineering.
The Pragmatic Engineer - Data - Engineering-led analysis with frequent data platform deep dives.

Open Table Format & Lakehouse
Apache Iceberg Blog - Official updates on the open table format increasingly central to the 2026 lakehouse.
Tabular Blog - Deep technical writing on Iceberg internals and multi-engine lakehouse design.
Dremio Blog - Query engines, Iceberg, and open data architecture.
Onehouse Blog - Hudi and open lakehouse patterns.

Transformation & Analytics Engineering
dbt Developer Blog - Analytics engineering patterns and practical SQL modelling guidance.
Tobiko / SQLMesh Blog - Next-generation transformation framework with virtual environments.
Locally Optimistic - Long-form posts on analytics engineering culture and practice.

April 5, 2026 · 3 min · James M
Lakeflow Declarative Pipelines

Lakeflow Declarative Pipelines: From DLT to Production

If you’ve been writing Delta Live Tables (DLT) pipelines, you’ve been building with Lakeflow without knowing the new name. In 2026, the rebranding matters because it signals how Databricks now wants you to think about declarative pipeline design. This isn’t just a rename. The mental model has shifted from “tables and dependencies” to “data flows and transformations.” Let me show you what changed and why it matters.
What Lakeflow Actually Is
Lakeflow Declarative Pipelines is the modern Databricks way to say: “I describe what data I want, and Databricks manages how to get it.” ...
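The “describe what you want, let the engine work out how” idea can be illustrated with a toy model in plain Python. This is not the Lakeflow API (the DLT-era Python API uses decorators such as `@dlt.table`); the `table` decorator, `run_pipeline`, and the three datasets below are hypothetical. Each dataset declares only its inputs, and a tiny engine built on `graphlib.TopologicalSorter` (Python 3.9+) derives the execution order.

```python
# Toy model of declarative pipelines. This is NOT the Lakeflow API:
# `table`, `run_pipeline`, and the datasets below are hypothetical.
from graphlib import TopologicalSorter

# dataset name -> (definition function, names of upstream datasets)
_REGISTRY = {}


def table(name, depends_on=()):
    """Register a dataset definition; nothing runs at declaration time."""
    def register(fn):
        _REGISTRY[name] = (fn, list(depends_on))
        return fn
    return register


def run_pipeline():
    """Derive execution order from declared dependencies, then run it."""
    graph = {name: set(deps) for name, (_, deps) in _REGISTRY.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = _REGISTRY[name]  # KeyError if an input was never declared
        results[name] = fn(*(results[dep] for dep in deps))
    return results


# Declared "downstream first" on purpose: the engine, not the author,
# decides the order in which datasets are computed.
@table("order_totals", depends_on=["clean_orders"])
def order_totals(clean):
    return {"count": len(clean), "total": sum(row["amount"] for row in clean)}


@table("clean_orders", depends_on=["raw_orders"])
def clean_orders(raw):
    # A simple quality rule: drop rows with non-positive amounts.
    return [row for row in raw if row["amount"] > 0]


@table("raw_orders")
def raw_orders():
    return [{"id": 1, "amount": 40}, {"id": 2, "amount": -5}, {"id": 3, "amount": 15}]


if __name__ == "__main__":
    print(run_pipeline()["order_totals"])  # {'count': 2, 'total': 55}
```

The real framework does far more than ordering (incremental processing, data quality expectations, managed compute), but the contract the toy captures is the same: you declare datasets and their inputs, and the engine plans the run.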

April 5, 2026 · 9 min · James M
Databricks vs Snowflake

Databricks vs Snowflake in 2026: An Honest Comparison

The question “Databricks or Snowflake?” has dominated data engineering conversations for the past five years. In 2026, it’s still the wrong question. But let me answer it anyway, because sometimes you have to pick one.
The Honest Framing
By 2026, both platforms have converged in surprising ways: Databricks started as a Spark compute engine and added warehouse features; Snowflake started as a cloud data warehouse and added Iceberg support for lakehouse semantics; and both now claim to be “lakehouses” that combine data lake flexibility with warehouse performance. The difference isn’t in capability - it’s in architectural DNA, operational model, and what they expect you to optimize for. ...

April 5, 2026 · 11 min · James M
Data Engineering Courses

Data Engineering & Data Science Courses

How to Use This Guide
This curated list covers courses from beginner to advanced levels across multiple platforms. Choose based on:
Your role: Data Engineer, Data Analyst, or Data Scientist
Learning style: self-paced courses, specializations, or nanodegrees
Timeline: single courses (weeks) vs. comprehensive programs (months)
Hands-on practice: most include projects and real-world scenarios
Cloud platform: AWS, GCP, Azure, or multi-cloud approaches
Data Engineering Professional Certificates (Industry-Backed)
Best for: structured learning with recognized credentials ...

April 4, 2026 · 5 min · James M