Agent-First Architecture Banner

Agent-First Architecture: The Engineer as System Curator

This is a “thinking out loud” post, not a report from the front lines. I have no evidence any of this is happening at scale, and it is not how my current day job looks. These are just ideas I keep turning over, and I wanted to write them down to see if they hold together. The question I keep coming back to is simple. If AI agents continue to improve at the rate they seem to be, what does engineering work look like five or ten years from now? Not tomorrow. Not next quarter. Further out, where the shape of the job might actually be different. ...

April 23, 2026 · 12 min · James M
Apache Iceberg in 2026

Apache Iceberg in 2026: The Open Table Format That Won

In 2023, the question was “which open table format will survive - Iceberg, Delta, or Hudi?” In 2026, that debate is over. Apache Iceberg won, and it won for reasons that have almost nothing to do with its raw performance. It won because it is the only format that both Snowflake and Databricks now treat as a first-class citizen, because the vendors picked sides on catalogs rather than table formats, and because enterprise buyers decided that multi-engine portability was worth more than a small performance edge. ...

April 22, 2026 · 11 min · James M
Following the Money in Data

Following the Money: Databricks vs Snowflake vs the Open-Source Alternative

The views in this post are my own personal reflections on the data industry, written in my own time. They are not about any specific employer, team, or colleague, past or present, and do not draw on any non-public information. In 2026, the technical gap between Databricks and Snowflake has narrowed to a sliver. Both offer world-class serverless compute, both support Iceberg/Delta as first-class citizens, and both have integrated AI agents that can write SQL better than your average intern. ...

April 8, 2026 · 4 min · James M
Modern Data Engineering on Databricks

Modern Data Engineering on Databricks (2026 Guide)

The 2026 Databricks Baseline Databricks in 2026 looks much more opinionated than it did just a few years ago. For most new data engineering work, the default stack is now clear: Unity Catalog for governance managed tables where possible serverless compute for notebooks, SQL, pipelines, and jobs Lakeflow Declarative Pipelines for batch and streaming data products liquid clustering instead of old-style partition design for many workloads That shift matters because the platform has moved beyond “bring your own clusters and tune everything manually.” The modern Databricks approach is increasingly declarative, governed, and automated. ...

April 6, 2026 · 7 min · James M
Data Engineering Blogs

Data Engineering Blogs

Modern Data Stack & Engineering Core Blogs & Publications Start Data Engineering - Practical guides, tutorials, and real-world projects for building scalable data platforms from scratch. Seattle Data Guy - Balance of business strategy and technical implementation in modern data engineering. Eclectic Data - Deep technical analysis of data infrastructure, distributed systems, and architectural patterns. Benn Stancil’s Blog - Strategic insights and industry commentary on analytics, data culture, and organizational challenges. Platform & Tool Blogs Airbyte Blog - Data integration, ELT approaches, and best practices for data movement at scale. Databricks Blog - Comprehensive coverage of Apache Spark, Delta Lake, and Lakehouse architectural patterns. LakeFS Blog - Data versioning, governance, and data lakes as code principles. dbt Blog - Analytics engineering workflows, SQL best practices, and modern data transformation. Apache Airflow Blog - Workflow orchestration patterns, DAG design, and production deployment strategies. Kafka Blog - Stream processing, real-time data architectures, and event-driven systems. Redpanda Blog - Kafka ecosystem evolution, streaming data pipelines, and cost optimization. Podcasts & Multimedia The Data Engineering Podcast - Interviews and deep dives into data tools, techniques, and industry practitioners. DataFramed Podcast - Conversations on data careers, best practices, and emerging technologies. Data Warehousing & Analytics Snowflake Blog - Cloud data warehouse innovations, performance optimization, and enterprise data strategies. Google Cloud Data Analytics Blog - BigQuery best practices, modern data stack integration, and Google Cloud data solutions. Restack Blog - Data infrastructure comparisons, architecture patterns, and cost optimization strategies. Communities & Learning Online Communities DataTalks.Club - Free community-driven courses, job board, and peer-to-peer learning for data professionals. r/dataengineering - Active community discussions, career advice, and industry insights. dbt Community - Slack workspace, forums, and networking for analytics engineers and data teams. Learning Resources Data Engineering Fundamentals - Comprehensive guide covering data architecture, ETL/ELT, and system design. Engineer Codehouse - Practical tutorials and guides for modern data stack technologies. Industry News & Trends The Data Stack News - Weekly roundup of news, funding announcements, and updates across the data ecosystem. KDnuggets - News, tutorials, and discussions on data science, machine learning, and data engineering. Data Engineering Weekly - Curated newsletter featuring tools, articles, and thought leadership in data engineering. The Pragmatic Engineer - Data - Engineering-led analysis with frequent data platform deep dives. Open Table Format & Lakehouse Apache Iceberg Blog - Official updates on the open table format increasingly central to the 2026 lakehouse. Tabular Blog - Deep technical writing on Iceberg internals and multi-engine lakehouse design. Dremio Blog - Query engines, Iceberg, and open data architecture. Onehouse Blog - Hudi and open lakehouse patterns. Transformation & Analytics Engineering dbt Developer Blog - Analytics engineering patterns and practical SQL modelling guidance. Tobiko / SQLMesh Blog - Next-generation transformation framework with virtual environments. Locally Optimistic - Long-form posts on analytics engineering culture and practice.

April 5, 2026 · 3 min · James M
Lakeflow Declarative Pipelines

Lakeflow Declarative Pipelines: From DLT to Production

If you’ve been writing Delta Live Tables (DLT) pipelines, you’ve been building with Lakeflow without knowing the new name. In 2026, the rebranding matters because it signals how Databricks now wants you to think about declarative pipeline design. This isn’t just a rename. The mental model has shifted from “tables and dependencies” to “data flows and transformations.” Let me show you what changed and why it matters. For where Lakeflow fits relative to other orchestration choices and the broader paradigm question, see The modern lakehouse stack and Stream vs batch processing. ...

April 5, 2026 · 9 min · James M
Databricks vs Snowflake

Databricks vs Snowflake in 2026: An Honest Comparison

The views in this post are my own personal reflections on the data industry, written in my own time. They are not about any specific employer, team, or colleague, past or present, and do not draw on any non-public information. The question “Databricks or Snowflake?” has dominated data engineering conversations for the past five years. In 2026, it’s still the wrong question. But let me answer it anyway, because sometimes you have to pick one. For the wider stack this choice sits inside, see The modern lakehouse stack. ...

April 5, 2026 · 11 min · James M
Data Engineering Courses

Data Engineering & Data Science Courses

How to Use This Guide This curated list covers courses from beginner to advanced levels across multiple platforms. Choose based on: Your role: Data Engineer, Data Analyst, or Data Scientist Learning style: Self-paced courses, specializations, or nanodegrees Timeline: Single courses (weeks) vs. comprehensive programs (months) Hands-on practice: Most include projects and real-world scenarios Cloud platform: AWS, GCP, Azure, or multi-cloud approaches Data Engineering Professional Certificates (Industry-Backed) Best for: Structured learning with recognized credentials ...

April 4, 2026 · 5 min · James M