2026 | jamesm.blog

Ethical Data Use (EDU) in 2026: What Data Engineers Actually Need to Get Right

For most of the last decade, “ethical data use” was something that happened in a different building. The lawyers wrote the privacy policy, the data protection officer ran the impact assessment, and the engineers built whatever the ticket said. The ethics lived in a PDF, and the pipeline lived in the warehouse, and the two rarely met. In 2026 that separation has quietly collapsed. The reason is not that engineers suddenly became more principled - it is that the decisions which determine whether data is used ethically are now made at the schema, the table, and the access-control layer, and those are the engineer’s decisions. Consent, deletion, minimisation, provenance, bias: every one of them is now something you either build into the pipeline or fail to. This is a practical look at what that means. ...

Dario Amodei: The Anthropic CEO Betting on Safety as Strategy

Dario Amodei is one of the few frontier-lab CEOs whose public talking points have not changed materially in five years. The same message he gave to small audiences in 2021 - that powerful AI is coming faster than people think, that the safety problem is real, and that the companies building it have an obligation to do so carefully - is the message he is giving to Congress and Davos in 2026. The thing that has changed is that he now runs the company most aggressively turning that message into a commercial position. ...

The AI Energy Crisis: Why Data Center Power Will Define the Next Decade

For most of the AI conversation in 2024 and 2025, the binding constraints on the build-out were chips and capital. By 2026 the conversation has shifted, and the constraint that gets discussed most seriously inside the hyperscalers is electricity. Not the cost of electricity. The actual physical availability of electrons - at gigawatt scale, in the places where the data centres need to be, on the schedule the model labs need. The story does not have a single villain or a single number, but it has a shape, and the shape is becoming the story of the second half of the decade. ...

Cerebras, Groq, SambaNova: The Inference Hardware Insurgents

For most of the last decade, talking about AI hardware meant talking about Nvidia. In 2026 that has stopped being true at the inference layer. Three companies - Cerebras, Groq, and SambaNova - have built genuinely different chips around the same insight: that the workload economics of running models in production are not the same as the workload economics of training them, and that the chip architecture should follow the workload. The bet has been right enough that Nvidia has now licensed pieces of it. ...

Reasoning Models in 2026: o3, R2, and the Compute-at-Inference Shift

Two years ago the way to make a model better was to train a bigger one. By the start of 2026 that recipe has stopped being the most interesting answer. The frontier has moved to a different lever - letting the model think for longer at inference time, generating intermediate reasoning, and only then producing the final answer. The category has a name now (reasoning models) and a family of products built around it. The interesting questions are no longer whether the trick works, because it clearly does, but when to reach for one, where it lands in production, and what the costs actually look like once the demo glow wears off. ...

The State of Blockchain in 2026

TL;DR The blockchain industry in 2026 is no longer arguing about whether it has a future. The arguments are about which layers do which jobs. Bitcoin remains the reserve asset and the most credible neutral settlement layer. Ethereum is the dominant smart-contract base layer, with most activity now happening on its Layer 2s. Solana has taken the high-throughput application crown. Polkadot is mid-pivot from infrastructure to applications. The two structural shifts that define 2026 are modular blockchains (Celestia, EigenLayer) and the stablecoin economy, where annual settlement volume now exceeds Visa. Real-world asset tokenization has gone from a slide-deck thesis to a $30B+ live market, led by BlackRock’s BUIDL and tokenized US treasuries. The destination for the next two years is clear: payments, treasuries, and AI agents using crypto rails - and most users will not know they are using a blockchain.

What Actually Survived

It is worth saying out loud: most of the things that called themselves “the future of finance” in 2021 are gone. The 2022-2023 unwind cleared out the projects that had no users, no revenue, and no reason to exist. What remains in 2026 is a much smaller, much more boring, and much more useful set of networks. ...

China's Space Programme in 2026 - Tiangong, Chang'e, Lunar Plans

TL;DR China’s space programme in 2026 is one of the most consistently executed national space efforts in history. Where Western programmes have lurched between budgets and political cycles, China’s CNSA has shipped roughly what it announced, on roughly the timelines it announced. The Tiangong space station is fully operational, continuously crewed, and has hosted both domestic and international experiments. The Chang’e lunar series has progressed from sample return (Chang’e 5, 6) to the precursors of a crewed lunar landing programme planned before 2030. China has now returned samples from both the near and far sides of the Moon - the only nation to have done so. The lunar plan centres on the International Lunar Research Station (ILRS) - a long-term, China-led, multinational lunar surface base, with crewed landings as a milestone rather than the goal. Mars sample return, deep-space exploration, and a permanent lunar presence are all on a credible timeline. The realistic 2030 picture is two distinct, durable lunar architectures - American and Chinese - running in parallel.

Why It Is Worth Looking Carefully

It is easy in Western coverage to treat China’s space programme as a backdrop to the Artemis story. That undersells what is actually happening. ...

The eBPF Revolution - What Every Platform Engineer Should Know

TL;DR eBPF is the technology that lets you run safe, sandboxed programs inside the Linux kernel without writing kernel modules. In 2026 it is the foundation under most serious observability, networking, and runtime security tools. The interesting story is not the technology itself - it is the wave of products built on top of it: Cilium for networking, Tetragon for runtime security, Pixie, Parca, and Coroot for observability, plus a long tail of vendor offerings using eBPF under the hood. For platform engineers, eBPF is not “a thing you have to learn to write.” It is a thing you have to know about so you can choose tools intelligently and understand what is happening on your nodes when those tools cause problems. The most important shift eBPF has enabled is observability without instrumentation. You can see what is happening on a system without modifying the application, without restarting it, and with low overhead. That is genuinely new.

What eBPF Actually Is

eBPF stands for “extended Berkeley Packet Filter,” which is historical and confusing because eBPF has long since outgrown packet filtering. The simple version: ...

AI-Native Pipelines - What Changes When Your Consumer Is an LLM, Not a Dashboard

TL;DR Data pipelines were optimised for human consumers - dashboards, BI tools, analysts. In 2026 a growing share of pipeline output flows directly to language models, agents, and retrieval systems. That changes the design constraints in ways that catch teams off guard. Aggregation matters less. Context fidelity matters more. Freshness behaves differently. Schema moves from rigid to negotiated. Cost shifts from compute to tokens. The biggest mistake is treating an LLM consumer as if it were just another dashboard. It is not. It does not skim, it does not interpret charts, it does not have working memory across rows. It needs to be fed. The new patterns - retrieval-aware partitioning, embedding pipelines, structured-document outputs, prompt-shaped views, evaluation harnesses for data quality - are the actual subject of “AI-native data engineering” in 2026.

The Underlying Shift

For thirty years the implicit consumer of every data pipeline was a human looking at a screen. Even when the pipeline ended in an API or a CSV, the conceptual end-user was someone who would interpret the output with judgement, context, and skim-reading. ...

Iceberg vs Delta vs Hudi in 2026 - The Format Wars Are Over

TL;DR The open table format war between Apache Iceberg, Delta Lake, and Apache Hudi is effectively over in 2026 - and the outcome is not a single winner but a clear settlement. Iceberg has won the role of the neutral standard that engines and platforms expect to read and write. It is the format you choose when you do not want to be coupled to a single vendor. Delta has won the role of the incumbent default inside the Databricks ecosystem and remains a strong choice if Databricks is your primary engine. Delta UniForm has narrowed the gap by letting Delta tables expose Iceberg metadata. Hudi has not won a category outright. It retains a smaller but loyal user base for streaming-heavy and CDC-heavy workloads, where its design choices still genuinely fit. The interesting battle has moved up the stack to the catalog layer. The format question is mostly settled. The catalog question is the new fight.

The Format Wars - A Brief History

For most of the early 2020s the lakehouse story was a three-way argument about how to put ACID transactions on top of object storage. ...