DevOps

The eBPF Revolution - What Every Platform Engineer Should Know

TL;DR eBPF is the technology that lets you run safe, sandboxed programs inside the Linux kernel without writing kernel modules. In 2026 it is the foundation under most serious observability, networking, and runtime security tools. The interesting story is not the technology itself - it is the wave of products built on top of it: Cilium for networking, Tetragon for runtime security, Pixie, Parca, and Coroot for observability, plus a long tail of vendor offerings using eBPF under the hood.

For platform engineers, eBPF is not “a thing you have to learn to write.” It is a thing you have to know about so you can choose tools intelligently and understand what is happening on your nodes when those tools cause problems.

The most important shift eBPF has enabled is observability without instrumentation. You can see what is happening on a system without modifying the application, without restarting it, and with low overhead. That is genuinely new.

What eBPF Actually Is

eBPF stands for “extended Berkeley Packet Filter,” which is historical and confusing because eBPF has long since outgrown packet filtering. The simple version: ...

Kubernetes in 2026 - Is It Still Worth the Complexity Tax?

TL;DR Kubernetes won the orchestration argument years ago. The question is no longer “should we use Kubernetes.” It is “should this particular team, with this particular workload and this particular budget, pay the operational tax.”

For genuinely large, multi-tenant, multi-region platforms with dedicated infrastructure teams, the answer is still mostly yes: the ecosystem maturity is unmatched and the alternatives lose at scale. For mid-sized engineering organisations, the answer in 2026 is increasingly “probably not.” Managed serverless, container platforms like Fly and Railway, and the new generation of platform-as-a-service offerings are competitive in ways they were not three years ago. For startups and small teams, the answer is almost always no, and it is time to stop pretending otherwise.

The honest read in 2026: Kubernetes is the right answer to fewer questions than it used to be, and being honest about that is now a competitive advantage rather than a heresy.

How We Got Here

Kubernetes was the right idea at the right time. By the late 2010s, every serious engineering team needed an answer to “how do we run containers in production.” Kubernetes provided one: it was open, it was backed by a credible foundation, and the cloud providers all blessed it. Within five years it was the default. Within ten years it was the assumption. ...

Self-Hosted vs Managed in 2026 - The Cost Math Has Changed Again

TL;DR The self-hosted vs managed decision in 2026 is genuinely different from the same decision in 2022. The math has shifted in three directions: cloud egress costs, AI workload economics, and the maturity of self-hosted tooling. Managed remains the right default for most teams. What has changed is that the threshold at which self-hosting becomes worth considering has dropped: workloads that were obviously managed in 2022 are genuine 50/50 calls in 2026.

The most important shift is that self-hosting is no longer synonymous with on-premises. Modern self-hosting often means renting bare-metal servers in a colocation facility, running your own clusters on a hyperscaler, or using sovereign cloud providers, each with different economics. For specific categories (AI inference at scale, egress-heavy workloads, predictable steady-state compute, regulated environments), self-hosting now wins on cost more often than people assume. The honest framing: managed is the right default; self-hosting is the right minority case; the minority is bigger than it used to be.

Why This Decision Got Harder

For most of the 2010s the answer was easy. Managed services were cheaper than self-hosting once you priced in operational overhead. The cloud providers competed aggressively. Self-hosting was for the regulated, the eccentric, and the very large. ...

Platform Engineering in 2026: What It Is and Why DevOps Teams Are Adopting It

TL;DR Platform engineering - building an internal developer platform (IDP) of golden paths, self-service environments, a developer portal, policy as code, and paved-road CI/CD - is the default shape of infrastructure teams larger than a dozen people in 2026. Four forces drove the convergence: cognitive load (the cloud-native stack is too big for one head), the DORA evidence linking platforms to elite performance, the regulatory ratchet, and AI agents. AI agents made 2026 the tipping point: an agent that can open PRs and apply Terraform changes is only safe inside a platform that enforces policy checks, cost caps, and blast-radius limits. Platform engineering is not a rebrand of DevOps: the platform team is a product team whose customers are other engineers. If you have no platform yet, start with the single most painful golden path, not a portal.

Platform engineering used to be the title on a few job adverts at Spotify and Netflix. In 2026 it is the default shape of any infrastructure team larger than a dozen people. The shift is worth understanding, because it is not just a rebrand of DevOps - it is a different operating model, with different tools, different incentives, and a different relationship to the developers it serves. ...

AWS S3 Files - Bridging File Systems and Object Storage

Amazon Web Services recently introduced AWS S3 Files, a service that addresses a persistent challenge in cloud computing: how to give file-based applications direct access to object storage without duplicating data or building custom connectors.

The Problem S3 Files Solves

Traditionally, applications designed around file systems faced a difficult choice when working with Amazon S3:

Use object APIs - build custom integration code and refactor applications.
Duplicate data - copy data between S3 and separate file systems, creating sync challenges and increased costs.
Accept performance trade-offs - work with slower, network-dependent access patterns.

S3 Files eliminates these constraints by providing a native file system interface directly over S3 data. ...

DevOps in the Age of AI Agents

For years, DevOps has been about breaking down silos and automating the software delivery lifecycle. We moved from manual deployments to Jenkins scripts, then to YAML-defined pipelines, and eventually to Infrastructure as Code (IaC). But in 2026, the bottleneck is no longer the speed of the pipeline - it’s the speed of human decision-making within that pipeline. We are entering the era of Agentic DevOps.

From Automation to Autonomy

Traditional DevOps automation follows a strict “if this, then that” logic. AI-driven DevOps uses reasoning models to handle the “I’m not sure, let me figure it out” scenarios that typically stall a release. ...

Where Should Documentation Actually Live? Thinking Out Loud in the AI Era

TL;DR Documentation sprawl across Confluence, Jira, SharePoint, Google Docs, GitHub, and Miro is not a tool problem - it is a problem at the joints between tools: the same decision exists in four places, and the copies drift out of sync immediately. Three forces constantly pull against each other: source of truth (one canonical home), discoverability (the right surface for every audience), and governance (real access control). Optimising for any one breaks the others. The proposed shape: docs-as-code for engineering artefacts in Git, collaborative tools for business content, a read-only render layer between them, and an AI-assisted discovery layer across all of it. AI tooling weakens the old boundary: a business user can get a summary generated from a markdown master without ever seeing the file, and an engineer can draft an ADR that pulls context from Confluence and Jira automatically. Several genuine open questions remain unsolved: versioning across boundaries, who owns the render pipeline, and whether Jira tickets as documents should be formalised or fought against.

This post is me thinking out loud. It is not a proposal, not a recommended pattern, and possibly not even a useful framing. I am writing it because I am actively stuck on the question, and writing in public tends to be the fastest way I find out what I have got wrong. Feel free to disagree with any of it. ...

Understanding Types of Cyber Attacks: A DevOps Guide

Cyber attacks are becoming increasingly sophisticated, and DevOps teams must understand the landscape to build resilient systems. This guide covers the most common attack types and practical defense strategies.

Social Engineering Attacks

Phishing remains one of the most effective attack vectors. Attackers craft deceptive emails or messages to trick users into revealing sensitive information or clicking malicious links. The 2015 Ukraine power grid attack, for example, began with spear-phishing emails that gave the attackers an initial foothold and, ultimately, the login credentials they used before the attack on the grid itself. ...

DevOps Best Practices

The views in this post are my own personal reflections on the industry, written in my own time. They are not about any specific employer, team, or colleague, past or present, and do not draw on any non-public information.

“Best practice” is a phrase that should be treated with suspicion. What works for a fintech running 500 engineers rarely works for a five-person startup. The notes below are generic patterns drawn from public talks, books, and industry write-ups - always weighed against context, team size, and what the system is actually trying to do. ...

DevOps Cheatsheets

Cheatsheets are one of the most underrated learning tools in the DevOps toolbox. When you are three hours into debugging a broken pipeline, you don’t want a 400-page book - you want the one page that reminds you which flag does what. This page collects quick references I keep within arm’s reach.

Cloud Computing

A concise summary of the core cloud service models (IaaS, PaaS, SaaS), deployment patterns, and the shared responsibility model is a good starting point for anyone new to cloud infrastructure. ...