Pipeline

AI-Native Pipelines - What Changes When Your Consumer Is an LLM, Not a Dashboard

TL;DR
Data pipelines were optimised for human consumers - dashboards, BI tools, analysts. In 2026 a growing share of pipeline output flows directly to language models, agents, and retrieval systems. That changes the design constraints in ways that catch teams off guard. Aggregation matters less. Context fidelity matters more. Freshness behaves differently. Schema moves from rigid to negotiated. Cost shifts from compute to tokens.
The biggest mistake is treating an LLM consumer as if it were just another dashboard. It is not. It does not skim, it does not interpret charts, it does not have working memory across rows. It needs to be fed. The new patterns - retrieval-aware partitioning, embedding pipelines, structured-document outputs, prompt-shaped views, evaluation harnesses for data quality - are the actual subject of “AI-native data engineering” in 2026.
The Underlying Shift
For thirty years the implicit consumer of every data pipeline was a human looking at a screen. Even when the pipeline ended in an API or a CSV, the conceptual end-user was someone who would interpret the output with judgement, context, and skim-reading. ...

Data Engineering & Data Science Courses

How to Use This Guide
This curated list covers courses from beginner to advanced levels across multiple platforms. Choose based on:
- Your role: Data Engineer, Data Analyst, or Data Scientist
- Learning style: Self-paced courses, specializations, or nanodegrees
- Timeline: Single courses (weeks) vs. comprehensive programs (months)
- Hands-on practice: Most include projects and real-world scenarios
- Cloud platform: AWS, GCP, Azure, or multi-cloud approaches
Data Engineering Professional Certificates (Industry-Backed)
Best for: Structured learning with recognized credentials ...

AWS re:Invent Slides (2022)

This is the set of re:Invent 2022 slide decks I found most useful when they were published, grouped by topic. Each entry links to the official AWS-hosted PDF and carries a short, plain-language note about what the session is useful for in practice - so you can decide which decks are worth reading before committing the time. For the full session video recordings, see the AWS Events channel on YouTube.
DevOps
Amazon’s approach to high-availability deployment
The practices Amazon’s own delivery teams use to reach near-zero deployment failure rates. Most of the value is in the guardrail patterns - pre-production gates, automated rollback triggers, and how to design a release process that protects itself from human error. ...

CI/CD Tools

CI/CD is the plumbing that turns a commit into running software. The tools below cover the full spectrum - from fully managed SaaS that you never have to operate, to self-hosted automation servers you tune yourself, to GitOps controllers that treat your Kubernetes cluster as the deployment target.
How to choose
A few questions that tend to cut through the vendor noise:
- Is your code in GitHub, GitLab, or Bitbucket? Staying in-ecosystem reduces integration effort dramatically.
- Do you deploy to Kubernetes? GitOps tools like Argo CD and Flux are often a better fit than traditional pipelines.
- Do you want to operate the control plane yourself? Jenkins gives you maximum flexibility and maximum operational burden.
- How many parallel runners will you need at peak, and who pays for them?
Hosted CI/CD platforms
Low operational overhead, tight integration with their source-control parents, and usage-based pricing. ...