Context Engineering: The 2025 Skill That Replaced Prompt Engineering
Continuing from last week's post on Spec-Driven Development, this week I'll explain the skill that makes all planning frameworks truly powerful: Context Engineering. My colleagues have been asking me why some AI interactions produce brilliant results while others fall flat—even with similar prompts. The answer isn't better words. It's better context.
This is part of my weekly AI series where I take you progressively through AI fundamentals, and context engineering represents a critical evolution. MIT Technology Review captured it perfectly in their November 2025 headline: "From vibe coding to context engineering: 2025 in software development."
## So What Is Context Engineering?
You may wonder what exactly context engineering is, and how it differs from prompt engineering. The distinction is this: while prompt engineering focuses on "what words should I use?", context engineering tackles a fundamentally bigger question: "what configuration of context is most likely to generate the behavior I need?"
Here's how Andrej Karpathy defines it: "Context engineering is the delicate art and science of filling the context window with just the right information for the next step." He breaks it down into components: task descriptions, few-shot examples, RAG, multimodal data, tools, state and history, and compacting. Too little context and the LLM lacks what it needs; too much and costs go up while performance may decrease.
And here's the insight that changed how I think about AI, from Anthropic's engineering team: "Claude is already smart enough—intelligence is not the bottleneck, context is."
Let that sink in. Every organization has unique workflows, standards, and knowledge systems that AI doesn't inherently know. Context engineering bridges that gap.
## Why Context Engineering Matters in 2025
The data is compelling. According to LangChain's 2025 State of AI Agents report (1,340 respondents surveyed November-December 2025):
- **57.3%** of organizations now have AI agents in production
- **32%** cite quality as the top barrier to production deployment
- For organizations with 10,000+ employees, **hallucinations and consistency** are the biggest challenges
- Most critically: **most failures are traced to poor context management—not LLM capabilities**
As one enterprise survey respondent noted: ongoing difficulties with "context engineering and managing context at scale" remain the primary obstacle.
Meanwhile, McKinsey reports that enterprise AI spending hit **$37 billion in 2025**—up from $11.5 billion in 2024, a 3.2x increase. Yet according to Harvard Business Review, only **6% of companies fully trust AI agents** to handle core business processes. The gap between adoption and trust is a context engineering problem.
## The Computing Analogy: LLM as Operating System
Karpathy provides a powerful mental model: "LLMs are like a new kind of operating system. The LLM is the CPU performing core computation, the context window is the RAM—the working memory."
This analogy illuminates why context engineering matters:
| Computing | LLM Equivalent |
|-----------|----------------|
| CPU | The LLM model itself |
| RAM | Context window |
| Hard Drive | External memory, vector databases |
| File System | Context management strategies |
| Operating System | The orchestration layer |
Just as efficient computing requires thoughtful memory management, effective AI requires thoughtful context engineering. You wouldn't dump your entire hard drive into RAM—and you shouldn't dump everything into your context window.
## The Four Core Strategies
Research across Google, Microsoft, Anthropic, and industry practitioners has converged on four core strategies for context engineering.
### 1. Write: Persist Information Outside the Context Window
The context window is limited, even at two million tokens (McKinsey notes context windows "spiked from 100,000 to two million tokens"). The "write" strategy involves persisting information externally.
**Practical implementations:**
- **Scratchpads**: Agents maintain working notes during complex tasks
- **Session memories**: Short-term interaction history
- **Long-term memories**: Persistent facts across sessions
- **Three memory types**: Episodic (concrete examples), procedural (rules), semantic (world facts)
Google's Agent Development Kit (ADK) separates "Session" (storage) from "Working Context" (view), enabling context caching and efficient reuse. As their engineering blog explains: "To build production-grade agents, the industry is exploring context engineering—treating context as a first-class system with its own architecture, lifecycle, and constraints."
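To make the "write" strategy concrete, here's a minimal sketch of an agent memory that persists notes and facts outside the prompt, then rebuilds a small working context from only the most recent notes. All class and field names are illustrative, not from any real framework.

```python
# A sketch of the "write" strategy: the agent keeps a small working
# context and persists everything else to external stores.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    scratchpad: list[str] = field(default_factory=list)      # working notes for the current task
    session: list[str] = field(default_factory=list)         # short-term interaction history
    long_term: dict[str, str] = field(default_factory=dict)  # persistent facts across sessions

    def write_note(self, note: str) -> None:
        """Persist a working note outside the prompt itself."""
        self.scratchpad.append(note)

    def remember(self, key: str, fact: str) -> None:
        """Store a long-term fact, keyed for later selective retrieval."""
        self.long_term[key] = fact

    def build_context(self, max_notes: int = 3) -> str:
        """Only the most recent notes re-enter the context window."""
        return "\n".join(self.scratchpad[-max_notes:])

memory = AgentMemory()
for i in range(5):
    memory.write_note(f"step {i}: partial result")
memory.remember("deploy_target", "staging cluster only")

# Five notes are persisted, but only three flow back into the prompt.
print(len(memory.scratchpad))   # 5
print(memory.build_context())
```

The point mirrors the storage/view split above: everything is written durably, but the model only ever sees a curated slice.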
### 2. Select: Surface Only What's Relevant
Not everything should go into context. Anthropic's guidance: "One of the most common failure modes is bloated tool sets that cover too much functionality. If a human engineer can't definitively say which tool should be used, an AI agent can't be expected to do better."
**Research finding**: When researchers gave a quantized Llama 3.1 8B access to 46 tools, it failed completely—even within its 16k context window. With just 19 tools, it succeeded. The issue wasn't context length; it was context complexity.
**Selection techniques:**
- Embedding-based retrieval (this connects to our posts on vector databases!)
- Knowledge graph traversal
- Tool description filtering using RAG principles
- Semantic boundaries for code indexing
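The selection techniques above can be sketched with a toy retriever that scores candidate snippets against the query and surfaces only the top-k. A real system would use embedding vectors; here a bag-of-words Jaccard overlap stands in for semantic similarity, and the document snippets are invented for illustration.

```python
# A toy sketch of the "select" strategy: rank candidate context
# snippets by similarity to the query and keep only the top-k.

def score(query: str, snippet: str) -> float:
    """Jaccard word overlap as a stand-in for embedding similarity."""
    q = set(query.lower().split())
    s = set(snippet.lower().split())
    return len(q & s) / len(q | s) if q | s else 0.0

def select_context(query: str, snippets: list[str], k: int = 2) -> list[str]:
    """Surface only the k most relevant snippets into the context window."""
    ranked = sorted(snippets, key=lambda s: score(query, s), reverse=True)
    return ranked[:k]

docs = [
    "billing API returns invoice totals in cents",
    "the deploy pipeline runs on merge to main",
    "invoice totals must be converted to dollars for display",
    "office wifi password rotation policy",
]
selected = select_context("how are invoice totals formatted", docs)
print(selected)
```

Swapping the scoring function for real embeddings (plus a vector database, as covered earlier in this series) turns this toy into standard retrieval.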
### 3. Compress: Fit More Meaning in Less Space
When you have more relevant information than fits, compression becomes essential.
**Anthropic's recommendation**: "Compaction is the practice of taking a conversation nearing the context window limit, summarizing its contents, and reinitiating a new context window with the summary. The art of compaction lies in the selection of what to keep versus what to discard."
**Microsoft's research** has produced algorithms like LLMLingua2 and TACO-RL for prompt compression—combining structured and unstructured context pruning to minimize tokens without losing utility.
**OpenAI's GPT-5.2** includes a `/responses/compact` endpoint that performs "loss-aware compression" over conversation state, returning encrypted items that preserve task-relevant information while dramatically reducing token footprint.
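Compaction as Anthropic describes it can be sketched in a few lines: when the running transcript nears a token budget, replace everything but the latest turn with a summary and continue. Here `summarize` is a placeholder for an LLM summarization call, the word count is a crude proxy for a real tokenizer, and the budget and conversation are invented.

```python
# A minimal sketch of compaction: when the transcript nears a token
# budget, compress the older turns into a summary and carry on.

TOKEN_BUDGET = 40

def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def summarize(turns: list[str]) -> str:
    # Placeholder: a real system would ask the model what to keep.
    return f"[summary of {len(turns)} earlier turns]"

def compact(history: list[str]) -> list[str]:
    total = sum(count_tokens(t) for t in history)
    if total <= TOKEN_BUDGET:
        return history
    # Keep the last turn verbatim; compress everything before it.
    return [summarize(history[:-1]), history[-1]]

history = [
    "user asked about pricing tiers and SLAs " * 3,
    "assistant explained the enterprise tier in detail " * 3,
    "user: does the enterprise tier include SSO?",
]
history = compact(history)
print(history[0])  # the summary now stands in for the earlier turns
```

The hard part, as the quote above says, is hidden inside `summarize`: deciding what to keep versus what to discard.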
### 4. Isolate: Control What the Model Sees
This is perhaps the most underrated strategy. An agent's runtime state can be designed with multiple fields—some exposed to the model, others hidden for selective use.
**Applications:**
- Role-based visibility for different tasks
- Security isolation for sensitive data
- Staged revelation of information
- Multi-agent architectures with specialized sub-agents
Anthropic's multi-agent researcher uses specialized sub-agents with separate contexts—outperforming single-agent systems (though using 15x more tokens in the process).
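The isolate strategy can be sketched as runtime state where each field is explicitly marked as exposed or hidden, and only exposed fields are rendered into the prompt. The field names and values here are illustrative, not from any specific framework.

```python
# A sketch of the "isolate" strategy: the agent's runtime state carries
# more than the model ever sees; only exposed fields enter the prompt.
from dataclasses import dataclass

@dataclass
class StateField:
    name: str
    value: str
    exposed: bool  # whether this field may enter the model's context

def render_context(state: list[StateField]) -> str:
    """Render only the fields marked safe for the model to see."""
    return "\n".join(f"{f.name}: {f.value}" for f in state if f.exposed)

state = [
    StateField("current_task", "draft release notes", exposed=True),
    StateField("user_role", "editor", exposed=True),
    StateField("api_key", "sk-...redacted", exposed=False),    # security isolation
    StateField("internal_eval_score", "0.82", exposed=False),  # hidden bookkeeping
]
print(render_context(state))
```

The same mechanism supports role-based visibility and staged revelation: flipping `exposed` per task changes what the model sees without changing what the system knows.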
## The Context Rot Problem: A Critical Finding
Here's research that should concern every AI builder. Analysis of 18 leading LLMs (GPT-4.1, Claude 4, Gemini 2.5, Qwen 3) revealed what researchers call "Context Rot":
- **GPT-4's accuracy dropped from 98.1% to 64.1%** just based on how information was presented in context
- Performance degrades **non-uniformly and unpredictably**, not linearly
- **Position bias**: Tokens at beginning and end receive disproportionate attention ("lost in the middle" problem)
- **Stunning finding**: Models with full conversations (113k tokens) performed **worse** than focused 300-token segments
The takeaway: "What you remove can matter as much as what you keep."
This is why context engineering is a discipline, not a trick. Simply having a large context window doesn't guarantee good results.
## The Order Matters: A Non-Obvious Insight
Context order shifts model behavior in non-obvious ways. The recommended structure:
1. **System rules** (role, constraints, personality)
2. **Tool definitions** (what capabilities are available)
3. **User/conversation history** (recent context)
4. **Current task** (specific request)
5. **Examples** (if using few-shot)
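The ordering above can be made mechanical: assemble the prompt from named layers in a fixed sequence, skipping any that are empty. The layer names mirror the list above; the assembly logic and sample content are a sketch, not any particular framework's API.

```python
# Assemble a prompt from named layers in the recommended order,
# skipping empty layers so the structure stays clean.

LAYER_ORDER = ["system", "tools", "history", "task", "examples"]

def assemble(layers: dict[str, str]) -> str:
    parts = [layers[name] for name in LAYER_ORDER if layers.get(name)]
    return "\n\n".join(parts)

prompt = assemble({
    "system": "You are a release-notes assistant. Be concise.",
    "tools": "Available tool: search_changelog(query)",
    "history": "",  # no prior conversation; this layer is skipped
    "task": "Summarize changes since v2.3.",
    "examples": "Example output: '- Fixed login timeout (#412)'",
})
print(prompt.splitlines()[0])
```

Fixing the order in code, rather than re-deciding it per prompt, is a small way to keep context architecture consistent across a team.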
Anthropic notes that Claude 4.x models "take you literally and do exactly what you ask for, nothing more." Earlier models inferred intent; newer models require explicit instruction. Context architecture must adapt.
## Real-World Applications
### Claude Code & Cursor
Both tools use intelligent file selection with hierarchical summarization, staying usable across codebases of thousands of files without degrading performance. This is context engineering in action.
### Shopify's AI Transformation
CEO Tobi Lütke's internal memo made waves: "Learning to prompt and load context is important, and getting peers to provide feedback on how this is going will be valuable." Shopify ordered 3,000 Cursor licenses and now includes AI proficiency in performance reviews.
Lütke described AI as a "massive productivity multiplier," citing instances of **100x expected output** for employees who master context loading.
### Enterprise Orchestration
According to HBR's survey, more than 80% of enterprises view "providing connectivity to applications and contextual information from data" as very or moderately important outcomes of their AI orchestration efforts.
## Connecting to Our AI Fundamentals
Here's how context engineering ties to everything we've covered:
| Previous Topic | Connection to Context Engineering |
|----------------|-----------------------------------|
| **Embeddings** | Enable semantic selection of relevant context |
| **Vector Databases** | Store and retrieve context efficiently (the 80ms search across 10 million vectors) |
| **Neural Networks** | Understanding how models process and weight context |
| **Prompting Techniques** | CoT and Few-Shot ARE context engineering techniques |
| **Vibe-Planning** | Context engineering makes planning sessions more effective |
This is why I've built this series progressively. Each concept enables the next.
## Why This Matters for You
Understanding context engineering is the difference between using AI tools and mastering them.
**For developers**: As Ryan Salva, Senior Director of Product Management at Google, told MIT Technology Review: "A lot of work needs to be done to help build up context and get the tribal knowledge out of our heads." Better context means AI understands your codebase, your patterns, your constraints.
**For PMs**: Context engineering helps you get better answers when exploring product decisions. When you vibe-plan (as we discussed), the quality of context determines the quality of insights.
**For enterprise leaders**: HBR reports that while 78% of organizations use AI in at least one business function, trust remains constrained to "lower-risk work." Context engineering is the path to trusted, high-stakes AI deployment.
As Zencoder founder Andrew Filev told MIT Technology Review: "Context is critical. The first generation of tools did a very poor job on context—they would basically just look at your open tabs."
## Getting Started: A Practical Framework
And the best part? You can start improving your context engineering today:
1. **Structure your context in layers**: Role → Constraints → Tools → History → Task → Examples
2. **Be explicit about state**: Include dates, user context, session history (Anthropic: "Don't make the model guess")
3. **Curate, don't dump**: Select relevant information; more isn't always better
4. **Specify output format**: Tell the model exactly what structure you want
5. **Measure and iterate**: Context engineering requires experimentation—track what works
**Practical prompt template:**
```
## Context Layer 1: Identity
You are [role] with expertise in [domain].
## Context Layer 2: Constraints
- Maximum response length: [X words]
- Required format: [structure]
- Must include: [elements]
## Context Layer 3: Background
Current date: [date]
User context: [relevant details]
Previous conversation: [summary if relevant]
## Context Layer 4: Task
[Specific request]
## Context Layer 5: Examples (if needed)
[Quality examples of desired output]
```
## What's Next
Ready to go deeper? In next week's post, I'll explain RAG (Retrieval-Augmented Generation), the architecture that connects context engineering to your actual data. RAG is where embeddings, vector databases, and context engineering converge to give AI access to YOUR knowledge, not just its training data.
We'll explore how RAG reduces hallucinations, enables domain-specific AI, and forms the foundation of most production AI systems today—including why LangChain recommends LangGraph for all new agent implementations.
Stay tuned!
## Key Takeaways
- **Context engineering** asks "what configuration of context generates the behavior I need?"—beyond word choice
- **Karpathy's insight**: "Context engineering is the delicate art and science of filling the context window with just the right information"
- **Anthropic's truth**: "Intelligence is not the bottleneck, context is"
- **LangChain data**: 57% have agents in production, but 32% cite quality issues—mostly from poor context management
- **Context Rot**: GPT-4 accuracy dropped 34 percentage points based on context presentation alone
- **Four core strategies**: Write (persist), Select (filter), Compress (summarize), Isolate (control visibility)
- **Order matters**: System rules → Tools → History → Task → Examples
- **Shopify example**: Tobi Lütke ties AI proficiency (including context loading) to performance reviews
## Additional Documents to Read on This
- [Effective Context Engineering for AI Agents - Anthropic](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) - From the Claude team
- [State of AI Agents 2025 - LangChain](https://www.langchain.com/stateofaiagents) - Industry survey data
- [Context Engineering: The Definitive 2025 Guide - FlowHunt](https://www.flowhunt.io/blog/context-engineering/) - Comprehensive technical guide
- [From Vibe Coding to Context Engineering - MIT Technology Review](https://www.technologyreview.com/2025/11/05/1127477/from-vibe-coding-to-context-engineering-2025-in-software-development/) - Industry analysis
- [Architecting Context-Aware Multi-Agent Framework - Google Developers](https://developers.googleblog.com/architecting-efficient-context-aware-multi-agent-framework-for-production/) - Google's ADK approach
- [Efficient AI Applications: Context Engineering - Microsoft Research](https://www.microsoft.com/en-us/research/project/efficient-ai-applications-context-engineering-and-agents/) - Microsoft's research
---
*This is part of my weekly AI series taking you progressively from fundamentals to practical applications. Previous posts covered embeddings, vector databases, neural networks, prompting techniques, vibe coding dangers, vibe-planning, BMAD Method, PRP Framework, and Spec-Driven Development. Next week: RAG (Retrieval Augmented Generation).*