"Craft the right words"
Token windows were tight - GPT-3.5 at 4K, GPT-4 launched at 8K. Every word in your prompt competed with every word in the response. The skill was compression: pack maximum intent into minimum space, shape output through precision instructions, and work around an RLHF-tuned model that defaulted to verbose disclaimers.
The query - land research across NY, CA and TX (2023 style)
Every element was deliberate. "You are a real estate advisor" primed the persona. "Not homes" prevented drift. "No preamble. No caveats" fought the model's verbose defaults. "Current year: 2023" compensated for a training cutoff two years prior.
Strengths
Limitations
"Curate the right information"
Token windows exploded - GPT-4 Turbo at 128K, Claude at 200K, Gemini at 1M. The bottleneck shifted from what you could ask to what you could inject. RAG and vector-based context engineering peaked in late 2024 and continues to evolve. The craft moved from writing tight prompts to architecting information pipelines that assembled the right knowledge before the model ever reasoned.
The assembled context window - land research (2025 style)
The user's question stayed simple. The heavy lifting moved upstream - into vector databases, chunking strategies, re-ranking algorithms, and context assembly pipelines. The prompt engineer became an information architect.
Strengths
Limitations
"Architect the right environment"
The model is no longer the bottleneck. The harness is: the runtime infrastructure surrounding the model that coordinates tool dispatch, verification loops, memory, guardrails, and audit logging. The user no longer writes a prompt. They define a mission. The agent executes. Agent = Model + Harness - where the model provides reasoning, and the harness acts as the state machine, memory boundary, and execution gateway.
The execution flow - land research mission (2026 style)
Mission definition
User sets goal, constraints, budget ($500K), tax bracket (37%), holding period (5-7 yrs), and approval gates. No prompt written.
Autonomous planning
Agent decomposes into subtasks: land values, property taxes, capital gains, commission, ROI model, sensitivity analysis, draft report.
Execute and verify loop
Calls usda_land_values_api for NY ($4,010/acre), cross-verifies via web search, flags CA combined cap gains of 37.1% (federal 23.8% + CA 13.3%) as exceeding federal income tax rate.
Conflict detection
TX article cites 5% price decline; USDA shows +5.4%. Harness triggers conflict resolution and does not proceed until resolved.
Independent evaluation
Separate evaluator agent audits the ROI model. Generator and evaluator are deliberately isolated - models cannot reliably assess their own work.
Human handoff
Delivers ROI spreadsheet (3 states, 3 scenarios), draft recommendation, 1 flagged TX conflict, and full audit log of every tool call and source.
Strengths
Limitations
Side-by-side comparison
| Dimension | Prompt (2022-23) | Context (2024-25) | Harness (2026) |
|---|---|---|---|
| Core question | What do I say? | What does it know? | What can it reliably do? |
| Land research | One compressed prompt | RAG-injected USDA and tax docs | Autonomous multi-step mission |
| Data freshness | Training cutoff (stale) | RAG pull at query time | Live tool calls, dual-verified |
| Memory | None | Session-level re-injection | Persistent cross-session state |
| Error handling | Rewrite the prompt | Re-chunk, re-embed | Retry, fallback, escalate, log |
| Human role | Writes every prompt | Designs retrieval pipelines | Defines mission, reviews output |
| Failure mode | Bad wording | Bad retrieval | Bad harness design |
| Skill metaphor | Assembly language | SQL / data architecture | Distributed systems / OS design |
Explore further