2022-23 Era 1

Prompt Engineering

"Craft the right words"

Token windows were tight - GPT-3.5 at 4K, GPT-4 launched at 8K. Every word in your prompt competed with every word in the response. The skill was compression: pack maximum intent into minimum space, shape output through precision instructions, and work around an RLHF-tuned model that defaulted to verbose disclaimers.

The query - land research across NY, CA and TX (2023 style)

You are a real estate investment advisor. Answer concisely.
Compare buying RURAL LAND (not homes) across 3 states: NY | CA | TX
For each state cover ONLY:
1. Annual property tax rate (%)
2. State capital gains tax on sale
3. Top 1 gain
4. Top 1 pitfall
Format: markdown table. No preamble. No caveats.
Current year: 2023.

Every element was deliberate. "You are a real estate investment advisor" primed the persona. "Not homes" prevented drift toward residential property. "No preamble. No caveats" fought the model's verbose defaults. "Current year: 2023" compensated for a training cutoff two years prior.
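The anatomy above can be sketched in Python. This is an illustration only: the helper names (estimate_tokens, build_prompt) and the rule of thumb of roughly 4 characters per token are assumptions, not part of any 2023 workflow, and a real tokenizer would give exact counts. The point is that the prompt and the response shared one 4K window.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumed heuristic)."""
    return max(1, len(text) // 4)


def build_prompt(persona, task, fields, output_rules, year):
    """Assemble a compressed Era 1 prompt from its deliberate parts."""
    numbered = " ".join(f"{i}. {field}" for i, field in enumerate(fields, start=1))
    return (
        f"{persona} Answer concisely. {task} "
        f"For each state cover ONLY: {numbered} "
        f"Format: {output_rules} Current year: {year}."
    )


prompt = build_prompt(
    persona="You are a real estate investment advisor.",
    task="Compare buying RURAL LAND (not homes) across 3 states: NY | CA | TX",
    fields=[
        "Annual property tax rate (%)",
        "State capital gains tax on sale",
        "Top 1 gain",
        "Top 1 pitfall",
    ],
    output_rules="markdown table. No preamble. No caveats.",
    year=2023,
)

CONTEXT_WINDOW = 4096  # GPT-3.5-era window cited in the text
prompt_tokens = estimate_tokens(prompt)
response_budget = CONTEXT_WINDOW - prompt_tokens  # what is left for the answer

print(f"prompt ~{prompt_tokens} tokens, ~{response_budget} left for the response")
```

Every token spent on instructions comes out of response_budget, which is why each phrase in the prompt had to earn its place.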

Strengths

  • +Zero infrastructure - just text
  • +Fast iteration - rewrite and retry instantly
  • +Portable across any model or interface
  • +Role and format control shaped output reliably
  • +No tooling cost or setup overhead

Limitations

  • -Tax rates from stale training data
  • -No memory - every turn started fresh
  • -County-level figures often hallucinated
  • -One shot - no feedback loop or tools
  • -Skill lived entirely in the human's head

2024-25 Era 2

Context Engineering

"Curate the right information"

Token windows exploded - GPT-4 Turbo at 128K, Claude at 200K, Gemini at 1M. The bottleneck shifted from what you could ask to what you could inject. RAG and vector-based context engineering peaked in late 2024 and continues to evolve. The craft moved from writing tight prompts to architecting information pipelines that assembled the right knowledge before the model ever reasoned.

The assembled context window - land research (2025 style)

[System] You are a real estate research assistant. Always cite sources. Flag data older than 6 months. Never hallucinate tax rates.
[RAG] USDA 2025: NY cropland $4,010/acre (+4.2% YoY) · TX cropland +5.4% · CA Prop 13 reassessment at purchase · NAR post-settlement avg commission 5.70%
[RAG] IRS 2025: CA state cap gains 13.3% (highest in US) · NY 10.9% · TX $0 state capital gains
[Tools] county_assessor_api · irs_capital_gains_calculator · nar_commission_lookup
[History] User budget: $500K · Federal bracket: 37% · Preferred holding period: 5-7 years

The user's question stayed simple. The heavy lifting moved upstream - into vector databases, chunking strategies, re-ranking algorithms, and context assembly pipelines. The prompt engineer became an information architect.
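A minimal sketch of a context assembly step, in Python, may make the idea concrete. Everything here is an assumption for illustration: the Block type, the assemble_context function, the character budget, the "today" date and the as_of dates are hypothetical. Only the block texts and the six-month staleness rule come from the example above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Block:
    label: str                    # "System", "RAG", "Tools" or "History"
    text: str
    as_of: Optional[date] = None  # publication date of retrieved data, if known


def assemble_context(blocks: List[Block], today: date,
                     max_age_days: int = 183, max_chars: int = 2000) -> str:
    """Join blocks in priority order, flag stale data, stop when the budget is spent."""
    lines = []
    used = 0
    for block in blocks:
        stale = block.as_of is not None and (today - block.as_of).days > max_age_days
        line = f"[{block.label}] {block.text}" + (" [STALE]" if stale else "")
        if used + len(line) > max_chars:
            break  # drop lower-priority blocks whole instead of truncating them
        lines.append(line)
        used += len(line)
    return "\n".join(lines)


today = date(2026, 3, 1)  # hypothetical query date
blocks = [
    Block("System", "You are a real estate research assistant. Always cite sources."),
    Block("RAG", "USDA 2025: NY cropland $4,010/acre (+4.2% YoY)",
          as_of=date(2025, 8, 1)),   # hypothetical publication date
    Block("RAG", "IRS 2025: CA state cap gains 13.3%",
          as_of=date(2025, 12, 1)),  # hypothetical publication date
    Block("History", "User budget: $500K"),
]
context = assemble_context(blocks, today)
print(context)
```

The ordering of blocks is itself a design choice: whatever comes first survives when the budget runs out.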

Strengths

  • +Live USDA and IRS data - no stale figures
  • +User profile persisted across session turns
  • +Tool definitions enabled structured lookups
  • +Grounded answers with source attribution
  • +Large windows allowed rich multi-doc reasoning

Limitations

  • -Poor chunking could break tax tables mid-document
  • -Model answered and stopped - no autonomous action
  • -Lost-in-the-middle problem with long contexts
  • -No cross-session memory or persistent state
  • -RAG pipeline quality determined everything
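
The chunking limitation can be illustrated with a small Python sketch. It assumes a deliberately simple strategy (split on blank lines and pack whole blocks greedily) rather than any particular library's chunker; chunk_by_block and the sample document are hypothetical, with rates taken from the example above.

```python
def chunk_by_block(text: str, max_chars: int = 80):
    """Greedily pack blank-line-separated blocks; never split inside a block."""
    chunks, current = [], ""
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        candidate = f"{current}\n\n{block}" if current else block
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


intro = "State capital gains rates cited in the retrieved document:"
table = "State | Rate\nCA | 13.3%\nNY | 10.9%\nTX | 0%"
outro = "Rates apply to gains on sale."
document = "\n\n".join([intro, table, outro])

block_chunks = chunk_by_block(document, max_chars=80)
# A naive fixed-width split for comparison: it cuts wherever the count lands.
naive_chunks = [document[i:i + 80] for i in range(0, len(document), 80)]

print(any(table in chunk for chunk in block_chunks))  # table kept in one chunk
print(any(table in chunk for chunk in naive_chunks))  # table cut by the naive split
```

A block larger than max_chars is kept whole here, trading chunk size for intact tables; a production pipeline would need a policy for that case.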

2026 Era 3

Harness Engineering

"Architect the right environment"

The model is no longer the bottleneck. The harness is: the runtime infrastructure surrounding the model that coordinates tool dispatch, verification loops, memory, guardrails, and audit logging. The user no longer writes a prompt. They define a mission. The agent executes. Agent = Model + Harness - where the model provides reasoning, and the harness acts as the state machine, memory boundary, and execution gateway.
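
"Agent = Model + Harness" can be sketched in a few lines of Python. This is a toy under stated assumptions, not any framework's API: the Harness class, the action dictionary shape, the scripted_model stand-in and the wire_funds tool name are all hypothetical. The model only proposes actions; the harness decides what runs, keeps the memory, and logs every dispatch.

```python
from typing import Callable, Dict, List


class Harness:
    """State machine, memory boundary and execution gateway around a model."""

    def __init__(self, model: Callable, tools: Dict[str, Callable], max_steps: int = 10):
        self.model = model
        self.tools = tools
        self.max_steps = max_steps
        self.memory: List[dict] = []     # what the model sees on each turn
        self.audit_log: List[dict] = []  # every dispatch, allowed or blocked

    def run(self, mission: str):
        for step in range(self.max_steps):
            action = self.model(mission, self.memory)
            if action["type"] == "finish":
                return action["result"]
            name = action["tool"]
            if name not in self.tools:  # guardrail: only registered tools execute
                self.audit_log.append({"step": step, "tool": name, "status": "blocked"})
                self.memory.append({"tool": name, "error": "tool not allowed"})
                continue
            output = self.tools[name](**action.get("args", {}))
            self.audit_log.append({"step": step, "tool": name, "status": "ok"})
            self.memory.append({"tool": name, "output": output})
        raise RuntimeError("step budget exhausted before the mission finished")


def scripted_model(mission, memory):
    """Stand-in for an LLM: chooses the next action from what memory holds."""
    if not memory:
        return {"type": "call", "tool": "wire_funds", "args": {}}  # not registered
    if len(memory) == 1:
        return {"type": "call", "tool": "usda_land_values_api", "args": {"state": "NY"}}
    return {"type": "finish", "result": memory[-1]["output"]}


# Stubbed tool returning the NY figure quoted in this article.
tools = {"usda_land_values_api": lambda state: {"state": state, "usd_per_acre": 4010}}
harness = Harness(scripted_model, tools)
result = harness.run("Compare rural land in NY, CA and TX")
print(result, harness.audit_log)
```

The blocked wire_funds call never reaches execution, yet it still appears in the audit log, which is the sense in which guardrails live in infrastructure rather than in the prompt.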

The execution flow - land research mission (2026 style)

1. Mission definition

User sets goal, constraints, budget ($500K), tax bracket (37%), holding period (5-7 yrs), and approval gates. No prompt written.

2. Autonomous planning

Agent decomposes into subtasks: land values, property taxes, capital gains, commission, ROI model, sensitivity analysis, draft report.

3. Execute and verify loop

The agent calls usda_land_values_api for NY ($4,010/acre), cross-verifies the figure via web search, and flags CA's combined capital gains rate of 37.1% (federal 23.8% + CA 13.3%) as exceeding the user's 37% federal income tax bracket.

4. Conflict detection

TX article cites 5% price decline; USDA shows +5.4%. Harness triggers conflict resolution and does not proceed until resolved.

5. Independent evaluation

Separate evaluator agent audits the ROI model. Generator and evaluator are deliberately isolated - models cannot reliably assess their own work.

6. Human handoff

Delivers ROI spreadsheet (3 states, 3 scenarios), draft recommendation, 1 flagged TX conflict, and full audit log of every tool call and source.
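
The checks in steps 3 and 4 of the flow above can be sketched as follows. The Claim type, the find_conflicts function and the 1-point tolerance are assumptions made for illustration; the percentages are the ones quoted in the flow. Decimal is used so the arithmetic matches the figures exactly.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Claim:
    metric: str
    source: str
    value: Decimal  # percent


def find_conflicts(claims, tolerance=Decimal("1.0")):
    """Group claims by metric; report metrics whose sources differ by more than tolerance."""
    by_metric = {}
    for claim in claims:
        by_metric.setdefault(claim.metric, []).append(claim)
    conflicts = {}
    for metric, group in by_metric.items():
        values = [claim.value for claim in group]
        if max(values) - min(values) > tolerance:
            conflicts[metric] = group
    return conflicts


claims = [
    Claim("TX land price YoY %", "news article", Decimal("-5.0")),
    Claim("TX land price YoY %", "USDA", Decimal("5.4")),
    Claim("NY cropland YoY %", "USDA", Decimal("4.2")),
]
conflicts = find_conflicts(claims)
can_proceed = not conflicts  # the harness holds the mission until this is True

# Step 3's flag: combined federal and CA capital gains versus the 37% bracket.
combined_ca = Decimal("23.8") + Decimal("13.3")
exceeds_bracket = combined_ca > Decimal("37")

print(list(conflicts), can_proceed, combined_ca, exceeds_bracket)
```

Keeping the gate in the harness rather than in the model's instructions means a conflict blocks progress even if the model would have preferred to pick a side and move on.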

Strengths

  • +Multi-step autonomy - plan, execute, verify
  • +Conflict detection before recommendations
  • +Guardrails enforced at infrastructure level
  • +Full audit log - every source, every call
  • +Persistent state across sessions and agents

Limitations

  • -Multi-agent coordination at scale still hard
  • -Cost governance - 40+ tool calls get expensive
  • -Real-world execution (deeds, filings) still human
  • -Harness design requires distributed systems skill
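
One way a harness might approach the cost governance problem is a hard cap checked before every dispatch. The sketch below is hypothetical throughout: ToolBudget, BudgetExceeded, the 40-call cap and the per-call price are illustrative assumptions, and amounts are kept in integer cents to avoid floating-point drift.

```python
class BudgetExceeded(Exception):
    """Raised when a tool call would push the mission past one of its caps."""


class ToolBudget:
    def __init__(self, max_calls: int, max_cents: int):
        self.max_calls = max_calls
        self.max_cents = max_cents
        self.calls = 0
        self.spent_cents = 0

    def charge(self, tool: str, cents: int) -> None:
        """Record one call, or refuse it before it runs if a cap would be broken."""
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded(f"{tool}: call cap of {self.max_calls} reached")
        if self.spent_cents + cents > self.max_cents:
            raise BudgetExceeded(f"{tool}: spend cap of {self.max_cents} cents reached")
        self.calls += 1
        self.spent_cents += cents


budget = ToolBudget(max_calls=40, max_cents=500)
for _ in range(40):
    budget.charge("county_assessor_api", cents=5)  # hypothetical per-call price

try:
    budget.charge("county_assessor_api", cents=5)  # call number 41
    escalated = False
except BudgetExceeded:
    escalated = True  # a real harness would pause and ask the human here

print(budget.calls, budget.spent_cents, escalated)
```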

Side-by-side comparison

Dimension | Prompt (2022-23) | Context (2024-25) | Harness (2026)
Core question | What do I say? | What does it know? | What can it reliably do?
Land research | One compressed prompt | RAG-injected USDA and tax docs | Autonomous multi-step mission
Data freshness | Training cutoff (stale) | RAG pull at query time | Live tool calls, dual-verified
Memory | None | Session-level re-injection | Persistent cross-session state
Error handling | Rewrite the prompt | Re-chunk, re-embed | Retry, fallback, escalate, log
Human role | Writes every prompt | Designs retrieval pipelines | Defines mission, reviews output
Failure mode | Bad wording | Bad retrieval | Bad harness design
Skill metaphor | Assembly language | SQL / data architecture | Distributed systems / OS design
