AoE2 · LLM Arena

Chapter 5: Prompt Engineering

The system prompt (prompts/system.md, ~320 lines + prompts/hotkeys.md appended) is the agent’s rulebook. It teaches Claude the game mechanics, available composite tools, multi-step action patterns, hotkey reference, and strategic priorities.

5.1 Prompt Structure

The prompt is organized into major sections:

SectionPurpose
Your CapabilitiesWhat the agent can do (detect, click, remember, target, rescan)
Active GoalsHow to follow strategist-provided goals by priority
EVERY TURN Checklist11-point priority checklist (idle villagers, housing, food, etc.)
Multi-Task ActionsRecipes using composite tools (build, send_villager, queue_villager)
Smart Targetingrescan, target_class, fallback patterns, modifiers
Handling Failed ActionsHow to react to action failures
Action Types7 base + 3 composite tool types
HotkeysFull AoE2 hotkey reference (appended from hotkeys.md)
Building PlacementPlacement rules and constraints
Action Limits3-7 actions per turn

5.2 Rescan and Coordinate Freshness

Camera-moving hotkeys (H, .) invalidate all screen coordinates. The prompt teaches the LLM to use rescan: true on press actions for these keys, which triggers a fresh screenshot + detection cycle. After rescan, entity coordinates are updated in the detection cache, and the LLM receives fresh entity positions in the tool result.

Composite tools handle this automatically — for example, build executes the full press-click sequence without intermediate rescans since the placement coordinates are pre-determined.

5.3 Composite Tool Patterns

The prompt defines recipes using composite tools for common operations:

Build a house (1 tool call):

{"type": "build", "building_key": "q", "x": 1500, "y": 800, "intent": "Build house"}

Queue a villager (1 tool call):

{"type": "queue_villager", "intent": "Queue villager"}

Send idle villager to resource (1 tool call):

{"type": "send_villager", "target_class": "sheep", "intent": "Send villager to gather sheep"}

These composite tools replaced the old multi-turn patterns where operations had to be split across turns due to camera movement. The composites execute the full sequence internally without intermediate API roundtrips.

5.4 Entity Targeting

Lines 31-41 teach the LLM to prefer target_id when detection is available:

{"type": "right_click", "target_id": "sheep_0", "intent": "Gather from sheep"}

And fall back to coordinates when it isn’t:

{"type": "right_click", "x": 920, "y": 460, "intent": "Gather from sheep at coordinates"}

The LLM sees detected entities in the context as a list with IDs and coordinates, so it can reference them by name.

5.5 Output Format Specification

Lines 43-61 define the JSON contract:

{
  "reasoning": "What you see and strategic thinking",
  "observations": {
    "resources": {"food": 0, "wood": 0, "gold": 0, "stone": 0},
    "population": "5/10",
    "age": "Dark Age",
    "idle_tc": true,
    "housed": false,
    "under_attack": false,
    "events": []
  },
  "actions": [
    {"type": "press", "key": ".", "intent": "Select idle villager"}
  ]
}

reasoning — free-form text explaining what the LLM sees in the screenshot and its strategic thinking. This is logged and stored in memory for context in future turns.

observations — structured game state extracted from the screenshot. These feed back into the memory system (see Chapter 6) to track resources, population, and alerts across turns.

actions — ordered list of actions to execute sequentially. Each has a type, parameters, and an intent string for logging.

5.6 Hotkey Reference

A comprehensive hotkey reference is appended from prompts/hotkeys.md (~113 lines). Key hotkeys for Dark Age:

KeyEffect
HSelect Town Center, center camera
QQueue villager (at TC) / Economic build menu (with villager selected)
.Select idle villager, center camera
,Select idle military unit, center camera
WMilitary build menu (with villager, Feudal Age+)
GAuto Scout (when military unit selected)

The hotkey file covers navigation, TC commands, villager build menus (economic, military, more buildings), and unit commands.

5.7 Action Limits

  • 3-7 actions per turn — with composite tools, each tool call does more work so fewer calls are needed
  • Multi-task turns encouraged — queue villagers + build houses + sweep idle villagers in ONE turn using composite tools
  • Rescan after camera-moving keys — ensures fresh coordinates for subsequent clicks

5.8 Prompt Loading Mechanism

The prompt is loaded from disk in ClaudeProvider.get_system_prompt(age), which returns a list of cacheable content blocks (not a single string):

def get_system_prompt(self, age: str = "Dark Age") -> list[dict]:
    self._load_prompts()
    age_content = self._age_prompts.get(age.split()[0].lower(), ...)
    blocks = [
        {"type": "text", "text": self._core_prompt,          # core + hotkeys + memories
         "cache_control": {"type": "ephemeral"}},
    ]
    if age_content:
        blocks.append({"type": "text", "text": age_content,  # age-specific guidance
                       "cache_control": {"type": "ephemeral"}})
    return blocks

Block 1 (core rules + hotkey reference + cross-game memories) is stable for the whole game and is cached on every call. Block 2 (age-specific guidance) changes only on age-ups (≤3 times per game) and is also cached, so every turn within an age reads it from cache instead of re-prefilling. Prompts are lazily loaded once and cached for the session; editing prompt files requires restarting the agent.

A fallback inline prompt provides minimal JSON format and action types — enough to run but without strategic depth.

The cache_control markers are set in providers/claude.py: two on the system blocks (above), plus a moving breakpoint on the most recent message in the executor’s tool loop (_apply_moving_cache_breakpoint) so iterations 2–7 read the growing conversation from cache rather than re-prefilling it. See Chapter 4: Provider Pattern for the exact API call shape.


Summary

  • ~320-line system prompt + ~113-line hotkey reference teaching game mechanics, composite tools, and strategic priorities
  • Composite tools (build, send_villager, queue_villager) collapse multi-step sequences into single tool calls
  • 11-point EVERY TURN checklist drives prioritized decision-making
  • Rescan mechanism handles coordinate freshness after camera-moving keys
  • 3-7 actions per turn with multi-task turns encouraged
  • Loaded from disk with prompt caching and inline fallback