Chapter 1: System Overview
The AoE2 LLM Arena agent plays Age of Empires II autonomously using a two-tier LLM architecture. A Sonnet strategist reads the resource bar via local OCR and sets goals; a Sonnet executor reads YOLO-detected entities as text and executes mouse/keyboard actions. No game API, no memory-mapped data — the agent perceives only pixels (YOLO entity detection + local OCR of the HUD), and both LLM tiers are text-only.
1.1 Two-Tier Architecture
The agent splits decision-making into two models:
Strategist (Sonnet) — Runs every 10 turns (or on alarm). Reads resource values, population, and age from the resource bar locally via OCR (resource_ocr.py, RapidOCR) — no screenshot is sent to the model; its prompt is text-only. Creates 3-5 prioritized goals and caches resource readings for the executor.
Executor (Sonnet) — Runs every turn. Receives only text: YOLO entity list, cached resource readings, active goals, memory context, and game knowledge. Returns structured actions (clicks, key presses) validated as Pydantic models. Routine turns take a fast single-shot call; combat/housing turns take an agentic tool loop (see Chapter 4 §4.3).
The split separates concerns: the strategist owns slow, periodic goal-setting; the executor owns rapid, per-turn tactics. Both tiers are text-only — the strategist reads the HUD via local OCR, the executor reads the YOLO entity list. Both run claude-sonnet-4-6 — the executor was moved from Haiku to Sonnet for more reliable instruction-following — and a per-call effort knob (default low) keeps the executor fast.
1.2 Component Map
agent/
├── gameplay_agent/ # Core agent runtime
│ ├── main.py # CLI entry point, provider creation
│ ├── config.py # Pydantic configuration with env var overrides
│ ├── game_loop.py # Main capture→detect→alarm→strategist→execute cycle
│ ├── memory.py # Working memory and game state tracking
│ ├── goals.py # Goal management, alarm system, reward computation
│ ├── goal_logger.py # Goal progress and completion logging
│ ├── executor.py # Action execution via pyautogui (dispatch pattern)
│ ├── models.py # Pydantic action/response validation (7 action types)
│ ├── entity_utils.py # Entity attribute extraction and summary formatting
│ ├── screen.py # Screenshot capture via mss
│ ├── window.py # Game window detection and focus management
│ └── providers/
│ ├── base.py # Abstract LLM provider interface
│ ├── claude.py # Sonnet executor (text-only, no images)
│ └── strategist.py # Sonnet strategist (local OCR + goal generation)
├── detection/ # YOLO entity detection (optional)
│ ├── inference/
│ │ ├── detector.py # EntityDetector, 60 classes, IoU tracking
│ │ ├── remote_detector.py # HTTP client for detection server
│ │ ├── ownership.py # Blue-dominance ownership classifier
│ │ ├── thresholds.py # Per-class confidence thresholds
│ │ ├── frame_diff.py # Frame differencing for rescan optimization
│ │ └── models/ # YOLO26 (v6) model weights (.pt/.onnx)
│ ├── training/ # Synthetic data gen + YOLO training
│ ├── labeling/ # CVAT integration + class definitions
│ └── extraction/ # SLD sprite extraction from game files
├── data/ # Game knowledge (optional)
│ ├── game_knowledge.py # SQLite database wrapper
│ └── knowledge_base/ # Static game data files
├── prompts/
│ ├── system.md # Executor system prompt
│ └── strategist.md # Strategist system prompt
├── autoresearch/ # Automated experiment framework
│ ├── game_runner.py # Timed experiments with metrics
│ ├── orchestrator.py # Prompt mutation loop
│ ├── metrics.py # Scoring and analysis
│ └── json_utils.py # Robust JSON extraction from LLM output
└── logs/ # Screenshots, goal logs
1.3 Graceful Degradation
The agent won’t crash without optional subsystems, but YOLO detection is practically required for meaningful gameplay.
Detection — imported inside a try/except at module level in apps/agent/src/game_loop.py:
try:
from detection.inference.detector import EntityDetector, get_detector
DETECTION_AVAILABLE = True
except ImportError:
DETECTION_AVAILABLE = False
Without detection, the executor has no entity list — it cannot target units, buildings, or resources by class or ID. The strategist can still read the resource bar (local OCR) and set goals, but the executor is limited to hotkeys and hardcoded coordinates. In practice, this makes the agent nearly non-functional: it can’t gather resources, train units, or build at specific locations. Detection is technically optional (the agent starts and runs) but practically required for any useful gameplay.
Game Knowledge — imported inside a try/except in apps/agent/src/providers/claude.py:
try:
from data.game_knowledge import GameKnowledge, get_db
GAME_KNOWLEDGE_AVAILABLE = True
except ImportError:
GAME_KNOWLEDGE_AVAILABLE = False
Without the knowledge database, no dynamic context injection occurs. The executor still receives the system prompt and memory context. This is a minor degradation — the agent plays reasonably without it.
Window Management — pygetwindow is optional at apps/agent/src/window.py. When unavailable, functions return True by default — the agent assumes the game is running and focused. Screenshot capture falls back to the full primary monitor.
Key Insight: Detection is the critical optional dependency. Without YOLO, the executor is essentially blind — the experience is very poor. Game knowledge and window management are truly additive enhancements that degrade gracefully.
1.4 Configuration
Configuration uses a Pydantic BaseModel with environment variable overrides (apps/agent/src/config.py):
| Setting | Env Var | Default | Purpose |
|---|---|---|---|
anthropic_api_key | ANTHROPIC_API_KEY | "" | Claude API authentication |
model | AOE2_MODEL | claude-sonnet-4-6 | Executor model (instruction-following) |
executor_effort | AOE2_EXECUTOR_EFFORT | low | Executor output_config effort (low/medium/high) |
strategist_model | AOE2_STRATEGIST_MODEL | claude-sonnet-4-6 | Strategist model (deeper reasoning) |
strategist_interval | AOE2_STRATEGIST_INTERVAL | 10 | Run strategist every N turns |
max_tokens | — | 1536 | Max response tokens per executor call |
max_tool_iterations | — | 7 | Max tool roundtrips per turn (tool-loop path) |
detection_imgsz | — | 640 | YOLO inference resolution (matches the v6 training resolution) |
adaptive_sahi | — | False | SAHI tiling lowers real F1 at retina resolution; agent runs single-pass @640 |
screenshot_quality | — | 85 | JPEG quality (1-100) |
ocr_backend | AOE2_OCR_BACKEND | rapidocr | Resource-bar OCR backend (rapidocr/template/tesseract) |
loop_delay | AOE2_LOOP_DELAY | 0.3 | Seconds between iterations |
action_delay | — | 0.05 | Seconds between individual actions |
pipeline_commit_max | AOE2_PIPELINE_COMMIT_MAX | 2 | Actions committed per pipelined (routine) turn; the tail is discarded |
save_screenshots | AOE2_SAVE_SCREENSHOTS | true | Log screenshots to disk |
log_dir | — | logs | Screenshot and log output directory |
A global singleton config = Config.from_env() is created at module load time and imported throughout the codebase.
1.5 Async-First Architecture
The entire agent runs on asyncio:
- Entry point:
asyncio.run(main_async(args))inapps/agent/src/main.py - API clients:
anthropic.AsyncAnthropicfor both executor and strategist - Game loop:
game_loop()inapps/agent/src/game_loop.py - Action execution:
execute_actions()inapps/agent/src/executor.py - Delays:
asyncio.sleep()for non-blocking waits
pyautogui calls are synchronous but fast (sub-millisecond per click), so they don’t block meaningfully.
1.6 Logging
Structured logging via structlog with colored console output, configured in apps/agent/src/main.py.
Key log events: iteration_start, screenshot_captured, detection_complete, strategist_response, strategist_goals_updated, llm_response, actions_executed, routine_executed, pipeline_head_committed, action_verification, alarm_triggered, turn_reward.
Summary
- Two-tier architecture: Sonnet strategist (local OCR, goals) + Sonnet executor (text-only, actions; single-shot routine turns + tool loop for combat)
- A deterministic reactive tier handles routine villager upkeep every turn with no LLM call; routine turns pipeline (RTC) and entity-affecting actions are verified by re-detection
- Detection is practically required for useful gameplay; game knowledge and window management are truly optional
- Pydantic for config and validation, structlog for observability, asyncio for concurrency
- Goal-driven gameplay with alarm system for emergency defense
Related Topics
- Chapter 2: Game Loop Pipeline — the iteration cycle in detail
- Chapter 4: Provider Pattern — how LLM providers are abstracted
- Chapter 7: Detector Architecture — the optional YOLO system