AoE2 · LLM Arena

Chapter 9: Labeling and Active Learning

Real screenshots are labeled in CVAT, exported in COCO format, converted to YOLO labels, and merged with synthetic data. An active learning pipeline prioritizes the most informative images for labeling.

9.1 The Labeling Workflow

Raw Screenshots              Pre-label with YOLO         CVAT
(220 images in               (generate initial           (manual correction
 real_screenshots/raw/)       bounding boxes)             + new annotations)
        ↓                           ↓                          ↓
   prelabel.py              Import to CVAT              Export as COCO 1.0
                            (with classes.txt)                  ↓
                                                     prepare_training.py
                                                        (COCO → YOLO +
                                                         merge with synthetic)
                                                                ↓
                                                      training_data/

9.2 Pre-Labeling

packages/detection/src/labeling/prelabel.py bootstraps annotation by running the existing model on unlabeled screenshots:

  1. Loads the current YOLO model (v1 or v2)
  2. Runs inference on each image in real_screenshots/raw/
  3. Exports predictions as YOLO .txt label files
  4. Generates classes.txt for CVAT project import
  5. Optionally saves preview images with drawn bounding boxes

Pre-labels are not training quality. They give human annotators a starting point to correct, so nobody has to draw every box from scratch. The confidence threshold is set low (0.15) so that more candidate objects are surfaced for review.
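Step 3 and the low threshold can be illustrated with a minimal sketch. This is not the code in prelabel.py: the function names and the detection dictionary layout (class_id, box, confidence) are assumptions made for the example, and the model inference itself is left out.

```python
from pathlib import Path

CONF_THRESHOLD = 0.15  # low threshold from the text: surface more candidates


def to_yolo_line(class_id, box_xyxy, img_w, img_h):
    """Format one pixel-space box (x1, y1, x2, y2) as a normalized YOLO label line."""
    x1, y1, x2, y2 = box_xyxy
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def write_prelabels(detections, img_w, img_h, label_path, conf_threshold=CONF_THRESHOLD):
    """Write detections at or above the threshold to a YOLO .txt file.

    `detections` is a hypothetical list of dicts with keys
    "class_id", "box" (x1, y1, x2, y2 in pixels) and "confidence".
    Returns the number of boxes kept.
    """
    lines = [
        to_yolo_line(d["class_id"], d["box"], img_w, img_h)
        for d in detections
        if d["confidence"] >= conf_threshold
    ]
    Path(label_path).write_text("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)
```

An image with no surviving detections still gets an empty label file in this sketch, which keeps images and labels paired one to one.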

9.3 CVAT Integration

Export Format: COCO 1.0 (Not YOLO)

Key Insight: CVAT’s YOLO 1.1 export format silently drops polygon annotations — only rectangles survive the export. Since some entities are labeled with polygon shapes in CVAT (for precise outlines), the project uses COCO 1.0 export format instead. prepare_training.py handles the COCO-to-YOLO conversion, computing bounding boxes from polygon vertices.

Export Format Detection

prepare_training.py auto-detects the export format (packages/detection/src/labeling/prepare_training.py:43-66):

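from pathlib import Path
from typing import Literal

def detect_export_format(cvat_dir: Path) -> Literal["coco", "yolo"]:
    # A COCO 1.0 export carries a single JSON annotation file
    if (cvat_dir / "annotations" / "instances_default.json").exists():
        return "coco"
    # Otherwise fall back to YOLO text files
    return "yolo"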

COCO to YOLO Conversion

For COCO exports (prepare_training.py:69-200):

  1. Reads annotations/instances_default.json
  2. Maps COCO category IDs to v2 class IDs by name (not by numeric ID)
  3. For each annotation:
    • If it has a direct bbox: uses [x, y, width, height] directly
    • If it has segmentation (polygon): computes bounding box from min/max of polygon vertices
  4. Converts to YOLO normalized format: class_id x_center y_center w_norm h_norm

COCO categories are 1-indexed while YOLO classes are 0-indexed. The name-matching approach avoids this pitfall entirely — both sides are looked up by name, not ID.
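The conversion steps above can be sketched as follows. This is an illustration under stated assumptions, not the code at prepare_training.py:69-200: the function names are invented, the input is an already loaded COCO dictionary, and skipping categories that are missing from the schema is a guess about the desired behavior.

```python
def polygon_to_bbox(segmentation):
    """Return [x, y, width, height] enclosing every polygon of one annotation.

    COCO stores each polygon as a flat list [x1, y1, x2, y2, ...].
    """
    xs = [v for poly in segmentation for v in poly[0::2]]
    ys = [v for poly in segmentation for v in poly[1::2]]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def coco_to_yolo_lines(coco, name_to_class_id):
    """Map each image file name to its list of YOLO label lines."""
    images = {img["id"]: img for img in coco["images"]}
    category_names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    labels = {img["file_name"]: [] for img in coco["images"]}
    for ann in coco["annotations"]:
        # Look the class up by name, so 1-indexed COCO IDs never leak through.
        name = category_names[ann["category_id"]]
        if name not in name_to_class_id:
            continue  # assumption: categories missing from the schema are skipped
        img = images[ann["image_id"]]
        # Prefer the stored bbox; otherwise derive it from the polygon vertices.
        x, y, w, h = ann.get("bbox") or polygon_to_bbox(ann["segmentation"])
        x_center = (x + w / 2) / img["width"]
        y_center = (y + h / 2) / img["height"]
        line = (
            f"{name_to_class_id[name]} {x_center:.6f} {y_center:.6f} "
            f"{w / img['width']:.6f} {h / img['height']:.6f}"
        )
        labels[img["file_name"]].append(line)
    return labels
```

For example, with a made-up class name "villager" mapped to class 0, a COCO bbox of [50, 20, 100, 40] in a 200×100 image becomes the line 0 0.500000 0.400000 0.500000 0.400000, even though the COCO category ID for that class is 1.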

CVAT Directory Structure

The code handles multiple CVAT export directory layouts:

  • obj_train_data/ — standard YOLO export
  • obj_Train_data/ — case variant (observed in some CVAT versions)
  • labels/ — alternative layout
  • .txt files at root level
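A lookup over these layouts might resemble the sketch below. The function name, the search order, and the set of root-level files to ignore are all assumptions made for illustration; the real code may resolve the layouts differently.

```python
from pathlib import Path

# Sub-directory names from the list above, checked in this order (assumed order).
CANDIDATE_LABEL_DIRS = ("obj_train_data", "obj_Train_data", "labels")

# Assumption: root-level .txt files that are metadata rather than labels.
NON_LABEL_FILES = {"classes.txt", "train.txt"}


def find_label_files(cvat_dir):
    """Return the sorted label files from the first layout that contains any."""
    cvat_dir = Path(cvat_dir)
    for name in CANDIDATE_LABEL_DIRS:
        candidate = cvat_dir / name
        if candidate.is_dir():
            files = sorted(candidate.glob("*.txt"))
            if files:
                return files
    # Last resort: label files sitting directly in the export root.
    return sorted(p for p in cvat_dir.glob("*.txt") if p.name not in NON_LABEL_FILES)
```

On case-insensitive file systems the first two names resolve to the same directory, so checking both costs nothing there and still covers case-sensitive ones.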

9.4 Class Schema

All data sources now use classes.yaml IDs directly (60 classes). See Chapter 13 for schema history.

packages/detection/src/labeling/class_mapping.py provides the loader (load_classes_yaml()) and the CVAT helpers (get_classes_for_cvat(), write_classes_txt()). The model (YOLO26/v6) emits classes.yaml IDs natively, so there is no class remapping — prelabel.py and prepare_training.py write classes.yaml IDs directly. (The legacy v1→v2 mapping helpers were removed.)

9.5 Hybrid Dataset Merge

prepare_training() at packages/detection/src/labeling/prepare_training.py:248-445 orchestrates the full merge:

  1. Scan local images — builds an index of all raw screenshots by filename
  2. Detect export format — auto-detects COCO or YOLO from the CVAT export directory
  3. Convert COCO to YOLO — if COCO format, generates temp YOLO label files
  4. Match labels to images — pairs each label file with its corresponding image
  5. Split real data — 85/15 train/val split with seed=42 for reproducibility
  6. Copy synthetic data — copies synthetic images and labels directly (both use classes.yaml IDs)
  7. Copy real data — copies real images with real_ prefix to avoid filename collisions
  8. Generate dataset.yaml — writes the YOLO training config with all 60 classes
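Steps 5 and 7 can be sketched as below. The helper names are hypothetical, and the rounding rule (truncating the validation count) is an assumption; it was chosen only because it turns 58 labeled images into 50 train and 8 val.

```python
import random


def split_train_val(filenames, val_fraction=0.15, seed=42):
    """Deterministically split file names into (train, val) lists."""
    ordered = sorted(filenames)  # sort first so the result ignores input order
    rng = random.Random(seed)    # private generator: no global random state touched
    rng.shuffle(ordered)
    n_val = int(len(ordered) * val_fraction)  # assumption: truncate, do not round
    return ordered[n_val:], ordered[:n_val]


def prefixed_real_name(filename):
    """Prefix a real screenshot's name so it cannot collide with synthetic files."""
    return f"real_{filename}"
```

Sorting before shuffling matters for reproducibility: directory listings are not guaranteed to come back in the same order on every machine, so a fixed seed alone would not pin the split.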

Output structure:

training_data/
├── train/
│   ├── images/
│   │   ├── img_00000.jpg          # synthetic
│   │   ├── real_screenshot_001.jpg # real
│   └── labels/
│       ├── img_00000.txt          # classes.yaml IDs
│       ├── real_screenshot_001.txt # classes.yaml IDs
├── val/
│   └── ...
├── dataset.yaml
└── merge_summary.json             # statistics
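For orientation, a dataset.yaml matching this layout could be rendered as in the sketch below. It assumes the Ultralytics-style dataset config (path, train, val, and an ID-to-name mapping under names); the function name and the class names in the usage note are made up, and the real file would list all 60 classes from classes.yaml.

```python
import json


def render_dataset_yaml(root, class_names):
    """Render a minimal YOLO dataset config as YAML text.

    json.dumps is used for quoting because a JSON string is also a valid
    double-quoted YAML scalar, which avoids a YAML library dependency here.
    """
    lines = [
        f"path: {json.dumps(str(root))}",
        "train: train/images",
        "val: val/images",
        "names:",
    ]
    # Class IDs are the 0-based positions in class_names.
    lines += [f"  {i}: {json.dumps(name)}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"
```

Calling render_dataset_yaml("training_data", ["villager", "sheep"]) yields a six-line file whose names block maps 0 to "villager" and 1 to "sheep".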

9.6 Active Learning Pipeline

packages/detection/src/labeling/active_learning.py optimizes which images to label next.

Triage: Scoring Images by Informativeness

The triage step runs the current model on all unlabeled images and scores each by how “interesting” it is to the model:

Condition                                 Score   Rationale
Detection with confidence < 0.15          +3      Model is confused
Detection with 0.15 <= confidence < 0.7   +2      Model is uncertain
No detections at all                      +15     Completely novel content
Fewer than expected detections            +5      Missing entities

High-scoring images are the most informative for training — they represent cases where the model struggles.
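The scoring table can be read as a small additive function. The sketch below is one interpretation, not the code in active_learning.py: it assumes the per-detection points are summed, that an image with no detections scores a flat 15, and that the "expected" count is supplied by the caller through a hypothetical expected_detections parameter.

```python
def score_image(confidences, expected_detections=0):
    """Score one image from the confidences of its detections.

    Higher scores mean the image is more informative to label.
    """
    if not confidences:
        return 15  # no detections at all: completely novel content
    score = 0
    for conf in confidences:
        if conf < 0.15:
            score += 3  # model is confused
        elif conf < 0.7:
            score += 2  # model is uncertain
        # detections at 0.7 or above add nothing
    if len(confidences) < expected_detections:
        score += 5  # fewer detections than expected: likely missing entities
    return score
```

Under these assumptions, an image with detections at 0.1, 0.5, and 0.9 scores 3 + 2 + 0 = 5, while a single confident detection on an image expected to hold three entities also scores 5.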

Prepare Batch

This step selects the top-N highest-scoring images and creates a CVAT-ready batch:

  • Copies images to an output directory
  • Generates pre-labels for CVAT import
  • Writes classes.txt with v2 class names
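The selection itself reduces to a sort over the triage scores. In this sketch the function name and the score dictionary are assumptions, and ties are broken by file name purely to make the batch deterministic.

```python
def select_batch(scores, n):
    """Return the n highest-scoring image names from a {name: score} mapping."""
    # Sort by descending score, then by name so equal scores have a stable order.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:n]
```

Copying the selected images and writing their pre-labels would then loop over the returned names.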

Integrate

After manual correction in CVAT, integrate() copies the corrected labels into the training dataset.

9.7 Current Dataset Scale

Source           Train   Val   Total
Synthetic        2,400   600   3,000
Real (labeled)      50     8      58
Total            2,450   608   3,058

220 raw screenshots exist in packages/detection/src/real_screenshots/raw/. 58 have been labeled in CVAT so far.


Summary

  • CVAT labels exported as COCO 1.0 (not YOLO, which drops polygons)
  • Automatic COCO-to-YOLO conversion with name-based class mapping
  • All data sources use classes.yaml IDs directly (no remapping needed for v5+ synthetic data)
  • Active learning scores unlabeled images by model uncertainty