SFT Data

A finished job leaves you with raw trajectories. tracer turns those into supervised fine-tuning data that the downstream trainer block (LLaMA-Factory) can train on directly — with quality scoring baked in so you can filter for the best examples.

Conversion is powered by the swe_data_process package and driven by scripts/convert_trajectories.sh.

The conversion pipeline

Raw trajectories (per-scaffold format)
   └─ converter ─▶ IM format (OpenAI messages + tool_calls, JSONL)
                     └─ rule / llm scoring ─▶ scored IM
                                                └─ to LF ─▶ LF format (ShareGPT array, JSON)
                                                              └─▶ trainer block (LLaMA-Factory)

Raw → IM — a scaffold-specific converter reshapes the raw trajectory into the intermediate "IM" format: OpenAI-style messages carrying tool_calls, one JSONL row per trajectory.
Scoring — rule_score.py runs automatically to attach a composite_score; llm_score.py (LLM-as-judge) can run optionally. See Scoring.
IM → LF — the scored IM is reshaped into the LLaMA-Factory "LF" format: a ShareGPT-style JSON array ready for SFT.

Running conversion

Conversion can run automatically at the end of a job, or on demand:

# Convert one job's trajectories (latest, or a named job)
bash scripts/convert_trajectories.sh --job latest

When using the tracer plugin instead of shell commands, this on-demand conversion/stat refresh lives under /tracer:dashboard.

Useful flags:

Flag	Purpose
`--job <name\|latest>`	Which Harbor job to convert
`--scaffold <auto\|claude_code\|open_code\|openhands_sdk\|terminus2>`	Override scaffold detection (`auto` derives from the agent/job name)
`--out-dir <dir>`	Output root (default `artifacts/sft_data`)
`--max-instances <n>`	Cap the number of converted instances
`--exclude-repos-file <path>`	Exclude trajectories from listed repos (default `artifacts/excluded_repos.txt`)
`--reasoning-check-mode <strict\|adaptive>`	Reasoning-content filter mode; default is `adaptive`
`--reasoning-content-ratio-threshold <0..1>`	Adaptive reasoning-content ratio threshold; default is `0.5`

To run conversion automatically after every job, set sft_conversion.enabled: true in config.yaml; start.sh then runs the convert step once Harbor exits.

Outputs

artifacts/sft_data/<job>/
├── im.jsonl        # intermediate, scored
├── lf.json         # LLaMA-Factory ShareGPT array (trainer block input)
└── lf.stats.json   # token / turn / score statistics

lf.json is the block's sft_data_dir output, consumed by the trainer block. lf.stats.json powers the dashboard.

Learn more

Scaffolds — supported agent formats and their converters
Scoring — rule-based and LLM-based trajectory quality scoring

SFT Data

The conversion pipeline

Running conversion

Outputs

Learn more

On this page