Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
Abstract
Adversarial scenario generation is a cost-effective approach for the safety assessment of autonomous driving systems. However, existing methods are either overly aggressive, resulting in low realism, or constrained to a fixed trade-off between multiple objectives, lacking the flexibility to adapt to diverse training and testing needs without costly retraining. Here, we reframe the task as a multi-objective preference alignment problem and introduce the first framework for Steerable Adversarial scenario GEneration (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining. We first propose hierarchical group-based preference optimization, a data-efficient offline alignment method that learns to balance competing objectives by decoupling hard feasibility constraints from soft preferences. Instead of training a fixed model, SAGE fine-tunes two experts on opposing preferences and constructs a continuous spectrum of policies at inference time by linearly interpolating their weights. We provide theoretical justification for this scheme through the lens of linear mode connectivity. Extensive experiments demonstrate that SAGE not only generates scenarios with a superior balance of adversariality and realism but also enables more effective closed-loop training of driving policies.
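In symbols (our notation; the paper's exact formulation may differ), the steerable policy is obtained by parameter-wise interpolation of the two fine-tuned experts:

θ(w_adv) = (1 − w_adv) · θ_realistic + w_adv · θ_adversarial,  with w_adv ∈ [0, 1],

so w_adv = 0 recovers the realism-preferring expert, w_adv = 1 the adversarial expert, and intermediate values trace a continuous spectrum between them.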
Method Overview

Left: Existing methods suffer from a fixed trade-off between adversariality and realism. Center: Our method, SAGE, reframes the problem as preference alignment. We train two expert models (adversarial and realistic) and interpolate their weights at test time to steer generation across a continuous spectrum of behaviors. Right: This steerability is highly effective for dual curriculum learning in closed-loop training.
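To make the weight-interpolation step concrete, here is a minimal sketch of how two expert checkpoints could be blended at test time. It assumes both experts share one architecture; the function name `interpolate_experts`, the checkpoint file names, and the `generator` object are illustrative placeholders, not the released implementation.

```python
import torch

def interpolate_experts(realistic_sd, adversarial_sd, w_adv):
    """Blend two expert checkpoints:
    theta = (1 - w_adv) * theta_realistic + w_adv * theta_adversarial.

    Both state dicts must come from the same architecture so keys and shapes match;
    w_adv = 0.0 recovers the realistic expert and w_adv = 1.0 the adversarial one.
    """
    assert realistic_sd.keys() == adversarial_sd.keys(), "experts must share an architecture"
    blended = {}
    for name, p_real in realistic_sd.items():
        p_adv = adversarial_sd[name]
        if torch.is_floating_point(p_real):
            blended[name] = (1.0 - w_adv) * p_real + w_adv * p_adv
        else:
            # Non-float buffers (e.g., integer step counters) are copied, not interpolated.
            blended[name] = p_real.clone()
    return blended

# Illustrative usage (checkpoint paths and `generator` are placeholders):
# realistic_sd = torch.load("expert_realistic.pt")
# adversarial_sd = torch.load("expert_adversarial.pt")
# for w_adv in (0.0, 0.5, 1.0):  # the settings reported in the tables below
#     generator.load_state_dict(interpolate_experts(realistic_sd, adversarial_sd, w_adv))
```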
Results Gallery

Test-Time Steerability: Generated trajectories smoothly transition from compliant to aggressive as the adversarial weight increases from 0.0 to 1.0.

High-Quality Scenarios: SAGE generates challenging yet physically plausible maneuvers, crucial for meaningful safety validation, while baselines often produce awkward or rule-violating trajectories.

Superior Trade-off: Our weight-mixing strategy (blue line) traces a superior Pareto front, achieving better realism for any given level of adversariality compared to other mixing strategies.

Enhanced Driving Policy: The agent trained with SAGE shows a clear advantage in key metrics such as reward and completion rate compared to baseline training methods.
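Because the adversarial weight can be changed freely at inference time, a closed-loop curriculum reduces to scheduling w_adv over the course of training. The sketch below uses a linear ramp purely as an illustrative assumption; the paper's dual-curriculum schedule and the environment/agent objects in the comments are placeholders.

```python
def adversarial_weight_schedule(step, total_steps, w_min=0.0, w_max=1.0):
    """Illustrative curriculum: ramp the adversarial weight from mild to aggressive.

    A linear ramp is assumed purely for exposition; any monotone or adaptive
    schedule over w_adv in [0, 1] fits the same interface.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return w_min + frac * (w_max - w_min)

# Sketch of one closed-loop training iteration (all objects below are placeholders):
# for step in range(total_steps):
#     w_adv = adversarial_weight_schedule(step, total_steps)
#     generator.load_state_dict(interpolate_experts(realistic_sd, adversarial_sd, w_adv))
#     scenario = generator.sample(scene_context)       # opponent behavior at the current difficulty
#     rollout = collect_episode(ego_policy, scenario)  # ego agent drives against it
#     ego_policy.update(rollout)                       # standard RL update on the ego agent
```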
Testing Results: Replay Policy
Evaluation of adversarial generation methods against the Replay policy. Higher is better for Attack Success Rate and Adversarial Reward (↑), while lower is better for penalty metrics (↓).
Methods | Attack Succ. Rate ↑ | Adv. Reward ↑ | Real. Pen. (Behav.) ↓ | Real. Pen. (Kine.) ↓ | Map Comp. (Crash Obj.) ↓ | Map Comp. (Cross Line) ↓ | Dist. Diff. WD (Accel.) | Dist. Diff. WD (Vel.) | Dist. Diff. WD (Yaw)
---|---|---|---|---|---|---|---|---|---
Rule | 100.00% | 5.048 | 2.798 | 5.614 | 1.734 | 7.724 | 2.080 | 8.546 | 0.204
CAT | 94.85% | 3.961 | 8.941 | 3.143 | 2.466 | 9.078 | 1.556 | 7.233 | 0.225
KING | 40.85% | 2.243 | 5.883 | 3.434 | 3.126 | 6.056 | 0.972 | 255.5 | 0.098
AdvTrajOpt | 70.46% | 2.652 | 4.500 | 2.775 | 2.547 | 10.16 | 1.754 | 6.177 | 0.268
SEAL | 63.93% | 1.269 | 3.017 | 2.423 | 2.732 | 11.612 | 1.544 | 6.959 | 0.202
SAGE (w_adv = 0.0) | 16.26% | 1.065 | 0.332 | 1.998 | 0.677 | 0.948 | 1.459 | 9.313 | 0.054
SAGE (w_adv = 0.5) | 50.41% | 2.523 | 0.483 | 2.064 | 0.755 | 0.949 | 1.521 | 8.471 | 0.079
SAGE (w_adv = 1.0) | 76.15% | 4.121 | 1.429 | 2.479 | 0.731 | 1.084 | 2.098 | 8.088 | 0.184
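The Dist. Diff. (WD) columns compare kinematic statistics of generated and logged trajectories. Assuming they are one-dimensional Wasserstein distances over acceleration, velocity, and yaw samples (our reading of the abbreviation, and a simplification of the actual evaluation protocol), they could be computed roughly as follows.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def kinematic_wd(generated, logged):
    """1-D Wasserstein distance between generated and logged kinematic features.

    `generated` and `logged` map a feature name ("accel", "vel", "yaw") to flat
    arrays of per-step values pooled over all evaluated scenarios.
    """
    return {
        key: wasserstein_distance(np.asarray(generated[key]).ravel(),
                                  np.asarray(logged[key]).ravel())
        for key in ("accel", "vel", "yaw")
    }

# Toy example with synthetic samples (a real evaluation would pool values from the scenario sets):
# rng = np.random.default_rng(0)
# gen = {k: rng.normal(0.0, 1.0, 10_000) for k in ("accel", "vel", "yaw")}
# log = {k: rng.normal(0.2, 1.0, 10_000) for k in ("accel", "vel", "yaw")}
# print(kinematic_wd(gen, log))
```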
Testing Results: RL Policy
Evaluation of adversarial generation methods against the RL policy.
Methods | Attack Succ. Rate ↑ | Adv. Reward ↑ | Real. Pen. (Behav.) ↓ | Real. Pen. (Kine.) ↓ | Map Comp. (Crash Obj.) ↓ | Map Comp. (Cross Line) ↓ | Dist. Diff. WD (Accel.) | Dist. Diff. WD (Vel.) | Dist. Diff. WD (Yaw)
---|---|---|---|---|---|---|---|---|---
Rule | 65.57% | 2.761 | 2.180 | 113.7 | 1.803 | 6.148 | 10.85 | 13.47 | 0.336
CAT | 30.33% | 1.319 | 8.191 | 3.039 | 2.623 | 6.967 | 1.539 | 8.877 | 0.187
KING | 19.14% | 1.148 | 2.041 | 2.596 | 3.114 | 5.857 | 0.983 | 259.1 | 0.097
AdvTrajOpt | 19.40% | 0.992 | 4.542 | 2.779 | 2.459 | 9.973 | 1.749 | 6.187 | 0.269
SEAL | 31.40% | 0.752 | 5.871 | 2.684 | 3.030 | 11.98 | 1.563 | 8.267 | 0.267
SAGE (w_adv = 0.0) | 11.20% | 0.722 | 0.332 | 2.000 | 0.738 | 0.956 | 1.456 | 9.344 | 0.055
SAGE (w_adv = 0.5) | 13.66% | 0.819 | 0.496 | 2.066 | 0.820 | 0.820 | 1.515 | 8.475 | 0.080
SAGE (w_adv = 1.0) | 28.42% | 1.400 | 1.468 | 2.496 | 0.792 | 1.366 | 2.098 | 8.114 | 0.188
Closed-loop RL Training
Evaluation of the trained RL policies in the log-replay (normal, WOMD) environments.
Methods | Reward ↑ | Cost ↓ | Compl. ↑ | Coll. ↓ | Ave. Speed ↑ | Ave. Jerk ↓ |
---|---|---|---|---|---|---|
SAGE | 51.99 ± 1.22 | 0.48 ± 0.05 | 0.77 ± 0.02 | 0.16 ± 0.05 | 9.27 ± 0.03 | 24.97 ± 0.53 |
CAT | 46.81 ± 4.33 | 0.50 ± 0.05 | 0.67 ± 0.02 | 0.18 ± 0.05 | 7.21 ± 0.05 | 28.15 ± 1.06 |
Replay (No Adv) | 50.16 ± 5.32 | 0.50 ± 0.07 | 0.72 ± 0.04 | 0.23 ± 0.02 | 9.03 ± 0.03 | 27.53 ± 0.98 |
Rule-based Adv | 44.61 ± 3.88 | 0.52 ± 0.05 | 0.63 ± 0.04 | 0.13 ± 0.00 | 6.00 ± 0.10 | 28.22 ± 1.44 |
Qualitative Comparisons
Panel labels (shown for two example scenarios): RAW, CAT, Rule, SAGE.
BibTeX
@article{nie2025steerable,
  title={Steerable Adversarial Scenario Generation through Test-Time Preference Alignment},
  author={Nie, Tong and Mei, Yuewen and Tang, Yihong and He, Junlin and Sun, Jie and Shi, Haotian and Ma, Wei and Sun, Jian},
  journal={arXiv preprint arXiv:2509.20102},
  year={2025}
}