SAGE: Steerable Adversarial Scenario Generation

Steerable Adversarial Scenario Generation through Test-Time Preference Alignment

The Hong Kong Polytechnic University, Tongji University, McGill University
ArXiv Preprint

SAGE enables fine-grained control over adversarial scenario generation at test time. By simply adjusting a preference weight without retraining, we can smoothly steer the generated behavior from naturalistic and compliant to challenging and highly adversarial.

Abstract

Adversarial scenario generation is a cost-effective approach for the safety assessment of autonomous driving systems. However, existing methods are either overly aggressive, resulting in low realism, or constrained to a fixed trade-off between multiple objectives, lacking the flexibility to adapt to diverse training and testing needs without costly retraining. Here, we reframe the task as a multi-objective preference alignment problem and introduce the first framework for Steerable Adversarial scenario GEneration (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining. We first propose hierarchical group-based preference optimization, a data-efficient offline alignment method that learns to balance competing objectives by decoupling hard feasibility constraints from soft preferences. Instead of training a fixed model, SAGE fine-tunes two experts on opposing preferences and constructs a continuous spectrum of policies at inference time by linearly interpolating their weights. We provide theoretical justification for this scheme through the lens of linear mode connectivity. Extensive experiments demonstrate that SAGE not only generates scenarios with a superior balance of adversariality and realism but also enables more effective closed-loop training of driving policies.
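
The test-time steering described above boils down to a simple operation on model weights: the two experts are blended parameter-wise with a single preference weight. The sketch below is a minimal PyTorch illustration of that idea, assuming both experts share one architecture and are saved as state dicts; the function name and checkpoint paths are placeholders, not the authors' released code.

import torch


def interpolate_experts(realistic_ckpt, adversarial_ckpt, w_adv):
    """Blend two expert policies: theta = (1 - w_adv) * theta_real + w_adv * theta_adv."""
    theta_real = torch.load(realistic_ckpt, map_location="cpu")
    theta_adv = torch.load(adversarial_ckpt, map_location="cpu")
    assert theta_real.keys() == theta_adv.keys(), "experts must share one architecture"
    return {
        name: (1.0 - w_adv) * theta_real[name] + w_adv * theta_adv[name]
        for name in theta_real
    }


# Usage: sweep the preference weight to obtain a spectrum of behavior policies,
# e.g. w_adv in {0.0, 0.5, 1.0} as in the tables below.
# policy.load_state_dict(interpolate_experts("expert_real.pt", "expert_adv.pt", 0.5))

Linear mode connectivity, mentioned above, is what justifies treating this linear blend of expert weights as a usable policy rather than an arbitrary parameter mixture.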

Method Overview

Overview of the SAGE framework, showing limitations of existing methods, our solution via test-time alignment, and its application in closed-loop training.

Left: Existing methods suffer from a fixed trade-off between adversariality and realism. Center: Our method, SAGE, reframes the problem as preference alignment. We train two expert models (adversarial and realistic) and interpolate their weights at test time to steer generation across a continuous spectrum of behaviors. Right: This steerability is highly effective for dual curriculum learning in closed-loop training.
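
The dual curriculum mentioned in the right panel can be sketched as a closed-loop training loop in which the adversarial preference weight is ramped up as the ego policy improves. Everything below is illustrative: the linear schedule, the batch size, and the generator/agent/environment interfaces are assumed placeholders rather than the paper's exact recipe (it reuses interpolate_experts from the sketch above).

def curriculum_w_adv(epoch, num_epochs, w_max=1.0):
    """Linearly ramp the adversarial preference weight over training (assumed schedule)."""
    return w_max * min(1.0, epoch / max(1, num_epochs - 1))


def closed_loop_training(agent, generator, env, num_epochs=50):
    """Hypothetical closed-loop RL loop driven by a steerable scenario generator."""
    for epoch in range(num_epochs):
        w_adv = curriculum_w_adv(epoch, num_epochs)
        # Steer the generator at test time; no generator retraining is needed.
        generator.load_state_dict(
            interpolate_experts("expert_real.pt", "expert_adv.pt", w_adv)
        )
        scenarios = generator.sample(batch_size=64)   # hypothetical API
        rollouts = env.rollout(agent, scenarios)      # hypothetical API
        agent.update(rollouts)                        # standard RL policy update
    return agent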

Results Gallery

Testing Results: Replay Policy

Evaluation of adversarial generation methods against the Replay policy. Higher is better for Attack Success Rate and Adversarial Reward (↑); lower is better for the realism-penalty, map-compliance, and distribution-difference metrics (↓).

Methods | Attack Succ. Rate ↑ | Adv. Reward ↑ | Real. Pen. (Behav.) ↓ | Real. Pen. (Kine.) ↓ | Map Comp. (Crash Obj.) ↓ | Map Comp. (Cross Line) ↓ | Dist. Diff. WD (Accel.) | Dist. Diff. WD (Vel.) | Dist. Diff. WD (Yaw)
Rule | 100.00% | 5.048 | 2.798 | 5.614 | 1.734 | 7.724 | 2.080 | 8.546 | 0.204
CAT | 94.85% | 3.961 | 8.941 | 3.143 | 2.466 | 9.078 | 1.556 | 7.233 | 0.225
KING | 40.85% | 2.243 | 5.883 | 3.434 | 3.126 | 6.056 | 0.972 | 255.5 | 0.098
AdvTrajOpt | 70.46% | 2.652 | 4.500 | 2.775 | 2.547 | 10.16 | 1.754 | 6.177 | 0.268
SEAL | 63.93% | 1.269 | 3.017 | 2.423 | 2.732 | 11.612 | 1.544 | 6.959 | 0.202
SAGE (w_adv=0.0) | 16.26% | 1.065 | 0.332 | 1.998 | 0.677 | 0.948 | 1.459 | 9.313 | 0.054
SAGE (w_adv=0.5) | 50.41% | 2.523 | 0.483 | 2.064 | 0.755 | 0.949 | 1.521 | 8.471 | 0.079
SAGE (w_adv=1.0) | 76.15% | 4.121 | 1.429 | 2.479 | 0.731 | 1.084 | 2.098 | 8.088 | 0.184
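
The Dist. Diff. (WD) columns presumably report Wasserstein distances between kinematic statistics (acceleration, velocity, yaw) of the generated trajectories and the original logs. A minimal sketch of such a metric is shown below; the exact quantities, weighting, and aggregation used in the paper are assumptions here.

import numpy as np
from scipy.stats import wasserstein_distance


def kinematic_wd(generated_samples, logged_samples):
    """1-D Wasserstein distance between two empirical kinematic distributions."""
    return wasserstein_distance(np.ravel(generated_samples), np.ravel(logged_samples))


# Example with hypothetical arrays of per-step accelerations:
# wd_accel = kinematic_wd(gen_accels, log_accels)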


Testing Results: RL Policy

Evaluation of adversarial generation methods against the RL policy. Metrics are the same as in the Replay-policy table above.

Methods | Attack Succ. Rate ↑ | Adv. Reward ↑ | Real. Pen. (Behav.) ↓ | Real. Pen. (Kine.) ↓ | Map Comp. (Crash Obj.) ↓ | Map Comp. (Cross Line) ↓ | Dist. Diff. WD (Accel.) | Dist. Diff. WD (Vel.) | Dist. Diff. WD (Yaw)
Rule | 65.57% | 2.761 | 2.180 | 113.7 | 1.803 | 6.148 | 10.85 | 13.47 | 0.336
CAT | 30.33% | 1.319 | 8.191 | 3.039 | 2.623 | 6.967 | 1.539 | 8.877 | 0.187
KING | 19.14% | 1.148 | 2.041 | 2.596 | 3.114 | 5.857 | 0.983 | 259.1 | 0.097
AdvTrajOpt | 19.40% | 0.992 | 4.542 | 2.779 | 2.459 | 9.973 | 1.749 | 6.187 | 0.269
SEAL | 31.40% | 0.752 | 5.871 | 2.684 | 3.030 | 11.98 | 1.563 | 8.267 | 0.267
SAGE (w_adv=0.0) | 11.20% | 0.722 | 0.332 | 2.000 | 0.738 | 0.956 | 1.456 | 9.344 | 0.055
SAGE (w_adv=0.5) | 13.66% | 0.819 | 0.496 | 2.066 | 0.820 | 0.820 | 1.515 | 8.475 | 0.080
SAGE (w_adv=1.0) | 28.42% | 1.400 | 1.468 | 2.496 | 0.792 | 1.366 | 2.098 | 8.114 | 0.188

Closed-loop RL Training

Evaluation of Trained RL Policies in the Log-replay (Normal, WOMD) Environments.

Methods | Reward ↑ | Cost ↓ | Compl. ↑ | Coll. ↓ | Ave. Speed ↑ | Ave. Jerk ↓
SAGE | 51.99 ± 1.22 | 0.48 ± 0.05 | 0.77 ± 0.02 | 0.16 ± 0.05 | 9.27 ± 0.03 | 24.97 ± 0.53
CAT | 46.81 ± 4.33 | 0.50 ± 0.05 | 0.67 ± 0.02 | 0.18 ± 0.05 | 7.21 ± 0.05 | 28.15 ± 1.06
Replay (No Adv) | 50.16 ± 5.32 | 0.50 ± 0.07 | 0.72 ± 0.04 | 0.23 ± 0.02 | 9.03 ± 0.03 | 27.53 ± 0.98
Rule-based Adv | 44.61 ± 3.88 | 0.52 ± 0.05 | 0.63 ± 0.04 | 0.13 ± 0.00 | 6.00 ± 0.10 | 28.22 ± 1.44

Qualitative Comparisons

Two example scenes, each compared side-by-side across RAW, CAT, Rule, and SAGE.

BibTeX

@article{nie2025steerable,
  title={Steerable Adversarial Scenario Generation through Test-Time Preference Alignment},
  author={Nie, Tong and Mei, Yuewen and Tang, Yihong and He, Junlin and Sun, Jie and Shi, Haotian and Ma, Wei and Sun, Jian},
  journal={arXiv preprint arXiv:2509.20102},
  year={2025}
}