Artificial Intelligence #artificial intelligence #ai safety
Adaptive and Explicit Safe: Triggering Latent Safety Awareness in Large Reasoning Models
A new method called Safe Trigger leverages the latent safety awareness of Large Reasoning Models (LRMs) to improve safety alignment without external data. Using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), the approach reduces the Attack Success Rate (ASR) on harmful and jailbreak benchmarks while preserving general performance.
Jun 16, 2026