DOG-DPO: Training-Free Geometric Data Selection Boosts LLM Safety Alignment with 11% of Data
Researchers propose DOG-DPO, a training-free data selection framework for LLM safety alignment that treats preference pairs as geometric directions. By decomposing the geometry of multiple preference datasets and maximizing diversity-based coverage, it achieves a strong utility-robustness trade-off using only 11% of the preference pairs. It recovers most of the safety gains of full-data training while remaining teacher-free and running substantially faster than traditional selection methods.
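The summary does not specify how pairs become directions or how coverage is scored, so the Python sketch below only illustrates the general idea under stated assumptions. Each pair is assumed to have precomputed embeddings, and its direction is taken to be the normalized difference between the chosen and rejected embeddings. "Diversity-based coverage" is approximated with a greedy facility-location objective over rescaled cosine similarity. The per-dataset budget split is a hypothetical stand-in for DOG-DPO's multi-dataset decomposition, and the names `pair_direction`, `greedy_coverage`, and `select_multi_dataset` are illustrative, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def pair_direction(chosen: Sequence[float], rejected: Sequence[float]) -> Vector:
    """Assumed representation: unit vector from the rejected response's
    embedding toward the chosen response's embedding."""
    diff = [c - r for c, r in zip(chosen, rejected)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        return diff  # degenerate pair (identical embeddings): zero vector
    return [x / norm for x in diff]


def similarity(a: Vector, b: Vector) -> float:
    """Dot product of two directions, mapped from [-1, 1] to [0, 1] for unit vectors."""
    return (1.0 + sum(x * y for x, y in zip(a, b))) / 2.0


def greedy_coverage(directions: List[Vector], budget: int) -> List[int]:
    """Greedy facility-location selection: each step adds the pair that most
    raises every pair's best similarity to the selected set."""
    n = len(directions)
    sim = [[similarity(a, b) for b in directions] for a in directions]
    cover = [0.0] * n  # best similarity of each pair to anything selected so far
    selected: List[int] = []
    chosen = set()
    for _ in range(min(budget, n)):
        best_j, best_gain = -1, -1.0
        for j in range(n):
            if j in chosen:
                continue
            # Marginal coverage gain of adding pair j to the selected set.
            gain = sum(max(0.0, sim[i][j] - cover[i]) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        chosen.add(best_j)
        cover = [max(cover[i], sim[i][best_j]) for i in range(n)]
    return selected


def select_multi_dataset(
    datasets: Dict[str, List[Vector]], fraction: float = 0.11
) -> Dict[str, List[int]]:
    """Hypothetical per-dataset decomposition: give each dataset a budget
    proportional to its size, then select within it independently."""
    return {
        name: greedy_coverage(dirs, max(1, round(fraction * len(dirs))))
        for name, dirs in datasets.items()
    }


if __name__ == "__main__":
    # Toy 2-D "embeddings": the chosen response varies, the rejected one is fixed.
    toy = {
        "refusals": [pair_direction([math.cos(0.05 * k), math.sin(0.05 * k)], [0.0, 0.0]) for k in range(20)],
        "helpfulness": [pair_direction([math.cos(1.5 + 0.1 * k), math.sin(1.5 + 0.1 * k)], [0.0, 0.0]) for k in range(9)],
    }
    print(select_multi_dataset(toy))
```

Selection of this kind uses only embeddings and similarity arithmetic, with no reward model and no gradient steps, which mirrors the teacher-free, training-free property the summary describes. The `fraction=0.11` default echoes the 11% figure. This naive version builds an O(n²) similarity matrix and costs O(budget · n²) to select, so a practical implementation would need batching or approximate nearest-neighbor search.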
Jun 16, 2026 1 source