Artificial Intelligence
S-SPPO: Semantic Calibration Boosts LLM Preference Alignment Without Human Data
S-SPPO, a dual-space semantic calibration framework, addresses training instability in Self-Play Preference Optimization (SPPO) for large language models. It anneals the win-probability targets used during self-play and enforces geometric diversity among responses. With these two changes it reports improved alignment results on AlpacaEval 2.0 without requiring additional human preference data.
Jun 17, 2026 1 source
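The summary does not give S-SPPO's equations. For orientation only, here is a minimal sketch of the squared-loss objective from the original SPPO method. That objective regresses the log-probability ratio between the new and previous policy onto η(P̂(y≻π_t|x) − 1/2). The sketch adds a hypothetical linear annealing schedule that shrinks the estimated win probability toward 1/2 as training progresses. The schedule, the `min_scale` parameter, and the function names are illustrative assumptions, not S-SPPO's published method. The geometric-diversity component is not modeled.

```python
def annealed_target(p_win: float, step: int, total_steps: int, min_scale: float = 0.5) -> float:
    """Hypothetical annealing schedule (an assumption, not S-SPPO's actual rule).

    Shrinks the estimated win probability toward 1/2. The shrink factor
    decays linearly from 1.0 at step 0 to `min_scale` at `total_steps`.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    scale = 1.0 - (1.0 - min_scale) * frac
    return 0.5 + scale * (p_win - 0.5)


def sppo_loss(logp_new, logp_old, p_win, eta, step, total_steps):
    """Mean SPPO-style squared loss over a batch of sampled responses.

    logp_new: log pi_theta(y|x) under the policy being trained
    logp_old: log pi_t(y|x) under the previous-iteration policy
    p_win:    estimated probability that y beats pi_t (e.g. from a preference model)
    """
    losses = []
    for lp_new, lp_old, p in zip(logp_new, logp_old, p_win):
        # Target for the log-ratio: eta * (annealed win probability - 1/2)
        target = eta * (annealed_target(p, step, total_steps) - 0.5)
        log_ratio = lp_new - lp_old
        losses.append((log_ratio - target) ** 2)
    return sum(losses) / len(losses)


# Usage: one response with log-ratio 0.2 and estimated win probability 0.8
print(sppo_loss([-1.0], [-1.2], [0.8], eta=1.0, step=0, total_steps=10))
```

Shrinking targets toward 1/2 is one simple way to damp large updates driven by noisy win-rate estimates. Whether S-SPPO anneals in this direction, or with this shape of schedule, is not stated in the summary.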