Artificial Intelligence #ai safety #llm
CHILLGuard: Fine-Grained Chinese LLM Safety Guardrail with Scalable Data and Preference Alignment
Researchers introduce CHILLGuard, a dedicated Chinese LLM content safety guardrail with a risk taxonomy of 5 macro and 31 micro categories. A scalable multi-stage data construction pipeline produces the CHILLGuardTrain dataset (405,007 samples). Using Model-aware Direct Preference Optimization, the system achieves a 15.92% F1 score improvement over Qwen3Guard-8B-Strict.
Jun 16, 2026 1 source