From that, they built an AI reward model with comparison dat