A new algorithm combines offline RL and preference feedback for improved decision-making.
― 9 min read