arXiv:2310.06648 [cs.LG]

Diversity from Human Feedback

Ren-Jian Wang, Ke Xue, Yutong Wang, Peng Yang, Haobo Fu, Qiang Fu, Chao Qian

Published 2023-10-10, updated 2023-12-10 (version 2)

Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define a proper diversity measure is a long-standing problem. Many methods rely on expert experience to define a behavior space and then derive a diversity measure from it, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method, Diversity from Human Feedback (DivHF), to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback; the learned descriptor can then be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that DivHF learns a behavior space that aligns better with human requirements than direct data-driven approaches and leads to more diverse solutions under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.
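To make the workflow concrete, here is a minimal, hypothetical sketch of the core idea: train an encoder from human preference queries so that its output can serve as a behavior descriptor, then measure diversity as a distance between descriptors. The abstract does not specify the query format or training objective, so the triplet-style query ("which of two candidates behaves more like the anchor?"), the Bradley-Terry-style loss, and the names BehaviorEncoder and preference_loss are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch (not the paper's exact objective): learn a behavior
# descriptor from "which candidate is more similar to the anchor?" feedback,
# then use descriptor distance as a diversity measure.
import torch
import torch.nn as nn

class BehaviorEncoder(nn.Module):
    """Maps a raw trajectory/solution feature vector to a low-dimensional
    behavior descriptor (hypothetical architecture)."""
    def __init__(self, in_dim: int, descriptor_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, descriptor_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def preference_loss(encoder, anchor, preferred, rejected):
    """Bradley-Terry-style loss: the human judged `preferred` to behave more
    like `anchor` than `rejected` does, so its descriptor should be closer."""
    za, zp, zr = encoder(anchor), encoder(preferred), encoder(rejected)
    d_pref = torch.sum((za - zp) ** 2, dim=-1)
    d_rej = torch.sum((za - zr) ** 2, dim=-1)
    logits = d_rej - d_pref  # higher when the human's choice is respected
    return nn.functional.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))

# Toy usage with random stand-ins for trajectory features and human labels.
torch.manual_seed(0)
enc = BehaviorEncoder(in_dim=16)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
for _ in range(100):
    anchor = torch.randn(32, 16)
    preferred = anchor + 0.1 * torch.randn(32, 16)  # "more similar" samples
    rejected = torch.randn(32, 16)                   # "less similar" samples
    loss = preference_loss(enc, anchor, preferred, rejected)
    opt.zero_grad(); loss.backward(); opt.step()

# Diversity between two solutions can then be defined as, e.g., the
# Euclidean distance between their learned descriptors.
```

In a Quality-Diversity loop such as MAP-Elites, the learned descriptor would take the place of a hand-designed behavior descriptor when assigning solutions to archive cells, which is how the abstract describes DivHF being combined with the QDax suite.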

Related articles:
arXiv:2310.13639 [cs.LG] (Published 2023-10-20)
Contrastive Preference Learning: Learning from Human Feedback without RL
arXiv:2402.09401 [cs.LG] (Published 2024-02-14, updated 2025-02-11)
Reinforcement Learning from Human Feedback with Active Queries
arXiv:2107.01969 [cs.LG] (Published 2021-07-05)
The MineRL BASALT Competition on Learning from Human Feedback
Rohin Shah et al.