arXiv:2310.06648 [cs.LG]

Diversity from Human Feedback

Ren-Jian Wang, Ke Xue, Yutong Wang, Peng Yang, Haobo Fu, Qiang Fu, Chao Qian

Published 2023-10-10, updated 2023-12-10 (version 2)

Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define a proper diversity measure is a long-standing problem. Many methods rely on expert experience to define a behavior space and then derive a diversity measure from it, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method, Diversity from Human Feedback (DivHF), to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback; the learned descriptor can then be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that DivHF learns a behavior space that aligns better with human requirements than direct data-driven approaches and leads to more diverse solutions under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.
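To make the workflow concrete, here is a minimal, hypothetical sketch of the core idea: train an encoder from human preference queries so that its output can serve as a behavior descriptor, then measure diversity as a distance between descriptors. The abstract does not specify the query format or training objective, so the triplet-style query ("which of two candidates behaves more like the anchor?"), the Bradley-Terry-style loss, and the names BehaviorEncoder and preference_loss are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch (not the paper's exact objective): learn a behavior
# descriptor from "which candidate is more similar to the anchor?" feedback,
# then use descriptor distance as a diversity measure.
import torch
import torch.nn as nn

class BehaviorEncoder(nn.Module):
    """Maps a raw trajectory/solution feature vector to a low-dimensional
    behavior descriptor (hypothetical architecture)."""
    def __init__(self, in_dim: int, descriptor_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, descriptor_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def preference_loss(encoder, anchor, preferred, rejected):
    """Bradley-Terry-style loss: the human judged `preferred` to behave more
    like `anchor` than `rejected` does, so its descriptor should be closer."""
    za, zp, zr = encoder(anchor), encoder(preferred), encoder(rejected)
    d_pref = torch.sum((za - zp) ** 2, dim=-1)
    d_rej = torch.sum((za - zr) ** 2, dim=-1)
    logits = d_rej - d_pref  # higher when the human's choice is respected
    return nn.functional.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))

# Toy usage with random stand-ins for trajectory features and human labels.
torch.manual_seed(0)
enc = BehaviorEncoder(in_dim=16)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
for _ in range(100):
    anchor = torch.randn(32, 16)
    preferred = anchor + 0.1 * torch.randn(32, 16)  # "more similar" samples
    rejected = torch.randn(32, 16)                   # "less similar" samples
    loss = preference_loss(enc, anchor, preferred, rejected)
    opt.zero_grad(); loss.backward(); opt.step()

# Diversity between two solutions can then be defined as, e.g., the
# Euclidean distance between their learned descriptors.
```

In a Quality-Diversity loop such as MAP-Elites, the learned descriptor would take the place of a hand-designed behavior descriptor when assigning solutions to archive cells, which is how the abstract describes DivHF being combined with the QDax suite.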

Related articles:
arXiv:2310.13639 [cs.LG] (Published 2023-10-20)
Contrastive Preference Learning: Learning from Human Feedback without RL
arXiv:2402.09401 [cs.LG] (Published 2024-02-14, updated 2025-02-11)
Reinforcement Learning from Human Feedback with Active Queries
arXiv:2107.01969 [cs.LG] (Published 2021-07-05)
The MineRL BASALT Competition on Learning from Human Feedback
Rohin Shah et al.