{ "id": "2310.06648", "version": "v2", "published": "2023-10-10T14:13:59.000Z", "updated": "2023-12-10T13:58:34.000Z", "title": "Diversity from Human Feedback", "authors": [ "Ren-Jian Wang", "Ke Xue", "Yutong Wang", "Peng Yang", "Haobo Fu", "Qiang Fu", "Chao Qian" ], "categories": [ "cs.LG", "cs.AI", "cs.NE" ], "abstract": "Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that DivHF learns a behavior space that aligns better with human requirements compared to direct data-driven approaches and leads to more diverse solutions under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.", "revisions": [ { "version": "v2", "updated": "2023-12-10T13:58:34.000Z" } ], "analyses": { "keywords": [ "human feedback", "diversity measure", "quality-diversity optimization algorithm map-elites", "human preference", "divhf learns" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }