arXiv Analytics

arXiv:2002.06723 [cs.LG]

Reward Design for Driver Repositioning Using Multi-Agent Reinforcement Learning

Zhenyu Shou, Xuan Di

Published 2020-02-17 (Version 1)

A large portion of passenger requests reportedly goes unserved, partly due to vacant for-hire drivers' cruising behavior while seeking passengers. This paper models the multi-driver repositioning task with a mean field multi-agent reinforcement learning (MARL) approach. Noticing that directly applying MARL to the multi-driver system under a given reward mechanism will very likely yield a suboptimal equilibrium due to the selfishness of drivers, this study proposes a reward design scheme with which a more desirable equilibrium can be reached. To efficiently solve the resulting bilevel optimization problem, whose upper level is the reward design and whose lower level is a multi-agent system (MAS), a Bayesian optimization algorithm is adopted to speed up the learning process. The proposed model is then tested on a synthetic dataset. The results show that the weighted average of order response rate and overall service charge can be improved by 4% using a simple platform service charge, compared with the case of no reward design.
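The bilevel structure described in the abstract can be sketched as follows. This is a minimal toy, not the paper's method: random search stands in for the Bayesian optimization of the upper level, and the lower-level "equilibrium" is an invented one-parameter driver model (the acceptance probability, order counts, and objective weight are all assumptions for illustration).

```python
import random

def lower_level_equilibrium(charge, n_orders=60, seed=0):
    """Toy stand-in for the mean-field MARL lower level.

    Each vacant driver accepts a nearby order with a probability that
    falls as the platform service charge rises (a crude model of the
    drivers' selfish best response). Returns the order response rate.
    """
    rng = random.Random(seed)
    accept_prob = max(0.0, 1.0 - charge)  # assumption: linear opt-out
    served = sum(1 for _ in range(n_orders) if rng.random() < accept_prob)
    return served / n_orders

def upper_level_objective(charge, weight=0.2):
    """Weighted average of order response rate and (normalized) service
    charge collected by the platform, mimicking the paper's metric."""
    rate = lower_level_equilibrium(charge)
    revenue = charge * rate  # per-order revenue, already in [0, 1]
    return weight * rate + (1.0 - weight) * revenue

def search_reward_design(n_trials=40, seed=1):
    """Random search over the service charge, a simple stand-in for the
    Bayesian optimization the paper uses to solve the upper level."""
    rng = random.Random(seed)
    best_charge, best_value = 0.0, upper_level_objective(0.0)
    for _ in range(n_trials):
        charge = rng.uniform(0.0, 1.0)
        value = upper_level_objective(charge)
        if value > best_value:
            best_charge, best_value = charge, value
    return best_charge, best_value
```

With these toy dynamics the objective has an interior optimum: charging nothing maximizes the response rate but earns nothing, while charging too much drives drivers away, so the search settles on a moderate charge — the same trade-off the reward design in the paper navigates.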
