
arXiv:2406.09068 [cs.LG]

Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation

Claude Formanek, Callum Rhys Tilbury, Louise Beyers, Jonathan Shock, Arnu Pretorius

Published 2024-06-13, Version 1

Offline multi-agent reinforcement learning (MARL) is an emerging field with great promise for real-world applications. Unfortunately, research in offline MARL is currently plagued by inconsistencies in baselines and evaluation protocols. These inconsistencies make it difficult to assess progress accurately, to trust newly proposed innovations, and for researchers to build upon prior work. In this paper, we first identify significant shortcomings in existing methodologies for measuring the performance of novel algorithms, through a representative study of published offline MARL work. Second, by comparing directly against this prior work, we demonstrate that simple, well-implemented baselines can achieve state-of-the-art (SOTA) results across a wide range of tasks. Specifically, on 35 out of 47 datasets used in prior work (almost 75% of cases), our baselines match or surpass the performance of the current purported SOTA. Strikingly, they often substantially outperform these more sophisticated algorithms. Finally, we address the shortcomings identified in this prior work by introducing a straightforward, standardised evaluation methodology and by providing our baseline implementations together with statistically robust results across several scenarios, for use in future comparisons. Our proposal consists of simple, sensible steps that are easy to adopt. Combined with solid baselines and comparative results, these steps could substantially improve the overall rigour of empirical science in offline MARL.

Categories: cs.LG, cs.AI

Keywords: prior work, standardised baselines, evaluation, published offline marl work, accurately assess progress
