arXiv:2311.01329 [cs.LG]

A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories

Kai Yan, Alexander G. Schwing, Yu-Xiong Wang

Published 2023-11-02 (Version 1)

Offline imitation from observations aims to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available. Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable. The state-of-the-art "DIstribution Correction Estimation" (DICE) methods minimize the divergence between the state occupancies of expert and learner policies and retrieve a policy via weighted behavior cloning; however, their results are unstable when learning from incomplete trajectories, due to a non-robust optimization in the dual domain. To address this issue, we propose Trajectory-Aware Imitation Learning from Observations (TAILO). TAILO uses a discounted sum along the future trajectory as the weight for weighted behavior cloning. The terms of the sum are scaled by the output of a discriminator, which aims to identify expert states. Despite its simplicity, TAILO works well as long as the task-agnostic data contain trajectories or segments of expert behavior, a common assumption in prior work. In experiments across multiple testbeds, we find TAILO to be more robust and effective, particularly with incomplete trajectories.
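The weighting scheme described in the abstract can be illustrated concretely. The following is a minimal sketch, assuming per-state discriminator scores are already available; the function names, the exponential scaling with temperature beta, and the discount gamma=0.99 are illustrative assumptions and not necessarily the paper's exact design.

    import numpy as np

    def tailo_weights(disc_scores, gamma=0.99, beta=1.0):
        """Discounted sum of scaled discriminator outputs along the future trajectory.

        disc_scores: per-state outputs of a discriminator trained to identify
        expert-like states (the exponential scaling and hyperparameters here
        are illustrative assumptions).
        """
        # Scale discriminator outputs; an exponential scaling is assumed here.
        r = np.exp(beta * np.asarray(disc_scores, dtype=np.float64))
        # Backward recursion over the trajectory: w_t = r_t + gamma * w_{t+1}
        w = np.zeros_like(r)
        running = 0.0
        for t in range(len(r) - 1, -1, -1):
            running = r[t] + gamma * running
            w[t] = running
        return w

    def weighted_bc_loss(log_probs, weights):
        """Weighted behavior cloning: maximize w_t * log pi(a_t | s_t)."""
        return -np.mean(weights * np.asarray(log_probs))

Each weight summarizes how expert-like the remainder of a state's trajectory looks, so weighted behavior cloning upweights actions taken on expert-like segments of the task-agnostic data.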

Comments: 35 pages; accepted as a poster at NeurIPS 2023
Categories: cs.LG, cs.AI
Related articles:
arXiv:2106.04219 [cs.LG] (Published 2021-06-08)
Time-series Imputation of Temporally-occluded Multiagent Trajectories
arXiv:2403.11418 [cs.LG] (Published 2024-03-18)
Variational Sampling of Temporal Trajectories
arXiv:2406.02295 [cs.LG] (Published 2024-06-04)
How to Explore with Belief: State Entropy Maximization in POMDPs