arXiv Analytics

arXiv:2112.09865 [stat.ML]

Off-Policy Evaluation Using Information Borrowing and Context-Based Switching

Sutanoy Dasgupta, Yabo Niu, Kishan Panaganti, Dileep Kalathil, Debdeep Pati, Bani Mallick

Published 2021-12-18, updated 2024-08-18 (version 2)

We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using data collected by a logging policy. Most popular approaches to OPE are variants of the doubly robust (DR) estimator, obtained by combining a direct method (DM) estimator with a correction term involving the inverse propensity score (IPS). Existing algorithms primarily focus on strategies to reduce the variance of the DR estimator arising from large IPS weights. We propose a new approach, the Doubly Robust with Information borrowing and Context-based switching (DR-IC) estimator, that focuses on reducing both bias and variance. The DR-IC estimator replaces the standard DM estimator with a parametric reward model that borrows information from 'closer' contexts through a correlation structure that depends on the IPS. The DR-IC estimator also adaptively interpolates between this modified DM estimator and a modified DR estimator based on a context-specific switching rule. We give provable guarantees on the performance of the DR-IC estimator. We also demonstrate the superior performance of the DR-IC estimator compared to state-of-the-art OPE algorithms on a number of benchmark problems.
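The abstract builds on the standard DR estimator, which combines a model-based DM term with an IPS-weighted correction on the logged action. A minimal sketch of that textbook DR form for discrete actions follows; it is not the paper's DR-IC estimator, and all function names and signatures here are illustrative assumptions:

```python
def dr_value(contexts, actions, rewards, mu, pi, r_hat, n_actions):
    """Textbook doubly robust (DR) off-policy value estimate.

    mu(a, x):    logging-policy propensity of action a in context x
    pi(a, x):    target-policy probability of action a in context x
    r_hat(a, x): reward-model prediction (the DM component)
    """
    vals = []
    for x, a, r in zip(contexts, actions, rewards):
        # DM term: model-based expected reward under the target policy.
        dm = sum(pi(b, x) * r_hat(b, x) for b in range(n_actions))
        # IPS correction: reweighted residual of the model on the logged action.
        w = pi(a, x) / mu(a, x)
        vals.append(dm + w * (r - r_hat(a, x)))
    return sum(vals) / len(vals)
```

The estimator is "doubly robust" in the usual sense: it is consistent if either the reward model `r_hat` or the propensities `mu` are correct, but its variance can blow up when the IPS weight `w` is large, which is the failure mode the DR-IC estimator's information borrowing and context-based switching are designed to address.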

Comments: 23 pages, 6 figures, manuscript under review
Categories: stat.ML, cs.LG
Related articles:
arXiv:2502.08993 [stat.ML] (Published 2025-02-13)
Off-Policy Evaluation for Recommendations with Missing-Not-At-Random Rewards
arXiv:2006.06982 [stat.ML] (Published 2020-06-12)
Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales
arXiv:2306.04836 [stat.ML] (Published 2023-06-07)
$K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control