{ "id": "2112.09865", "version": "v2", "published": "2021-12-18T07:38:24.000Z", "updated": "2024-08-18T05:51:04.000Z", "title": "Off-Policy Evaluation Using Information Borrowing and Context-Based Switching", "authors": [ "Sutanoy Dasgupta", "Yabo Niu", "Kishan Panaganti", "Dileep Kalathil", "Debdeep Pati", "Bani Mallick" ], "comment": "23 pages, 6 figures, manuscript under review", "categories": [ "stat.ML", "cs.LG" ], "abstract": "We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using the data collected by a logging policy. Most popular approaches to the OPE are variants of the doubly robust (DR) estimator obtained by combining a direct method (DM) estimator and a correction term involving the inverse propensity score (IPS). Existing algorithms primarily focus on strategies to reduce the variance of the DR estimator arising from large IPS. We propose a new approach called the Doubly Robust with Information borrowing and Context-based switching (DR-IC) estimator that focuses on reducing both bias and variance. The DR-IC estimator replaces the standard DM estimator with a parametric reward model that borrows information from the 'closer' contexts through a correlation structure that depends on the IPS. The DR-IC estimator also adaptively interpolates between this modified DM estimator and a modified DR estimator based on a context-specific switching rule. We give provable guarantees on the performance of the DR-IC estimator. We also demonstrate the superior performance of the DR-IC estimator compared to the state-of-the-art OPE algorithms on a number of benchmark problems.", "revisions": [ { "version": "v2", "updated": "2024-08-18T05:51:04.000Z" } ], "analyses": { "keywords": [ "off-policy evaluation", "information borrowing", "context-based switching", "dr estimator", "state-of-the-art ope algorithms" ], "note": { "typesetting": "TeX", "pages": 23, "language": "en", "license": "arXiv", "status": "editable" } } }