{ "id": "2410.17835", "version": "v1", "published": "2024-10-23T12:54:04.000Z", "updated": "2024-10-23T12:54:04.000Z", "title": "Optimal Streaming Algorithms for Multi-Armed Bandits", "authors": [ "Tianyuan Jin", "Keke Huang", "Jing Tang", "Xiaokui Xiao" ], "comment": "24pages", "journal": "ICML2021", "categories": [ "cs.LG" ], "abstract": "This paper studies two variants of the best arm identification (BAI) problem under the streaming model, where we have a stream of $n$ arms with reward distributions supported on $[0,1]$ with unknown means. The arms in the stream are arriving one by one, and the algorithm cannot access an arm unless it is stored in a limited size memory. We first study the streaming \\eps-$top$-$k$ arms identification problem, which asks for $k$ arms whose reward means are lower than that of the $k$-th best arm by at most $\\eps$ with probability at least $1-\\delta$. For general $\\eps \\in (0,1)$, the existing solution for this problem assumes $k = 1$ and achieves the optimal sample complexity $O(\\frac{n}{\\eps^2} \\log \\frac{1}{\\delta})$ using $O(\\log^*(n))$ ($\\log^*(n)$ equals the number of times that we need to apply the logarithm function on $n$ before the results is no more than 1.) memory and a single pass of the stream. We propose an algorithm that works for any $k$ and achieves the optimal sample complexity $O(\\frac{n}{\\eps^2} \\log\\frac{k}{\\delta})$ using a single-arm memory and a single pass of the stream. Second, we study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with at least $1-\\delta$ probability, using a single-arm memory and as few passes of the input stream as possible. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within $O(\\log \\Delta_2^{-1})$ passes, where $\\Delta_2$ is the gap between the mean of the best arm and that of the second best arm.", "revisions": [ { "version": "v1", "updated": "2024-10-23T12:54:04.000Z" } ], "analyses": { "keywords": [ "optimal streaming algorithms", "best arm", "multi-armed bandits", "single pass", "instance-dependent optimal sample complexity" ], "tags": [ "journal article" ], "note": { "typesetting": "TeX", "pages": 24, "language": "en", "license": "arXiv", "status": "editable" } } }