{
  "id": "2410.17835",
  "version": "v1",
  "published": "2024-10-23T12:54:04.000Z",
  "updated": "2024-10-23T12:54:04.000Z",
  "title": "Optimal Streaming Algorithms for Multi-Armed Bandits",
  "authors": [
    "Tianyuan Jin",
    "Keke Huang",
    "Jing Tang",
    "Xiaokui Xiao"
  ],
  "comment": "24pages",
  "journal": "ICML2021",
  "categories": [
    "cs.LG"
  ],
  "abstract": "This paper studies two variants of the best arm identification (BAI) problem under the streaming model, where we have a stream of $n$ arms with reward distributions supported on $[0,1]$ with unknown means. The arms in the stream are arriving one by one, and the algorithm cannot access an arm unless it is stored in a limited size memory. We first study the streaming \\eps-$top$-$k$ arms identification problem, which asks for $k$ arms whose reward means are lower than that of the $k$-th best arm by at most $\\eps$ with probability at least $1-\\delta$. For general $\\eps \\in (0,1)$, the existing solution for this problem assumes $k = 1$ and achieves the optimal sample complexity $O(\\frac{n}{\\eps^2} \\log \\frac{1}{\\delta})$ using $O(\\log^*(n))$ ($\\log^*(n)$ equals the number of times that we need to apply the logarithm function on $n$ before the results is no more than 1.) memory and a single pass of the stream. We propose an algorithm that works for any $k$ and achieves the optimal sample complexity $O(\\frac{n}{\\eps^2} \\log\\frac{k}{\\delta})$ using a single-arm memory and a single pass of the stream. Second, we study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with at least $1-\\delta$ probability, using a single-arm memory and as few passes of the input stream as possible. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within $O(\\log \\Delta_2^{-1})$ passes, where $\\Delta_2$ is the gap between the mean of the best arm and that of the second best arm.",
  "revisions": [
    {
      "version": "v1",
      "updated": "2024-10-23T12:54:04.000Z"
    }
  ],
  "analyses": {
    "keywords": [
      "optimal streaming algorithms",
      "best arm",
      "multi-armed bandits",
      "single pass",
      "instance-dependent optimal sample complexity"
    ],
    "tags": [
      "journal article"
    ],
    "note": {
      "typesetting": "TeX",
      "pages": 24,
      "language": "en",
      "license": "arXiv",
      "status": "editable"
    }
  }
}