{
  "id": "2305.16704",
  "version": "v1",
  "published": "2023-05-26T07:47:21.000Z",
  "updated": "2023-05-26T07:47:21.000Z",
  "title": "A Closer Look at In-Context Learning under Distribution Shifts",
  "authors": [
    "Kartik Ahuja",
    "David Lopez-Paz"
  ],
  "categories": [
    "cs.LG",
    "stat.ML"
  ],
  "abstract": "In-context learning, a capability that enables a model to learn from input examples on the fly without necessitating weight updates, is a defining characteristic of large language models. In this work, we follow the setting proposed in (Garg et al., 2022) to better understand the generality and limitations of in-context learning from the lens of the simple yet fundamental task of linear regression. The key question we aim to address is: Are transformers more adept than some natural and simpler architectures at performing in-context learning under varying distribution shifts? As a point of comparison for transformers, we propose a simple architecture based on set-based Multi-Layer Perceptrons (MLPs). We find that both transformers and set-based MLPs exhibit in-context learning under in-distribution evaluations, but transformers more closely emulate the performance of ordinary least squares (OLS). Transformers also display better resilience to mild distribution shifts, where set-based MLPs falter. However, under severe distribution shifts, both models' in-context learning abilities diminish.",
  "revisions": [
    {
      "version": "v1",
      "updated": "2023-05-26T07:47:21.000Z"
    }
  ],
  "analyses": {
    "keywords": [
      "in-context learning",
      "closer look",
      "transformers",
      "large language models",
      "set-based mlps"
    ],
    "note": {
      "typesetting": "TeX",
      "pages": 0,
      "language": "en",
      "license": "arXiv",
      "status": "editable"
    }
  }
}