arXiv:2407.18990 [cs.LG]

Stay Tuned: An Empirical Study of the Impact of Hyperparameters on LLM Tuning in Real-World Applications

Alon Halfon, Shai Gretz, Ofir Arviv, Artem Spector, Orith Toledo-Ronen, Yoav Katz, Liat Ein-Dor, Michal Shmueli-Scheuer, Noam Slonim

Published 2024-07-25 (Version 1)

Fine-tuning Large Language Models (LLMs) is an effective method to enhance their performance on downstream tasks. However, choosing the appropriate setting of tuning hyperparameters (HPs) is a labor-intensive and computationally expensive process. Here, we provide recommended HP configurations for practical use cases that represent a better starting point for practitioners, considering two SOTA LLMs and two commonly used tuning methods. We describe Coverage-based Search (CBS), a process for ranking HP configurations based on an offline extensive grid search, such that the top-ranked configurations collectively provide a practical, robust recommendation for a wide range of datasets and domains. We focus our experiments on Llama-3-8B and Mistral-7B, as well as full fine-tuning and LoRA, conducting a total of more than 10,000 tuning experiments. Our results suggest that, in general, Llama-3-8B and LoRA should be preferred when possible. Moreover, we show that for both models and tuning methods, exploring only a few HP configurations, as recommended by our analysis, can provide excellent results in practice, making this work a valuable resource for practitioners.
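
The abstract describes CBS only at a high level, so the following is a minimal, hypothetical sketch of what a coverage-based ranking over offline grid-search results could look like: a greedy, set-cover-style selection in which a configuration "covers" a dataset if it scores close to the best configuration found for that dataset. The function name, the `scores` layout, and the `tolerance` criterion are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of coverage-based ranking of HP configurations.
# scores[config][dataset] holds offline grid-search performance; the greedy
# set-cover-style selection below is an illustration, not the paper's exact CBS.
from typing import Dict, List


def coverage_based_ranking(
    scores: Dict[str, Dict[str, float]],
    tolerance: float = 0.01,
) -> List[str]:
    """Rank configs so the top few jointly 'cover' as many datasets as possible.

    A dataset counts as covered by a config if that config's score is within
    `tolerance` of the best score observed for that dataset in the grid search.
    """
    datasets = {d for per_ds in scores.values() for d in per_ds}
    best = {
        d: max(per_ds.get(d, float("-inf")) for per_ds in scores.values())
        for d in datasets
    }

    # covered[c] = datasets on which config c is near-optimal
    covered = {
        c: {d for d, s in per_ds.items() if s >= best[d] - tolerance}
        for c, per_ds in scores.items()
    }

    ranking: List[str] = []
    remaining = set(datasets)
    pool = set(scores)
    while pool:
        # Greedily pick the config covering the most still-uncovered datasets;
        # break ties by total coverage so later picks stay individually strong.
        pick = max(pool, key=lambda c: (len(covered[c] & remaining), len(covered[c])))
        ranking.append(pick)
        remaining -= covered[pick]
        pool.remove(pick)
    return ranking


# Toy usage: a practitioner would try only the top-ranked configurations.
grid = {
    "lr=1e-5,epochs=3": {"task_a": 0.81, "task_b": 0.70},
    "lr=2e-5,epochs=5": {"task_a": 0.79, "task_b": 0.76},
    "lr=5e-5,epochs=1": {"task_a": 0.65, "task_b": 0.75},
}
print(coverage_based_ranking(grid)[:2])
```

Under these assumptions, the top-ranked configurations are exactly the small set the abstract suggests a practitioner explore first, since together they are near-optimal on as many of the offline datasets as possible.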
