arXiv:2109.08668 [cs.LG]
Keywords: language modeling, efficient transformers, training cost means primer needs, one-shot performance, 9b parameter configuration similar