arXiv:2002.04745 [cs.LG]

Keywords: transformer architecture, pre-ln transformer, residual blocks, natural language processing tasks, mean field theory