arXiv:2209.02142 [astro-ph.GA]
Harvesting the Lyα forest with convolutional neural networks
Ting-Yun Cheng, Ryan Cooke, Gwen Rudie
Published 2022-09-05, Version 1
We develop a machine-learning-based algorithm using a convolutional neural network (CNN) to identify low HI column density Ly$\alpha$ absorption systems ($\log{N_{\mathrm{HI}}}/{\rm cm}^{-2}<17$) in the Ly$\alpha$ forest and to predict their physical properties: HI column density ($\log{N}_{\mathrm{HI}}/{\rm cm}^{-2}$), redshift ($z_{\mathrm{HI}}$), and Doppler width ($b_{\mathrm{HI}}$). Our CNN models are trained using simulated spectra (S/N $\simeq10$), and we test their performance on high quality spectra of quasars at redshift $z\sim2.5-2.9$ observed with the High Resolution Echelle Spectrometer on the Keck I telescope. We find that $\sim78\%$ of the systems identified by our algorithm are listed in the manual Voigt profile fitting catalogue. We demonstrate that the performance of our CNN is stable and consistent for all simulated and observed spectra with S/N $\gtrsim10$. Our model can therefore be used consistently to analyse the large volume of both low and high S/N data available with current and future facilities. Our CNN provides state-of-the-art predictions within the range $12.5\leq\log{N_{\mathrm{HI}}}/\mathrm{cm^{-2}}<15.5$ with a mean absolute error of $\Delta(\log{N}_{\mathrm{HI}}/{\rm cm}^{-2})=0.13$, $\Delta(z_{\mathrm{HI}})=2.7\times{10}^{-5}$, and $\Delta(b_{\mathrm{HI}})=4.1\ \mathrm{km\ s^{-1}}$. CNN prediction takes $<3$ minutes per model for a spectrum of 120\,000 pixels on a laptop computer. We demonstrate that CNNs can significantly increase the efficiency of analysing Ly$\alpha$ forest spectra, and thereby greatly increase the statistics of Ly$\alpha$ absorbers.
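To make the three predicted quantities concrete, the sketch below shows how $\log{N}_{\mathrm{HI}}$, $z_{\mathrm{HI}}$, and $b_{\mathrm{HI}}$ together set the shape of a single Ly$\alpha$ absorption line. It is a simplification and not the authors' simulation code: it keeps only the Gaussian (Doppler) core of the line and omits the damping wings of a full Voigt profile, which is a reasonable approximation only at low column densities. The function names are hypothetical, and the atomic constants are the standard Ly$\alpha$ values rather than numbers taken from the paper.

```python
import math

# Standard H I Lyα atomic data (assumed here; not quoted in the abstract).
LYA_WAVELENGTH_A = 1215.67   # rest wavelength in Angstrom
LYA_OSC_STRENGTH = 0.4164    # oscillator strength
# sqrt(pi) * e^2 / (m_e * c), expressed for N in cm^-2, wavelength in Angstrom, b in km/s.
TAU0_COEFF = 1.497e-15
C_KMS = 299792.458           # speed of light in km/s


def central_optical_depth(log_n_hi, b_kms):
    """Optical depth at line centre for a Doppler-broadened Lyα line."""
    n_hi = 10.0 ** log_n_hi
    return TAU0_COEFF * n_hi * LYA_OSC_STRENGTH * LYA_WAVELENGTH_A / b_kms


def normalised_flux(wavelength_a, log_n_hi, z_hi, b_kms):
    """Continuum-normalised flux at an observed wavelength (Gaussian core only)."""
    line_centre = LYA_WAVELENGTH_A * (1.0 + z_hi)
    # Velocity offset of this pixel from the line centre, in km/s.
    velocity = C_KMS * (wavelength_a - line_centre) / line_centre
    tau = central_optical_depth(log_n_hi, b_kms) * math.exp(-(velocity / b_kms) ** 2)
    return math.exp(-tau)


if __name__ == "__main__":
    # Illustrative absorber inside the redshift and column density ranges quoted above.
    z, log_n, b = 2.7, 13.5, 25.0
    centre = LYA_WAVELENGTH_A * (1.0 + z)
    for offset in (-1.0, -0.5, 0.0, 0.5, 1.0):
        flux = normalised_flux(centre + offset, log_n, z, b)
        print(f"{centre + offset:9.3f} A  flux = {flux:.3f}")
```

In this picture the redshift fixes where the line falls, the column density sets its depth, and the Doppler width sets its breadth. A faithful simulation would replace the Gaussian with a Voigt profile and add instrumental broadening and noise at the stated S/N.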