arXiv Analytics

arXiv:2305.18601 [cs.CV]

BRIGHT: Bi-level Feature Representation of Image Collections using Groups of Hash Tables

Dingdong Yang, Yizhi Wang, Ali Mahdavi-Amiri, Hao Zhang

Published 2023-05-29, Version 1

We present BRIGHT, a bi-level feature representation for an image collection, consisting of a per-image latent space on top of a multi-scale feature grid space. Our representation is learned by an autoencoder to encode images into continuous key codes, which are used to retrieve features from groups of multi-resolution hash tables. Our key codes and hash tables are trained together continuously with well-defined gradient flows, leading to high usage of the hash table entries and improved generative modeling compared to discrete Vector Quantization (VQ). Differently from existing continuous representations such as KL-regularized latent codes, our key codes are strictly bounded in scale and variance. Overall, feature encoding by BRIGHT is compact, efficient to train, and enables generative modeling over the image codes using state-of-the-art generators such as latent diffusion models (LDMs). Experimental results show that our method achieves comparable reconstruction results to VQ methods while having a smaller and more efficient decoder network. By applying LDM over our key code space, we achieve state-of-the-art performance on image synthesis on the LSUN-Church and human-face datasets.
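
To make the idea of retrieving features with continuous key codes more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify how keys index the tables, so this sketch assumes one scalar key per table, a sigmoid to keep keys bounded in [0, 1], and linear interpolation between neighboring table entries. The names `bounded_key`, `lookup`, and `retrieve` are hypothetical.

```python
import numpy as np

def bounded_key(raw):
    """Squash an unbounded encoder output into (0, 1).
    Assumption: a sigmoid is used here only for illustration."""
    return 1.0 / (1.0 + np.exp(-raw))

def lookup(table, key):
    """Linearly interpolate a feature table at a continuous key in [0, 1].
    table has shape (n_entries, feat_dim)."""
    n = table.shape[0]
    pos = key * (n - 1)            # continuous position along the table
    lo = int(np.floor(pos))        # index of the entry at or below pos
    hi = min(lo + 1, n - 1)        # neighboring entry, clamped at the end
    w = pos - lo                   # interpolation weight toward hi
    return (1.0 - w) * table[lo] + w * table[hi]

def retrieve(tables, key_code):
    """Concatenate the features read from tables of several resolutions.
    Assumption: key_code holds one scalar key per table."""
    return np.concatenate([lookup(t, k) for t, k in zip(tables, key_code)])
```

Because the output is a weighted blend of neighboring entries, it varies continuously with the key, so in an autodiff framework gradients could reach both the key and the table entries. This contrasts with the hard nearest-entry selection of discrete VQ that the abstract compares against.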

Related articles:
arXiv:1904.12936 [cs.CV] (Published 2019-04-29)
Learning to Find Common Objects Across Image Collections
arXiv:1808.05732 [cs.CV] (Published 2018-08-17)
Medical Image Imputation from Image Collections
arXiv:1811.10519 [cs.CV] (Published 2018-11-26)
Unsupervised 3D Shape Learning from Image Collections in the Wild