arXiv:2004.14878 [cs.CV]

PreCNet: Next Frame Video Prediction Based on Predictive Coding

Zdenek Straka, Tomas Svoboda, Matej Hoffmann

Published 2020-04-30, Version 1

Predictive coding, currently a highly influential theory in neuroscience, has not yet been widely adopted in machine learning. In this work, we transform the seminal model of Rao and Ballard (1999) into a modern deep learning framework while remaining maximally faithful to the original schema. The resulting network we propose (PreCNet) is tested on a widely used next-frame video prediction benchmark, which consists of images from an urban environment recorded from a car-mounted camera. On this benchmark (training: 41k images from the KITTI dataset; testing: Caltech Pedestrian dataset), we achieve, to our knowledge, the best performance to date when measured with the Structural Similarity Index (SSIM). On two other common measures, MSE and PSNR, the model ranked third and fourth, respectively. Performance improved further when a larger training set (2M images from BDD100k) was used, pointing to the limitations of the KITTI training set. This work demonstrates that an architecture carefully grounded in a neuroscience model, without being explicitly tailored to the task at hand, can exhibit unprecedented performance.
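
For readers unfamiliar with two of the measures named in the abstract, the following is a minimal sketch of how MSE and PSNR are commonly computed between a predicted frame and the actual next frame. It is not taken from the paper: the function names, the flat-list frame representation, and the assumption that pixel values lie in [0, 1] are all illustrative choices. SSIM is omitted because it involves windowed local statistics and is usually taken from an image-processing library.

```python
import math


def mse(pred, target):
    """Mean squared error between two frames given as flat lists of pixels."""
    if len(pred) != len(target) or not pred:
        raise ValueError("frames must be non-empty and of equal size")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB (higher is better).

    max_value is the largest possible pixel value; 1.0 is an assumption
    that pixels were scaled to [0, 1].
    """
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical 4-pixel frames, for illustration only.
predicted = [0.0, 0.5, 1.0, 0.5]
actual = [0.0, 0.5, 1.0, 0.7]

frame_mse = mse(predicted, actual)
frame_psnr = psnr(predicted, actual)
```

Lower MSE and higher PSNR both indicate a prediction closer to the true frame. Because PSNR is a logarithmic function of MSE, a method's rank can differ between the two when scores are averaged over frames, as with the third and fourth places reported above.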

Related articles:
arXiv:1803.05268 [cs.CV] (Published 2018-03-14)
Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning
arXiv:1802.02601 [cs.CV] (Published 2018-02-06)
Digital Watermarking for Deep Neural Networks
arXiv:1804.10743 [cs.CV] (Published 2018-04-28)
Precise Box Score: Extract More Information from Datasets to Improve the Performance of Face Detection