arXiv Analytics

Search Results: Showing 1-13 of 13

  1. arXiv:2408.10619 (Published 2024-08-20)

    Novel Change Detection Framework in Remote Sensing Imagery Using Diffusion Models and Structural Similarity Index (SSIM)

    Andrew Kiruluta, Eric Lundy, Andreas Lemos

    Change detection is a crucial task in remote sensing, enabling the monitoring of environmental changes, urban growth, and disaster impact. Conventional change detection techniques, such as image differencing and ratioing, often struggle with noise and fail to capture complex variations in imagery. Recent advancements in machine learning, particularly generative models like diffusion models, offer new opportunities for enhancing change detection accuracy. In this paper, we propose a novel change detection framework that combines the strengths of Stable Diffusion models with the Structural Similarity Index (SSIM) to create robust and interpretable change maps. Our approach, named Diffusion Based Change Detector, is evaluated on both synthetic and real-world remote sensing datasets and compared with state-of-the-art methods. The results demonstrate that our method significantly outperforms traditional differencing techniques and recent deep learning-based methods, particularly in scenarios with complex changes and noise.
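
    The core comparison step in such a framework — a local SSIM map between two co-registered images, with low-similarity regions flagged as change — can be sketched in plain NumPy. This is a minimal illustration under assumed names (`blockwise_ssim_map` is hypothetical), not the authors' method; the diffusion-model stage is omitted entirely:

```python
import numpy as np

def blockwise_ssim_map(a, b, block=8, data_range=1.0):
    """Local SSIM over non-overlapping blocks; low values flag change."""
    C1, C2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    h, w = (a.shape[0] // block) * block, (a.shape[1] // block) * block
    A = a[:h, :w].reshape(h // block, block, w // block, block)
    B = b[:h, :w].reshape(h // block, block, w // block, block)
    mu_a, mu_b = A.mean(axis=(1, 3)), B.mean(axis=(1, 3))
    var_a, var_b = A.var(axis=(1, 3)), B.var(axis=(1, 3))
    cov = (A * B).mean(axis=(1, 3)) - mu_a * mu_b
    return ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / \
           ((mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2))

rng = np.random.default_rng(0)
before = rng.random((64, 64))
after = before.copy()
after[16:32, 16:32] = rng.random((16, 16))   # simulate a localized change
ssim_map = blockwise_ssim_map(before, after)  # shape (8, 8)
change_mask = ssim_map < 0.5                  # low SSIM = changed block
```

    In practice a sliding Gaussian window (as in the standard SSIM definition) would replace the non-overlapping blocks, and the second image would come from the generative model rather than a second acquisition.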

  2. arXiv:2311.05548 (Published 2023-11-09)

    L-WaveBlock: A Novel Feature Extractor Leveraging Wavelets for Generative Adversarial Networks

    Mirat Shah, Vansh Jain, Anmol Chokshi, Guruprasad Parasnis, Pramod Bide

    Generative Adversarial Networks (GANs) have risen to prominence in deep learning, facilitating the generation of realistic data from random noise. The effectiveness of GANs often depends on the quality of feature extraction, a critical aspect of their architecture. This paper introduces L-WaveBlock, a novel and robust feature extractor that combines the Discrete Wavelet Transform (DWT) with deep learning methodologies. L-WaveBlock is designed to speed the convergence of GAN generators while simultaneously enhancing their performance. The paper demonstrates the utility of L-WaveBlock across three datasets — a road satellite imagery dataset, the CelebA dataset and the GoPro dataset — showcasing its ability to make feature extraction easier and more efficient. By utilizing the DWT, L-WaveBlock efficiently captures both structural and textural detail, partitioning feature maps into orthogonal subbands across multiple scales while preserving essential information. It not only leads to faster convergence but also gives competitive results on every dataset. The proposed method achieves an Inception Score of 3.6959 and a Structural Similarity Index of 0.4261 on the maps dataset, and a Peak Signal-to-Noise Ratio of 29.05 and a Structural Similarity Index of 0.874 on the CelebA dataset. On the image denoising dataset it performs on par with, though not better than, the state-of-the-art, while still converging faster than conventional methods. L-WaveBlock thus emerges as a robust and efficient tool for enhancing GAN-based image generation, demonstrating superior convergence speed and competitive performance across multiple datasets for image resolution, image generation and image denoising.
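
    The orthogonal subband split the block relies on can be illustrated with a one-level 2-D Haar DWT written directly in NumPy. This is a hedged sketch: the paper builds on the DWT in general (in practice a library such as PyWavelets or a framework layer would be used), and subband naming conventions vary:

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar DWT: split an image into four orthogonal subbands.

    Orthonormal scaling (1/sqrt(2)) is used, so total energy is preserved
    across the subbands, as the abstract describes.
    """
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2.0)    # column-pair low-pass
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2.0)    # column-pair high-pass
    LL = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2.0)  # approximation
    LH = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2.0)  # detail (one orientation)
    HL = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2.0)  # detail (other orientation)
    HH = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2.0)  # diagonal detail
    return LL, LH, HL, HH

img = np.random.default_rng(1).random((8, 8))
subbands = haar_dwt2(img)  # four 4x4 subbands
```

    Because the transform is orthonormal, the sum of squared coefficients over the four subbands equals the energy of the input — the sense in which the decomposition "preserves essential information".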

  3. arXiv:2310.20083 (Published 2023-10-30)

    Facial asymmetry: A Computer Vision based behaviometric index for assessment during a face-to-face interview

    Shuvam Keshari, Tanusree Dutta, Raju Mullick, Ashish Rathor, Priyadarshi Patnaik

    Choosing the right person for the right job makes the personnel interview process a cognitively demanding task. Psychometric tests, followed by an interview, have often been used to aid the process, although such mechanisms have their limitations. While psychometric tests suffer from faking or social desirability of responses, the interview process depends on how the responses are analyzed by the interviewers. We propose the use of behaviometry as an assistive tool to facilitate an objective assessment of the interviewee without increasing the cognitive load of the interviewer. Behaviometry, a relatively little-explored field in the selection process, utilizes inimitable behavioral characteristics like facial expressions, vocalization patterns, pupillary reactions, proximal behavior, and body language. The method analyzes thin slices of behavior and provides unbiased information about the interviewee. The current study proposes the methodology behind this tool to capture facial expressions, in terms of facial asymmetry and micro-expressions. As a test case, hemi-facial composites based on a structural similarity index were used to develop a progressive time graph of facial asymmetry. A frame-by-frame analysis was performed on three YouTube video samples, where Structural Similarity Index (SSIM) scores of 75% and above indicated behavioral congruence. The research utilizes open-source computer vision algorithms and libraries (python-opencv and dlib) to formulate the procedure for analysis of facial asymmetry.

  4. arXiv:2309.12506 (Published 2023-09-21)

    License Plate Super-Resolution Using Diffusion Models

    Sawsan AlHalawani, Bilel Benjdira, Adel Ammar, Anis Koubaa, Anas M. Ali

    In surveillance, accurately recognizing license plates is hindered by their often low quality and small dimensions, compromising recognition precision. Despite advancements in AI-based image super-resolution, methods like Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) still fall short in enhancing license plate images. This study leverages the cutting-edge diffusion model, which has consistently outperformed other deep learning techniques in image restoration. By training this model using a curated dataset of Saudi license plates, both in low and high resolutions, we discovered the diffusion model's superior efficacy. The method achieves a 12.55% and 37.32% improvement in Peak Signal-to-Noise Ratio (PSNR) over SwinIR and ESRGAN, respectively. Moreover, our method surpasses these techniques in terms of Structural Similarity Index (SSIM), registering a 4.89% and 17.66% improvement over SwinIR and ESRGAN, respectively. Furthermore, 92% of human evaluators preferred our images over those from other algorithms. In essence, this research presents a pioneering solution for license plate super-resolution, with tangible potential for surveillance systems.

  5. arXiv:2305.11675 (Published 2023-05-19)

    Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity

    Zijiao Chen, Jiaxin Qing, Juan Helen Zhou
    Comments: 15 pages, 11 figures, submitted to anonymous conference
    Categories: cs.CV, cs.CE

    Reconstructing human vision from brain activities has been an appealing task that helps to understand our cognitive process. Even though recent research has seen great success in reconstructing static images from non-invasive brain recordings, work on recovering continuous visual experiences in the form of videos is limited. In this work, we propose Mind-Video that learns spatiotemporal information from continuous fMRI data of the cerebral cortex progressively through masked brain modeling, multimodal contrastive learning with spatiotemporal attention, and co-training with an augmented Stable Diffusion model that incorporates network temporal inflation. We show that high-quality videos of arbitrary frame rates can be reconstructed with Mind-Video using adversarial guidance. The recovered videos were evaluated with various semantic and pixel-level metrics. We achieved an average accuracy of 85% in semantic classification tasks and 0.19 in structural similarity index (SSIM), outperforming the previous state-of-the-art by 45%. We also show that our model is biologically plausible and interpretable, reflecting established physiological processes.

  6. arXiv:2202.02616 (Published 2022-02-05)

    DSSIM: a structural similarity index for floating-point data

    Allison H. Baker, Alexander Pinard, Dorit M. Hammerling

    Data visualization is a critical component of interacting with floating-point output data from large model simulation codes. Indeed, postprocessing analysis workflows on simulation data often generate a large number of images from the raw data, many of which are then compared to each other or to specified reference images. In this image-comparison scenario, image quality assessment (IQA) measures are quite useful, and the Structural Similarity Index (SSIM) continues to be a popular choice. However, generating large numbers of images can be costly, and plot-specific (but data-independent) choices can affect the SSIM value. A natural question is whether we can apply the SSIM directly to the floating-point simulation data and obtain an indication of whether differences in the data are likely to impact a visual assessment, effectively bypassing the creation of a specific set of images from the data. To this end, we propose an alternative to the popular SSIM that can be applied directly to the floating-point data, which we refer to as the Data SSIM (DSSIM). While we demonstrate the usefulness of the DSSIM in the context of evaluating differences due to lossy compression on large volumes of simulation data from a popular climate model, the DSSIM may prove useful for many other applications involving simulation or image data.
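
    The idea of applying an SSIM-style comparison directly to floating-point arrays, with the dynamic range taken from the data itself rather than from a rendered image format, can be sketched as follows. This is an illustrative global variant under an assumed name (`global_ssim_float`), not the authors' DSSIM, which differs in its details:

```python
import numpy as np

def global_ssim_float(a, b):
    """SSIM-style score computed directly on floating-point data.

    The data range is estimated from the inputs (an image would instead
    have a fixed range such as 255 for 8-bit pixels).
    """
    data_range = max(a.max(), b.max()) - min(a.min(), b.min())
    C1, C2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / \
           ((mu_a ** 2 + mu_b ** 2 + C1) * (a.var() + b.var() + C2))

x = np.random.default_rng(2).random((32, 32)) * 100.0
y = np.round(x)                       # crude stand-in for lossy compression
score_same = global_ssim_float(x, x)  # exactly similar data -> 1
score_lossy = global_ssim_float(x, y) # slightly below 1
```

    The point of the comparison in the abstract is that such a score flags data differences likely to matter visually without ever rendering plots from the data.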

  7. arXiv:2011.07017 (Published 2020-11-13)

    NightVision: Generating Nighttime Satellite Imagery from Infra-Red Observations

    Paula Harder et al.

    Recent applications of machine learning to satellite imagery often rely on visible images and therefore suffer from a lack of data at night. This gap can be filled by employing available infra-red observations to generate visible images. This work shows how deep learning can be applied successfully to create such images using U-Net based architectures. The proposed methods show promising results, achieving a structural similarity index (SSIM) of up to 86% on an independent test set and providing visually convincing output images generated from infra-red observations.

  8. arXiv:2004.14878 (Published 2020-04-30)

    PreCNet: Next Frame Video Prediction Based on Predictive Coding

    Zdenek Straka, Tomas Svoboda, Matej Hoffmann

    Predictive coding, currently a highly influential theory in neuroscience, has not yet been widely adopted in machine learning. In this work, we transform the seminal model of Rao and Ballard (1999) into a modern deep learning framework while remaining maximally faithful to the original schema. The resulting network we propose (PreCNet) is tested on a widely used next-frame video prediction benchmark, which consists of images from an urban environment recorded from a car-mounted camera. On this benchmark (training: 41k images from the KITTI dataset; testing: the Caltech Pedestrian dataset), we achieve, to our knowledge, the best performance to date when measured with the Structural Similarity Index (SSIM). On two other common measures, MSE and PSNR, the model ranked third and fourth, respectively. Performance was further improved when a larger training set (2M images from BDD100k) was used, pointing to the limitations of the KITTI training set. This work demonstrates that an architecture carefully based on a neuroscience model, without being explicitly tailored to the task at hand, can exhibit unprecedented performance.

  9. arXiv:2004.01864 (Published 2020-04-04)

    Theoretical Insights into the Use of Structural Similarity Index In Generative Models and Inferential Autoencoders

    Benyamin Ghojogh, Fakhri Karray, Mark Crowley
    Comments: Accepted (to appear) in International Conference on Image Analysis and Recognition (ICIAR) 2020, Springer
    Categories: cs.LG, cs.CV, eess.IV, stat.ML

    Generative models and inferential autoencoders mostly make use of the $\ell_2$ norm in their optimization objectives. In order to generate perceptually better images, this short paper theoretically discusses how to use the Structural Similarity Index (SSIM) in generative models and inferential autoencoders. We first review SSIM, SSIM distance metrics, and the SSIM kernel. We show that the SSIM kernel is a universal kernel and thus can be used in unconditional and conditional generative moment matching networks. Then, we explain how to use SSIM distance in variational and adversarial autoencoders and in unconditional and conditional Generative Adversarial Networks (GANs). Finally, we propose to use SSIM distance rather than the $\ell_2$ norm in the least squares GAN.

  10. arXiv:1908.09287 (Published 2019-08-25)

    Principal Component Analysis Using Structural Similarity Index for Images

    Benyamin Ghojogh, Fakhri Karray, Mark Crowley
    Comments: Paper for the methods named "Image Structural Component Analysis (ISCA)" and "Kernel Image Structural Component Analysis (Kernel ISCA)"
    Journal: International Conference on Image Analysis and Recognition, Springer, pp. 77-88, 2019
    Categories: eess.IV, cs.CV, cs.LG, stat.ML

    Despite the advances of deep learning in specific tasks using images, the principled assessment of image fidelity and similarity is still a critical ability to develop. As it has been shown that Mean Squared Error (MSE) is insufficient for this task, other measures have been developed, one of the most effective being the Structural Similarity Index (SSIM). Such measures can be used for subspace learning, but existing methods in machine learning, such as Principal Component Analysis (PCA), are based on Euclidean distance or MSE and thus cannot properly capture the structural features of images. In this paper, we define an image structure subspace which discriminates different types of image distortions. We propose Image Structural Component Analysis (ISCA), and also kernel ISCA, by using SSIM, rather than Euclidean distance, in the formulation of PCA. This paper provides a bridge between image quality assessment and manifold learning, opening a broad new area for future research.

  11. arXiv:1906.10411 (Published 2019-06-25)

    Deep Learning of Compressed Sensing Operators with Structural Similarity Loss

    Yochai Zur, Amir Adler

    Compressed sensing (CS) is a signal processing framework for efficiently reconstructing a signal from a small number of measurements obtained by linear projections of the signal. In this paper we present an end-to-end deep learning approach for CS, in which a fully-connected network performs both the linear sensing and non-linear reconstruction stages. During the training phase, the sensing matrix and the non-linear reconstruction operator are jointly optimized using the Structural Similarity Index (SSIM) as the loss, rather than the standard Mean Squared Error (MSE) loss. We compare the proposed approach with the state-of-the-art in terms of reconstruction quality under both metrics, i.e., SSIM score and MSE score.

  12. arXiv:1503.06680 (Published 2015-01-29)

    Structural Similarity Index SSIMplified: Is there really a simpler concept at the heart of image quality measurement?

    Kieran Gerard Larkin
    Comments: 4 pages total, main analysis 2 pages, notes and minimal references 1 page
    Categories: cs.CV

    The ubiquitous Structural Similarity Index, or SSIM, can be dramatically simplified, straightened and re-interpreted as a normalized visibility function, or dissimilarity quotient. Explicitly writing SSIM in a symmetric formulation immediately reveals the previously enigmatic structural covariance as a difference of variances. A dramatic simplification ensues. Although the SSIM was a milestone in the recent history of Image Quality Assessment (IQA), its immense success (in citation terms) has now become more of a millstone. Among thousands of interested researchers only a few cognoscenti seem to be aware that there is a much simpler concept at the core of SSIM; a concept that is, arguably, the real reason for its explanatory power. But as the citation juggernaut rolls onward, most researchers seem destined to implement endless variations of the unintuitively complex SSIM formula. A more hard-nosed evaluation of SSIM might conclude that it is only an indirect way of using the simplest of all perceptually-masked image quality metrics (namely normalized RMSE), and that SSIM works coincidentally since the covariance term is actually the MSE in disguise. Perhaps the search for an ideal image quality index can now formally split into: 1. intricate, HVS-informed models that closely correlate with human observers; 2. simple, viewing-condition-invariant, low-computational models with tractable mathematical properties yet tolerable HVS correlation.
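
    The symmetric rewriting the abstract alludes to can be reproduced from the standard SSIM definition. Introducing sum and difference signals makes the covariance term appear as a difference of variances (a sketch of the algebra, not a verbatim reproduction of the paper's notation):

```latex
% Standard SSIM between patches x and y:
\mathrm{SSIM}(x,y)
  = \frac{(2\mu_x\mu_y + C_1)\,(2\sigma_{xy} + C_2)}
         {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}

% With s = x + y and d = x - y, the identities
%   4\sigma_{xy} = \sigma_s^2 - \sigma_d^2,
%   2(\sigma_x^2 + \sigma_y^2) = \sigma_s^2 + \sigma_d^2
% turn the structural term into a normalized dissimilarity quotient:
\frac{2\sigma_{xy} + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}
  = \frac{\sigma_s^2 - \sigma_d^2 + 2C_2}{\sigma_s^2 + \sigma_d^2 + 2C_2}
```

    Since $\sigma_d^2$ is the (mean-removed) MSE between the patches, this form makes explicit the claim that the covariance term "is actually the MSE in disguise".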

  13. arXiv:1212.5352 (Published 2012-12-21)

    On the Adaptability of Neural Network Image Super-Resolution

    Kah Keong Chua, Yong Haur Tay
    Comments: Image Super Resolution, Neural Network, Multilayer Perceptron, Mean Squared Error, Peak Signal-to-Noise Ratio, Structural Similarity Index
    Categories: cs.CV

    In this paper, we describe and develop a framework in which a Multilayer Perceptron (MLP) works on low-level image processing, performing image super-resolution. The MLP is trained with different types of images from various categories, allowing us to analyse the behaviour and performance of the neural network. The tests are carried out using quantitative measures: Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). The results show that an MLP trained with a single image category can perform reasonably well compared to methods proposed by other researchers.
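
    As a concrete reference for the evaluation metrics named above: PSNR is just a logarithmic rescaling of MSE relative to the peak signal value. A minimal NumPy version (the function name is ours; `data_range` is the nominal peak, e.g. 255 for 8-bit images):

```python
import numpy as np

def psnr(ref, test, data_range=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    # Identical images have zero MSE, hence infinite PSNR.
    return np.inf if mse == 0.0 else 10.0 * np.log10(data_range ** 2 / mse)
```

    Higher PSNR means the reconstruction is closer to the reference in a pixel-wise sense; SSIM complements it by measuring preserved local structure rather than raw error energy.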