arXiv Analytics

Search Results: Showing 1-10 of 10

  1. arXiv:2411.01469 (Published 2024-11-03)

    Exploring PCA-based feature representations of image pixels via CNN to enhance food image segmentation

    Ying Dai

    For open vocabulary recognition of ingredients in food images, segmenting the ingredients is a crucial step. This paper proposes a novel approach that explores PCA-based feature representations of image pixels using a convolutional neural network (CNN) to enhance segmentation. An internal clustering metric based on the silhouette score is defined to evaluate the clustering quality of various pixel-level feature representations generated by different feature maps derived from various CNN backbones. Using this metric, the paper explores optimal feature representation selection and suitable clustering methods for ingredient segmentation. Additionally, it is found that principal component (PC) maps derived from concatenations of backbone feature maps improve the clustering quality of pixel-level feature representations, resulting in stable segmentation outcomes. Notably, the number of selected eigenvalues can be used as the number of clusters to achieve good segmentation results. The proposed method performs well on the ingredient-labeled dataset FoodSeg103, achieving a mean Intersection over Union (mIoU) score of 0.5423. Importantly, the proposed method is unsupervised, and pixel-level feature representations from backbones are not fine-tuned on specific datasets. This demonstrates the flexibility, generalizability, and interpretability of the proposed method, while reducing the need for extensive labeled datasets.
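
    A minimal sketch of the pipeline the abstract describes, under assumptions: pixel-level features are assumed to be already extracted and upsampled to image resolution (the choice of CNN backbone is left out), PCA and KMeans stand in for the paper's exact components, and the silhouette score is used directly as the clustering-quality metric. Function and parameter names here are illustrative, not the paper's.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    def segment_from_features(feature_map, variance_to_keep=0.95):
        """feature_map: (H, W, C) array of pixel-level features, e.g. a
        concatenation of backbone feature maps upsampled to image size.
        Downsample the image first if H*W is large."""
        h, w, c = feature_map.shape
        pixels = feature_map.reshape(-1, c)

        # PC maps: project every pixel feature onto the leading principal components.
        pca = PCA(n_components=variance_to_keep)
        pc_maps = pca.fit_transform(pixels)

        # Heuristic from the abstract: use the number of selected eigenvalues
        # (retained components) as the number of clusters.
        n_clusters = pca.n_components_
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(pc_maps)

        # Internal clustering quality: silhouette score on a random subsample.
        rng = np.random.default_rng(0)
        idx = rng.choice(len(pc_maps), size=min(5000, len(pc_maps)), replace=False)
        quality = silhouette_score(pc_maps[idx], labels[idx])
        return labels.reshape(h, w), quality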

  2. arXiv:2209.12391 (Published 2022-09-26)

    FastStamp: Accelerating Neural Steganography and Digital Watermarking of Images on FPGAs

    Shehzeen Hussain, Nojan Sheybani, Paarth Neekhara, Xinqiao Zhang, Javier Duarte, Farinaz Koushanfar

    Steganography and digital watermarking are the tasks of hiding recoverable data in image pixels. Deep neural network (DNN) based image steganography and watermarking techniques are quickly replacing traditional hand-engineered pipelines. DNN-based watermarking techniques have drastically improved the message capacity, imperceptibility and robustness of the embedded watermarks. However, this improvement comes at the cost of increased computational overhead of the watermark encoder neural network. In this work, we design FastStamp, the first accelerator platform to perform DNN-based steganography and digital watermarking of images in hardware. We first propose a parameter-efficient DNN model for embedding recoverable bit-strings in image pixels. Our proposed model matches the success metrics of prior state-of-the-art DNN-based watermarking methods while being significantly faster and lighter in memory footprint. We then design an FPGA-based accelerator framework to further improve model throughput and power consumption by leveraging data parallelism and customized computation paths. FastStamp allows embedding hardware signatures into images to establish media authenticity and ownership of digital media. Our best design achieves 68 times faster inference compared to GPU implementations of prior DNN-based watermark encoders while consuming less power.
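
    A schematic sketch of the general encoder/decoder idea behind DNN watermarking (illustrative architecture and names, not the FastStamp model): the encoder hides a bit-string in the image as a small learned residual, and the decoder recovers the bits from the watermarked image.

    import torch
    import torch.nn as nn

    class WatermarkEncoder(nn.Module):
        def __init__(self, msg_bits=64, img_size=128):
            super().__init__()
            self.img_size = img_size
            self.msg_fc = nn.Linear(msg_bits, img_size * img_size)  # spread bits spatially
            self.net = nn.Sequential(
                nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
            )

        def forward(self, image, message):
            # image: (B, 3, H, W) in [0, 1]; message: (B, msg_bits) of 0/1 floats
            m = self.msg_fc(message).view(-1, 1, self.img_size, self.img_size)
            residual = self.net(torch.cat([image, m], dim=1))
            return (image + 0.05 * residual).clamp(0, 1)  # small, near-imperceptible change

    class WatermarkDecoder(nn.Module):
        def __init__(self, msg_bits=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, msg_bits),  # bit logits; train with BCEWithLogitsLoss
            )

        def forward(self, watermarked):
            return self.net(watermarked)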

  3. arXiv:2107.13757 (Published 2021-07-29, updated 2022-04-10)

    Bridging Gap between Image Pixels and Semantics via Supervision: A Survey

    Jiali Duan, C. -C. Jay Kuo
    Comments: Jiali Duan and C.-C. Jay Kuo (2022), "Bridging Gap between Image Pixels and Semantics via Supervision: A Survey", APSIPA Transactions on Signal and Information Processing: Vol. 11: No. 1, e2. http://dx.doi.org/10.1561/116.00000038
    Categories: cs.CV

    The gap between low-level image features and semantic meanings, known as the semantic gap, has been recognized for decades, and its resolution remains a long-standing problem. This work reviews the semantic gap problem and surveys recent efforts to bridge it. Most importantly, we claim that today the semantic gap is primarily bridged through supervised learning. Experience is drawn from two application domains to illustrate this point: 1) object detection and 2) metric learning for content-based image retrieval (CBIR). The paper begins with a historical retrospective on supervision, makes a gradual transition to the modern data-driven methodology, and introduces commonly used datasets. It then summarizes various supervision methods for bridging the semantic gap in the context of object detection and metric learning.

  4. arXiv:2009.04004 (Published 2020-09-08)

    Fuzzy Unique Image Transformation: Defense Against Adversarial Attacks On Deep COVID-19 Models

    Achyut Mani Tripathi, Ashish Mishra

    Early identification of COVID-19 using deep models trained on Chest X-Ray and CT images has gained considerable attention from researchers as a way to speed up the identification of active COVID-19 cases. These deep models act as an aid to hospitals that lack specialists or radiologists, particularly in remote areas. Various deep models have been proposed to detect COVID-19 cases, but little work has been done to protect these models against adversarial attacks capable of fooling them with small perturbations of image pixels. This paper evaluates the performance of deep COVID-19 models against adversarial attacks. It also proposes an efficient yet effective Fuzzy Unique Image Transformation (FUIT) technique that downsamples the image pixels into an interval. The images obtained after the FUIT transformation are then used to train a secure deep model that preserves high diagnostic accuracy for COVID-19 cases and provides a reliable defense against adversarial attacks. Experiments show that the proposed approach defends the deep model against six adversarial attacks while maintaining high accuracy in classifying COVID-19 cases on the Chest X-Ray and CT image datasets. The results also suggest that careful inspection is required before deep models are applied in practice to diagnose COVID-19 cases.
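
    The abstract does not specify the exact FUIT mapping, so the following is only an illustrative sketch of the general idea of interval-based input transformations: each pixel is snapped to the midpoint of a coarse interval, so that a clean pixel and its slightly perturbed version tend to land in the same bucket.

    import numpy as np

    def interval_transform(image, n_intervals=16):
        """image: array of pixel values in [0, 255]. Each pixel is replaced
        by the midpoint of the interval it falls into."""
        width = 256 / n_intervals
        bins = np.floor(image / width)                 # interval index per pixel
        return (bins * width + width / 2).astype(image.dtype)

    # At inference time the same transform is applied before feeding the deep
    # model, so small adversarial perturbations are largely removed.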

  5. arXiv:1806.09170 (Published 2018-06-24)

    Fusion of complex networks and randomized neural networks for texture analysis

    Lucas C. Ribas, Jarbas J. M. Sa Junior, Leonardo F. S. Scabini, Odemir M. Bruno

    This paper presents a highly discriminative texture analysis method based on the fusion of complex networks and randomized neural networks. In this approach, the input image is modeled as a complex network, and its topological properties, together with the image pixels, are used to train randomized neural networks in order to create a signature that represents the deep characteristics of the texture. The results obtained surpassed the accuracies of many methods available in the literature. This performance demonstrates that the proposed approach opens a promising line of research exploring the synergy of neural networks and complex networks in the field of texture analysis.
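
    A compressed, illustrative sketch of the two ingredients named in the abstract, not the authors' exact formulation: pixels become nodes of a graph whose edges depend on intensity similarity and distance (a simple topological property, the node degree, is read off), and an ELM-style randomized neural network with a closed-form readout produces output weights that can serve as a texture signature. All thresholds and feature choices below are assumptions.

    import numpy as np

    def pixel_graph_degrees(img, radius=2, threshold=0.25):
        """Node degree per pixel: count neighbours within `radius` whose
        combined intensity/distance weight stays below `threshold`."""
        h, w = img.shape
        norm = img.astype(float) / 255.0
        degrees = np.zeros((h, w))
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                if dy == 0 and dx == 0:
                    continue
                dist = np.hypot(dy, dx)
                if dist > radius:
                    continue
                shifted = np.roll(np.roll(norm, dy, axis=0), dx, axis=1)
                weight = (np.abs(norm - shifted) + dist / radius) / 2.0
                degrees += (weight <= threshold)
        return degrees

    def randomized_nn_signature(features, targets, hidden=20, seed=0):
        """ELM-style randomized network: fixed random hidden weights,
        least-squares output weights; the flattened output weights act as
        the signature."""
        rng = np.random.default_rng(seed)
        X = np.hstack([np.ones((features.shape[0], 1)), features])   # bias column
        W = rng.uniform(-1, 1, size=(hidden, X.shape[1]))            # random, untrained
        H = np.tanh(X @ W.T)                                         # hidden activations
        beta, *_ = np.linalg.lstsq(H, targets, rcond=None)           # closed-form readout
        return beta.ravel()

    # Usage idea: pair each pixel's degree with its intensity as `features`,
    # use a simple function of pixel position as `targets`, and concatenate
    # signatures over several (radius, threshold) settings for a classifier.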

  6. arXiv:1804.02771 (Published 2018-04-08, updated 2018-12-09)

    Estimating Depth from RGB and Sparse Sensing

    Zhao Chen, Vijay Badrinarayanan, Gilad Drozdov, Andrew Rabinovich
    Comments: European Conference on Computer Vision (ECCV) 2018. Updated to camera-ready version with additional experiments
    Journal: In: European Conference on Computer Vision. pp. 176-192. Springer (2018)
    Categories: cs.CV

    We present a deep model that can accurately produce dense depth maps given an RGB image with known depth at a very sparse set of pixels. The model works simultaneously for both indoor/outdoor scenes and produces state-of-the-art dense depth maps at nearly real-time speeds on both the NYUv2 and KITTI datasets. We surpass the state-of-the-art for monocular depth estimation even with depth values for only 1 out of every ~10000 image pixels, and we outperform other sparse-to-dense depth methods at all sparsity levels. With depth values for 1/256 of the image pixels, we achieve a mean absolute error of less than 1% of actual depth on indoor scenes, comparable to the performance of consumer-grade depth sensor hardware. Our experiments demonstrate that it would indeed be possible to efficiently transform sparse depth measurements obtained using e.g. lower-power depth sensors or SLAM systems into high-quality dense depth maps.
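
    A minimal sketch of how a sparse-to-dense input can be prepared: depth is kept at roughly 1/256 of the pixels and stacked with the RGB image plus a validity mask. The five-channel layout is an assumption for illustration, not the paper's exact input format.

    import numpy as np

    def make_sparse_input(rgb, dense_depth, keep_fraction=1 / 256, seed=0):
        """rgb: (H, W, 3) uint8; dense_depth: (H, W) float.
        Returns an (H, W, 5) array: RGB, sparse depth, validity mask."""
        h, w = dense_depth.shape
        rng = np.random.default_rng(seed)
        mask = rng.random((h, w)) < keep_fraction          # ~1 sample per 256 pixels
        sparse_depth = np.where(mask, dense_depth, 0.0)    # zeros where depth is unknown
        return np.dstack([rgb.astype(float) / 255.0,
                          sparse_depth[..., None],
                          mask.astype(float)[..., None]])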

  7. arXiv:1704.08944 (Published 2017-04-28)

    Object Discovery via Cohesion Measurement

    Guanjun Guo, Hanzi Wang, Wan-Lei Zhao, Yan Yan, Xuelong Li
    Comments: 14 pages, 14 figures
    Journal: IEEE Transactions on Cybernetics (2017) 1-14
    Categories: cs.CV

    Color and intensity are two important components in an image. Usually, groups of image pixels, which are similar in color or intensity, are an informative representation for an object. They are therefore particularly suitable for computer vision tasks, such as saliency detection and object proposal generation. However, image pixels, which share a similar real-world color, may be quite different since colors are often distorted by intensity. In this paper, we reinvestigate the affinity matrices originally used in image segmentation methods based on spectral clustering. A new affinity matrix, which is robust to color distortions, is formulated for object discovery. Moreover, a Cohesion Measurement (CM) for object regions is also derived based on the formulated affinity matrix. Based on the new Cohesion Measurement, a novel object discovery method is proposed to discover objects latent in an image by utilizing the eigenvectors of the affinity matrix. Then we apply the proposed method to both saliency detection and object proposal generation. Experimental results on several evaluation benchmarks demonstrate that the proposed CM based method has achieved promising performance for these two tasks.
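
    A minimal sketch of the spectral machinery the abstract builds on, using a plain Gaussian colour affinity; the paper's distortion-robust affinity and Cohesion Measurement are more involved, so the formulas below are stand-ins.

    import numpy as np

    def leading_pixel_eigenvectors(image, sigma=0.2, k=3):
        """image: small (H, W, 3) float array in [0, 1]. Downsample first:
        the affinity matrix is (H*W) x (H*W)."""
        h, w, _ = image.shape
        pixels = image.reshape(-1, 3)
        d2 = ((pixels[:, None, :] - pixels[None, :, :]) ** 2).sum(-1)
        affinity = np.exp(-d2 / (2 * sigma ** 2))          # symmetric, non-negative
        vals, vecs = np.linalg.eigh(affinity)              # eigenvalues in ascending order
        return vecs[:, -k:].reshape(h, w, k)               # top-k eigenvector maps

    # Thresholding or clustering the eigenvector maps yields candidate object
    # regions; a cohesion-style score for a region can then be computed from the
    # affinity mass inside the region versus across its boundary.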

  8. arXiv:1510.02173 (Published 2015-10-08)

    Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models

    John-Alexander M. Assael, Niklas Wahlström, Thomas B. Schön, Marc Peter Deisenroth
    Comments: arXiv admin note: text overlap with arXiv:1502.02251
    Categories: cs.AI, cs.CV, cs.LG, stat.ML

    Data-efficient reinforcement learning (RL) in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. We consider a particularly important instance of this challenge, the pixels-to-torques problem, where an RL agent learns a closed-loop control policy ("torques") from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model for learning a low-dimensional feature embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning is crucial for long-term predictions, which lie at the core of the adaptive nonlinear model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art RL methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces, is lightweight and an important step toward fully autonomous end-to-end learning from pixels to torques.
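
    A schematic sketch of a deep dynamical model in the sense described above (sizes and names are illustrative): an autoencoder compresses images to a low-dimensional feature z, a transition model predicts z at the next step from the current z and control u, and both are trained jointly so the latent space stays predictive, which is what a model predictive controller needs.

    import torch
    import torch.nn as nn

    class DeepDynamicalModel(nn.Module):
        def __init__(self, img_dim=64 * 64, latent_dim=8, control_dim=2):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(),
                                         nn.Linear(256, latent_dim))
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                         nn.Linear(256, img_dim))
            self.transition = nn.Sequential(nn.Linear(latent_dim + control_dim, 64), nn.ReLU(),
                                            nn.Linear(64, latent_dim))

        def forward(self, x_t, u_t, x_next):
            z_t = self.encoder(x_t)
            z_next_pred = self.transition(torch.cat([z_t, u_t], dim=-1))
            recon = self.decoder(z_t)
            pred_img = self.decoder(z_next_pred)
            # Joint loss: reconstruct the current frame and predict the next one.
            loss = ((recon - x_t) ** 2).mean() + ((pred_img - x_next) ** 2).mean()
            return loss, z_next_pred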

  9. arXiv:1210.3832 (Published 2012-10-14)

    Image Processing using Smooth Ordering of its Patches

    Idan Ram, Michael Elad, Israel Cohen
    Comments: 8 pages, 7 figures, 4 tables, submitted to IEEE Transactions on Image Processing
    Categories: cs.CV

    We propose an image processing scheme based on reordering the image's patches. For a given corrupted image, we extract all patches with overlaps, treat them as points in a high-dimensional space, and order them so that they are chained along the "shortest possible path", essentially solving the traveling salesman problem. The obtained ordering, applied to the corrupted image, induces a permutation of the image pixels into what should be a regular signal. This enables us to obtain a good recovery of the clean image by applying relatively simple 1D smoothing operations (such as filtering or interpolation) to the reordered set of pixels. We explore the use of the proposed approach for image denoising and inpainting, and show promising results in both cases.
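
    A minimal sketch of the reordering idea, with a greedy nearest-neighbour walk as a cheap stand-in for the shortest-path/TSP ordering and a moving-average filter as the 1D smoother; patch size and kernel length are illustrative.

    import numpy as np

    def greedy_patch_ordering(image, patch=5):
        """Order pixels by greedily walking to the nearest unvisited patch.
        Quadratic cost: intended for small images."""
        h, w = image.shape
        r = patch // 2
        padded = np.pad(image, r, mode='reflect')
        # One patch (as a vector) per pixel, in raster order.
        patches = np.array([padded[y:y + patch, x:x + patch].ravel()
                            for y in range(h) for x in range(w)])
        n = len(patches)
        visited = np.zeros(n, dtype=bool)
        order = [0]
        visited[0] = True
        for _ in range(n - 1):
            d = ((patches - patches[order[-1]]) ** 2).sum(1)
            d[visited] = np.inf
            nxt = int(d.argmin())
            order.append(nxt)
            visited[nxt] = True
        return np.array(order)

    def smooth_by_ordering(noisy, patch=5, kernel=7):
        h, w = noisy.shape
        order = greedy_patch_ordering(noisy, patch)
        signal = noisy.ravel()[order]                      # pixels in "smooth" order
        filt = np.convolve(signal, np.ones(kernel) / kernel, mode='same')
        out = np.empty_like(filt)
        out[order] = filt                                  # undo the permutation
        return out.reshape(h, w)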

  10. arXiv:cs/0601102 (Published 2006-01-24)

    Geometric symmetry in the quadratic Fisher discriminant operating on image pixels

    Robert S. Caprari
    Comments: Accepted for publication in IEEE Transactions on Information Theory
    Journal: IEEE Transactions on Information Theory 52(4), April 2006, pp. 1780-1788
    Categories: cs.IT, cs.CV, math.IT

    This article examines the design of Quadratic Fisher Discriminants (QFDs) that operate directly on image pixels, when image ensembles are taken to comprise all rotated and reflected versions of distinct sample images. A procedure based on group theory is devised to identify and discard QFD coefficients made redundant by symmetry, for arbitrary sampling lattices. This procedure introduces the concept of a degeneracy matrix. Tensor representations are established for the square lattice point group (8-fold symmetry) and hexagonal lattice point group (12-fold symmetry). The analysis is largely applicable to the symmetrisation of any quadratic filter, and generalises to higher order polynomial (Volterra) filters. Experiments on square lattice sampled synthetic aperture radar (SAR) imagery verify that symmetrisation of QFDs can improve their generalisation and discrimination ability.
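
    A short, hedged formulation of the symmetry constraint being exploited, in notation assumed here rather than taken from the paper: a quadratic discriminant on the pixel vector is invariant under the lattice point group, which ties its coefficients together and makes some of them redundant.

    % Assumed notation: x is the vectorised image, Q and l the QFD coefficients.
    \[
      f(\mathbf{x}) = \mathbf{x}^\top \mathbf{Q}\,\mathbf{x} + \mathbf{l}^\top \mathbf{x} + c
    \]
    % If the ensemble contains every rotated/reflected version of each image,
    % represented by permutation matrices \(\mathbf{P}_g\) of the lattice point
    % group \(G\) (8 elements for the square lattice, 12 for the hexagonal one),
    % then \(f(\mathbf{P}_g \mathbf{x}) = f(\mathbf{x})\) for all x, i.e.
    \[
      \mathbf{P}_g^\top \mathbf{Q}\,\mathbf{P}_g = \mathbf{Q},
      \qquad
      \mathbf{P}_g^\top \mathbf{l} = \mathbf{l},
      \qquad \forall\, g \in G .
    \]
    % Coefficients related by these equations carry no independent information
    % and can be discarded, which is the reduction the abstract describes.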