arXiv:2410.11650 [cs.CV]

ED-ViT: Splitting Vision Transformer for Distributed Inference on Edge Devices

Xiang Liu, Yijun Song, Xia Li, Yifei Sun, Huiying Lan, Zemin Liu, Linshan Jiang, Jialin Li

Published 2024-10-15, Version 1

Deep learning models are increasingly deployed on resource-constrained edge devices for real-time data analytics. In recent years, Vision Transformer models and their variants have demonstrated outstanding performance across various computer vision tasks. However, their high computational demands and inference latency make them difficult to deploy on resource-constrained edge devices. To address this issue, we propose ED-ViT, a novel Vision Transformer splitting framework designed to execute complex models efficiently across multiple edge devices. Specifically, we partition a Vision Transformer model into several sub-models, each tailored to handle a specific subset of data classes. To further reduce computational overhead and inference latency, we introduce a class-wise pruning technique that shrinks each sub-model. We conduct extensive experiments on five datasets with three model structures. The results show that our approach reduces inference latency on edge devices by up to 28.9 times and model size by up to 34.1 times, while maintaining test accuracy comparable to the original Vision Transformer. Additionally, we compare ED-ViT with two state-of-the-art methods that deploy CNN and SNN models on edge devices, evaluating accuracy, inference time, and overall model size. This comprehensive evaluation underscores the effectiveness of the proposed ED-ViT framework.
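The abstract says each sub-model handles a subset of data classes, but it does not say how classes are assigned to devices or how the sub-models' outputs are combined into one prediction. The Python sketch below illustrates one possible reading under two assumptions of our own: classes are split into contiguous, near-equal subsets (one per edge device), and the final label is the arg-max over every sub-model's per-class scores, which presumes those scores are comparable (for example, calibrated probabilities). The helpers `partition_classes` and `fuse_predictions` are hypothetical and are not part of ED-ViT; the paper's actual assignment and fusion may differ, and class-wise pruning is not modeled here.

```python
from typing import List, Sequence


def partition_classes(num_classes: int, num_devices: int) -> List[List[int]]:
    """Split class indices 0..num_classes-1 into num_devices contiguous subsets.

    Hypothetical policy: subset sizes differ by at most one, with the first
    (num_classes % num_devices) devices receiving one extra class.
    """
    if not 1 <= num_devices <= num_classes:
        raise ValueError("need 1 <= num_devices <= num_classes")
    base, extra = divmod(num_classes, num_devices)
    subsets: List[List[int]] = []
    start = 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        subsets.append(list(range(start, start + size)))
        start += size
    return subsets


def fuse_predictions(subsets: Sequence[Sequence[int]],
                     sub_scores: Sequence[Sequence[float]]) -> int:
    """Return the global class index with the highest score over all sub-models.

    sub_scores[d][j] is sub-model d's score for its j-th class, subsets[d][j].
    Assumption: scores from different sub-models are directly comparable.
    """
    if len(subsets) != len(sub_scores):
        raise ValueError("one score vector is required per sub-model")
    best_class, best_score = -1, float("-inf")
    for classes, scores in zip(subsets, sub_scores):
        if len(classes) != len(scores):
            raise ValueError("score vector length must match its class subset")
        for cls, score in zip(classes, scores):
            if score > best_score:
                best_class, best_score = cls, score
    return best_class


# Usage: 10 classes spread over 3 hypothetical edge devices.
subsets = partition_classes(10, 3)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
scores = [[0.10, 0.20, 0.05, 0.30],  # device 0: scores for classes 0-3
          [0.90, 0.10, 0.00],        # device 1: scores for classes 4-6
          [0.20, 0.40, 0.10]]        # device 2: scores for classes 7-9
print(fuse_predictions(subsets, scores))  # 4
```

Because each device only scores its own classes, the devices can run in parallel and only a short score vector per device needs to be sent back for fusion. Whether ED-ViT aggregates results this way is not stated in the abstract.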

Comments: 14 pages, 8 figures

Categories: cs.CV, cs.AI

Related articles:

arXiv:2102.03456 [cs.CV] (Published 2021-02-06)

BinaryCoP: Binary Neural Network-based COVID-19 Face-Mask Wear and Positioning Predictor on Edge Devices

Nael Fasfous, Manoj-Rohit Vemparala, Alexander Frickenstein, Lukas Frickenstein, Walter Stechele

arXiv:2407.15067 [cs.CV] (Published 2024-07-21)

VoxDepth: Rectification of Depth Images on Edge Devices

Yashashwee Chakrabarty, Smruti Ranjan Sarangi

arXiv:2312.11716 [cs.CV] (Published 2023-12-18)

Squeezed Edge YOLO: Onboard Object Detection on Edge Devices