{ "id": "2203.02505", "version": "v1", "published": "2022-03-03T06:19:51.000Z", "updated": "2022-03-03T06:19:51.000Z", "title": "ARM 4-BIT PQ: SIMD-based Acceleration for Approximate Nearest Neighbor Search on ARM", "authors": [ "Yusuke Matsui", "Yoshiki Imaizumi", "Naoya Miyamoto", "Naoki Yoshifuji" ], "comment": "ICASSP 2022", "categories": [ "cs.LG", "cs.CV", "cs.IR" ], "abstract": "We accelerate 4-bit product quantization (PQ) on the ARM architecture. Notably, the high performance of conventional 4-bit PQ relies heavily on x64-specific SIMD registers, such as those of AVX2; hence, comparable performance has not yet been achieved on ARM. To fill this gap, we first bundle two 128-bit registers as one 256-bit component. We then apply shuffle operations to each using ARM-specific NEON instructions. With this simple but critical modification, we achieve a dramatic speedup for 4-bit PQ on the ARM architecture. Experiments show that the proposed method consistently achieves a 10x improvement over naive PQ with the same accuracy.", "revisions": [ { "version": "v1", "updated": "2022-03-03T06:19:51.000Z" } ], "analyses": { "keywords": [ "approximate nearest neighbor search", "simd-based acceleration", "arm architecture", "x64-specific simd register", "arm-specific neon instruction" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }