arXiv:2401.08825 Abstract | arXiv Analytics

arXiv:2401.08825 [cs.LG]

AiGen-FoodReview: A Multimodal Dataset of Machine-Generated Restaurant Reviews and Images on Social Media

Published 2024-01-16, Version 1

Online reviews in the form of user-generated content (UGC) significantly impact consumer decision-making. However, the pervasive issue of fake content, produced not only by humans but also by machines, challenges UGC's reliability. Recent advances in Large Language Models (LLMs) may make it possible to fabricate indistinguishable fake content at a much lower cost. Leveraging OpenAI's GPT-4-Turbo and DALL-E-2 models, we craft AiGen-FoodReview, a multimodal dataset of 20,144 restaurant review-image pairs divided into authentic and machine-generated pairs. We explore unimodal and multimodal detection models, achieving 99.80% multimodal accuracy with FLAVA. We use attributes from readability and photographic theories to score reviews and images, respectively, demonstrating their utility as hand-crafted features in scalable and interpretable detection models with comparable performance. The paper contributes by open-sourcing the dataset and releasing fake review detectors, recommending the dataset for unimodal and multimodal fake review detection tasks, and evaluating linguistic and visual features in synthetic versus authentic data.
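The abstract does not name the specific readability attributes used to score reviews. As a hypothetical illustration only, the following Python sketch computes one standard readability score, Flesch Reading Ease, of the kind that could serve as a hand-crafted text feature. The function names and the vowel-group syllable heuristic are assumptions for this example, not the authors' implementation.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting groups of consecutive vowels (heuristic, assumed)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' (as in "cake") usually does not add a syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0  # No scorable text.
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    review = "The pasta was fresh. Service was slow but friendly."
    print(round(flesch_reading_ease(review), 2))
```

Higher scores indicate easier text. In a detector of the kind the abstract describes, such a score would be one column of a feature vector passed to an interpretable classifier, alongside other (unspecified) readability attributes.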

Categories: cs.LG, cs.CL, cs.CV

Keywords: machine-generated restaurant reviews, social media, multimodal dataset, UGC reliability, indistinguishable fake generated content

Related articles:

arXiv:1903.11027 [cs.LG] (Published 2019-03-26)

nuScenes: A multimodal dataset for autonomous driving

Holger Caesar et al.

arXiv:2410.16204 [cs.LG] (Published 2024-10-21, updated 2024-12-09)

Systematic Review: Text Processing Algorithms in Machine Learning and Deep Learning for Mental Health Detection on Social Media

Yuchen Cao, Jianglai Dai, Zhongyan Wang, Yeyubei Zhang, Xiaorui Shen, Yunchong Liu, Yexin Tian

arXiv:1605.03481 [cs.LG] (Published 2016-05-11)

Tweet2Vec: Character-Based Distributed Representations for Social Media

Bhuwan Dhingra, Zhong Zhou, Dylan Fitzpatrick, Michael Muehl, William W. Cohen
