
arXiv:2102.09507 [cs.CL]

Regular Expressions for Fast-response COVID-19 Text Classification

Igor L. Markov, Jacqueline Liu, Adam Vagner

Published 2021-02-18, Version 1

Text classifiers are at the core of many NLP applications and use a variety of algorithmic approaches and software. This paper describes how Facebook determines if a given piece of text (anything from a hashtag to a post) belongs to a narrow topic such as COVID-19. To fully define a topic and evaluate classifier performance, we employ human-guided iterations of keyword discovery, but do not require labeled data. For COVID-19, we build two sets of regular expressions: (1) for 66 languages, with 99% precision and recall >50%, (2) for the 11 most common languages, with precision >90% and recall >90%. Regular expressions enable low-latency queries from multiple platforms. Response to challenges like COVID-19 is fast and so are revisions. Comparisons to a DNN classifier show explainable results, higher precision and recall, and less overfitting. Our learnings can be applied to other narrow-topic classifiers.
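
As a rough illustration of the idea in the abstract, the sketch below classifies text with a single compiled regular expression and computes precision and recall on a tiny sample. The pattern, the function names is_on_topic and precision_recall, and the hand-labeled sample are all assumptions made for this example; they are not the regular expressions or the evaluation procedure used in the paper, which relies on human-guided keyword discovery rather than labeled data.

```python
import re

# Hypothetical keyword pattern for illustration only; the paper's actual
# regular expressions are not reproduced here.
COVID_PATTERN = re.compile(
    r"covid[-\s_]?(?:19)?|coronavirus|sars[-\s]?cov[-\s]?2",
    re.IGNORECASE,
)


def is_on_topic(text: str) -> bool:
    """Return True if the text matches the (assumed) topic pattern."""
    return COVID_PATTERN.search(text) is not None


def precision_recall(texts, labels):
    """Compute precision and recall of is_on_topic against boolean labels."""
    tp = fp = fn = 0
    for text, label in zip(texts, labels):
        predicted = is_on_topic(text)
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Tiny made-up sample: the third item is on-topic but uses a keyword
    # the pattern does not cover, so it counts against recall.
    sample_texts = ["#covid19", "SARS-CoV-2 variant", "lockdown again", "pizza night"]
    sample_labels = [True, True, True, False]
    print(precision_recall(sample_texts, sample_labels))
```

A missed item such as "lockdown again" is the kind of gap that a further round of keyword discovery would be expected to close, trading some precision risk for higher recall.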

Comments: 10 pages, 7 tables

Categories: cs.CL, cs.LG, cs.SI

Keywords: text classification, fast-response, regular expressions enable low-latency queries, evaluate classifier performance, higher precision

Related articles:

arXiv:1912.00544 [cs.CL] (Published 2019-12-02)

Multi-Scale Self-Attention for Text Classification

Qipeng Guo, Xipeng Qiu, Pengfei Liu, Xiangyang Xue, Zheng Zhang

arXiv:2006.15315 [cs.CL] (Published 2020-06-27)

Uncertainty-aware Self-training for Text Classification with Few Labels

Subhabrata Mukherjee, Ahmed Hassan Awadallah

arXiv:2006.16174 [cs.CL] (Published 2020-06-29)

Multichannel CNN with Attention for Text Classification