Twitter chatter identifies fraudulent COVID-19 products early

In a recent study posted to the medRxiv* pre-print server, researchers evaluated methods for automatically and rapidly detecting counterfeit coronavirus disease 2019 (COVID-19) prevention and treatment products from Twitter chatter.

They employed natural language processing (NLP) and time-series anomaly detection methods on the premise that as a fraudulent product gains popularity among Twitter users, the volume of chatter mentioning the product rises correspondingly. These detection methods flag sudden increases in the frequency of mentions on social media platforms, including Twitter and Facebook.

Study: Early detection of fraudulent COVID-19 products from Twitter chatter. Image Credit: Michele Ursi / Shutterstock


Amid efforts by public health agencies worldwide to mitigate the impact of the COVID-19 pandemic, the unscrupulous promotion of fraudulent products claiming to treat, prevent, or cure severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection has been a persistent problem.

The United States Food and Drug Administration (FDA) issues warning letters to curb the spread of such products, but only after many people have already been exposed to them. In the US, such products cannot be sold or advertised on television or in the news. As a result, entities selling them turn to social media platforms for promotion, spreading misinformation and fueling an infodemic.

Therefore, there is an urgent need to devise vigilance tools that automatically identify potentially counterfeit COVID-19 products early and generate alerts. Fortunately, it is possible to automate real-time surveillance of fraudulent COVID-19 products on social media.

About the study

In the present study, researchers employed time-series anomaly detection methods to detect abnormal increases in mentions of COVID-19-related counterfeit products on Twitter. They systematically curated Twitter chatter using NLP to generate alerts. The team used real-time data from the Twitter COVID-19 application programming interface (API), which Twitter provided directly to support COVID-19-related research. In total, the team gathered 577,872,350 tweets that mentioned COVID-19-related keywords, such as coronavirus and covid, between February 19, 2020, and December 31, 2020.

The researchers excluded keywords collected after 2020 and 12 keywords that were mentioned fewer than 10 times on Twitter, including their linguistic variants. They gathered data continuously and stored it in a database hosted on the Google Cloud Platform.

Next, the team manually curated a comprehensive list of counterfeit COVID-19 products from the US FDA website. They also listed the names of the people who owned these products, along with their websites and social media profiles, if any. The researchers manually reviewed 183 FDA warning letters to create a list of products and entities and the date of the earliest FDA warning letter for each.

Further, they used a data-centric tool to catch spelling variants or misspellings in the names of counterfeit COVID-19 products. The variant generation tool applied semantic and lexical similarity measures to automatically identify such errors, including key phrases and multi-word expressions.
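The article does not describe the internals of the variant generation tool. As a rough illustration of the lexical half of the idea only, the sketch below uses the similarity ratio from Python's standard difflib module to pick out likely misspellings of a product name. The product name "silvermist", the vocabulary, and the 0.8 threshold are all hypothetical, and the semantic similarity component mentioned above is not covered.

```python
from difflib import SequenceMatcher


def lexical_variants(target, vocabulary, threshold=0.8):
    """Return vocabulary terms that look like spelling variants of `target`.

    Similarity is difflib's ratio (0.0 to 1.0). The 0.8 threshold is an
    illustrative assumption, not a value taken from the study.
    """
    target = target.lower()
    variants = []
    for term in vocabulary:
        candidate = term.lower()
        if candidate == target:
            continue  # skip the exact name; only variants are of interest
        ratio = SequenceMatcher(None, target, candidate).ratio()
        if ratio >= threshold:
            variants.append(term)
    return variants


# Hypothetical product name and hypothetical terms seen in tweets.
vocabulary = ["silvermyst", "silvermst", "silver", "vitamin"]
print(lexical_variants("silvermist", vocabulary))
```

Close misspellings such as "silvermyst" pass the threshold, while the shorter generic word "silver" does not. A real pipeline would likely need a stricter or context-aware filter to avoid matching unrelated terms.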

The team analyzed all products and key phrase spelling variants with at least 10 mentions in the curated data. They then normalized daily counts by the total number of Twitter posts collected on the same day. The resulting mentions per 1,000 tweets represented the daily relative frequencies of COVID-19-related keywords and phrases.
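The normalization step can be illustrated with a few lines of Python. This is a minimal sketch, assuming daily counts are held in dictionaries keyed by date string; the function name and the sample numbers are invented for illustration and do not come from the study.

```python
def mentions_per_thousand(keyword_counts, total_counts):
    """Convert raw daily keyword counts into mentions per 1,000 tweets.

    keyword_counts: {day: number of tweets mentioning the keyword}
    total_counts:   {day: total number of tweets collected that day}
    Days with no collected tweets are skipped to avoid dividing by zero.
    """
    rates = {}
    for day, total in total_counts.items():
        if total > 0:
            rates[day] = 1000 * keyword_counts.get(day, 0) / total
    return rates


# Invented example: the keyword count rises faster than overall volume.
keyword_counts = {"2020-03-01": 5, "2020-03-02": 30}
total_counts = {"2020-03-01": 10000, "2020-03-02": 20000}
print(mentions_per_thousand(keyword_counts, total_counts))
```

Dividing by the daily total keeps a keyword from looking anomalous merely because more tweets were collected overall on a given day.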

Lastly, any data point more than three standard deviations (SDs) from the 14-day moving average was considered a potential signal. This allowed the researchers to determine whether the first signal for a COVID-19-related keyword occurred before the FDA letter issuance date, within a week of it, or later.
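A simple version of this rule can be sketched as follows. The article does not say how the window is aligned or which SD is used, so this sketch assumes a trailing window of the 14 preceding days, the population SD of that window, and upward deviations only (since the goal is to catch increases). The function names, the synthetic series, and the letter date are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def first_signal(dates, rates, window=14, k=3.0):
    """Return the first date whose rate exceeds the trailing mean by k SDs.

    Assumption: mean and SD are computed over the `window` days that
    precede the day being tested. Returns None if nothing is flagged.
    """
    for i in range(window, len(rates)):
        history = rates[i - window:i]
        if rates[i] > mean(history) + k * pstdev(history):
            return dates[i]
    return None


def classify(signal_date, letter_date):
    """Place the first signal relative to the FDA letter issuance date."""
    if signal_date is None:
        return "no signal"
    if signal_date < letter_date:
        return "before letter"
    if signal_date <= letter_date + timedelta(days=7):
        return "within a week"
    return "later"


# Synthetic series: 14 quiet days, one ordinary day, then a spike.
rates = [1.0, 1.2] * 7 + [1.1, 5.0]
dates = [date(2020, 3, 1) + timedelta(days=i) for i in range(len(rates))]
signal = first_signal(dates, rates)
print(signal, classify(signal, date(2020, 3, 20)))  # hypothetical letter date
```

One limitation of this sketch is that a window of identical values has an SD of zero, so any rise at all would be flagged; a production system would probably add a minimum-count or minimum-SD guard.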

Study findings

The FDA warning letters were issued between March 6, 2020, and June 22, 2021. The authors identified 221 potential keywords associated with the counterfeit COVID-19 products or the entities selling them. Of these, the researchers assessed only 56 keywords because they considered only the first mention of a keyword in their analysis for early detection.

In total, 44 key phrases related to COVID-19 met all the inclusion criteria, and 43 of the 44 showed abnormal increases in mentions at some point. Notably, 77.3% of keywords (34/44) were detectable through Twitter chatter before the FDA letter issuance dates. An additional 13.6% of keywords showed anomalous increases within seven days of the FDA letter issuance dates.


According to the authors, the current study is the first to use social media-based surveillance to detect counterfeit COVID-19 products early relative to the FDA warning issuance dates. Specifically, the researchers identified products that gained popularity via promotion on Twitter. The study approach was simple, unsupervised with no need for training data, and economical because it relied on publicly available social media chatter.

*Important notice

medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Journal reference:
  • Source: Early detection of fraudulent COVID-19 products from Twitter chatter, Abeed Sarker, Sahithi Lakamana, Ruqi Liao, Aamir Abbas, Yuan-Chi Yang, Mohammed Ali Al-Garadi, medRxiv pre-print 2022, DOI:,

Written by

Neha Mathur

Neha is a digital marketing professional based in Gurugram, India. She has a Master’s degree from the University of Rajasthan with a specialization in Biotechnology in 2008. She has experience in pre-clinical research as part of her research project in The Department of Toxicology at the prestigious Central Drug Research Institute (CDRI), Lucknow, India. She also holds a certification in C++ programming.
