Title: Leveraging Paraphrase Labels to Extract Synonyms from Twitter

Conference · OSTI ID: 1222097

We present an approach for automatically learning synonyms from a paraphrase corpus of tweets. Substituting the extracted synonyms into the training set improves performance on the task of paraphrase detection. Candidate synonyms and their context windows are built from the chunks of a shallow parse, and the learned synonyms are incorporated into a paraphrase detection system that uses machine translation metrics as features for a classifier. We demonstrate a 2.29% improvement in F1 when we train and test on the paraphrase training set, providing better coverage than previous systems. This result shows the potential power of synonyms that are representative of a specific topic.
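The idea of pairing chunks with their context windows can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pre-chunked input sentences, the rule that two differing chunks count as candidate synonyms when their neighboring chunks match exactly, and the names `context_windows` and `candidate_synonyms` are all assumptions made for illustration. A real system would take its chunks from a shallow parser.

```python
from collections import Counter
from typing import Iterator, List, Tuple

Chunks = List[str]  # one sentence, already split into shallow-parse chunks (assumed input)


def context_windows(chunks: Chunks, width: int = 1) -> Iterator[Tuple[tuple, str, tuple]]:
    """Yield (left_context, chunk, right_context) for every chunk in a sentence."""
    for i, chunk in enumerate(chunks):
        left = tuple(chunks[max(0, i - width):i])
        right = tuple(chunks[i + 1:i + 1 + width])
        yield left, chunk, right


def candidate_synonyms(pairs: List[Tuple[Chunks, Chunks]], width: int = 1) -> Counter:
    """Count chunk pairs that differ but share an identical context window
    across two sentences labeled as paraphrases (hypothetical matching rule)."""
    counts: Counter = Counter()
    for first, second in pairs:
        # Index the chunks of the first sentence by their surrounding context.
        windows = {}
        for left, chunk, right in context_windows(first, width):
            windows.setdefault((left, right), set()).add(chunk)
        # Look up each chunk of the second sentence by the same context.
        for left, chunk, right in context_windows(second, width):
            for other in windows.get((left, right), ()):
                if other != chunk:
                    counts[tuple(sorted((other, chunk)))] += 1
    return counts


# Hand-written toy paraphrase pairs, not data from the paper.
pairs = [
    (["the game", "was", "awesome", "tonight"],
     ["the game", "was", "amazing", "tonight"]),
    (["that match", "was", "awesome", "tonight"],
     ["that match", "was", "amazing", "tonight"]),
]
synonyms = candidate_synonyms(pairs)
print(synonyms.most_common())
```

Counting how often a candidate pair recurs across paraphrase pairs gives a simple way to rank candidates before substituting them into training data. The abstract does not state how candidates are scored or filtered.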

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1222097
Report Number(s):
PNNL-SA-106823; 400470000
Resource Relation:
Conference: Proceedings of the 28th International Florida Artificial Intelligence Research Society Conference (FLAIRS-28), May 18-20, 2015, Hollywood, Florida, 3-7
Country of Publication:
United States
Language:
English
