U.S. Department of Energy
Office of Scientific and Technical Information

Leveraging Paraphrase Labels to Extract Synonyms from Twitter

Conference
OSTI ID:1222097

We present an approach for automatically learning synonyms from a paraphrase corpus of tweets. Substituting the extracted synonyms into the training set improves performance on the task of paraphrase detection. Candidate synonyms and their context windows are built from the chunks of a shallow parse, and the learned synonyms are incorporated into a paraphrase detection system that uses machine translation metrics as features for a classifier. We demonstrate a 2.29% improvement in F1 when we train and test on the paraphrase training set. The extracted synonyms provide better coverage than previous systems, which shows the potential power of synonyms that are representative of a specific topic.
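The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of pairing chunks that appear in matching context windows. It assumes each tweet has already been split into chunks by some shallow parser, and it treats two different chunks as a candidate synonym pair when they occur between the same neighboring chunks in two tweets labeled as paraphrases. The function names, the window size, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def context(chunks, i, window=1):
    """Return the chunks immediately left and right of position i."""
    left = tuple(chunks[max(0, i - window):i])
    right = tuple(chunks[i + 1:i + 1 + window])
    return left, right


def candidate_synonyms(paraphrase_pairs, window=1):
    """Count chunk pairs that differ but share the same context window.

    paraphrase_pairs: iterable of (chunks_a, chunks_b), where each element
    is a list of chunk strings from two tweets labeled as paraphrases.
    Hypothetical heuristic, not the authors' published procedure.
    """
    counts = Counter()
    for chunks_a, chunks_b in paraphrase_pairs:
        for i, a in enumerate(chunks_a):
            ctx_a = context(chunks_a, i, window)
            if ctx_a == ((), ()):
                continue  # no surrounding context to compare against
            for j, b in enumerate(chunks_b):
                if a != b and ctx_a == context(chunks_b, j, window):
                    # Store the pair in sorted order so (a, b) == (b, a).
                    counts[tuple(sorted((a, b)))] += 1
    return counts


# Toy, made-up data: two paraphrase pairs, already chunked.
pairs = [
    (["the game", "was", "awesome", "tonight"],
     ["the game", "was", "amazing", "tonight"]),
    (["that movie", "was", "awesome", "tonight"],
     ["that movie", "was", "amazing", "tonight"]),
]
synonyms = candidate_synonyms(pairs)
```

Pairs with higher counts would be stronger candidates. A real system would also need a threshold and some handling of noisy tweet text, and neither of those is specified in the abstract.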

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1222097
Report Number(s):
PNNL-SA-106823; 400470000
Country of Publication:
United States
Language:
English

Similar Records

Semantic role labeling for protein transport predicates
Journal Article · Jun 11, 2008 · BMC Bioinformatics · OSTI ID:1626355

Automatic Labeling for Entity Extraction in Cyber Security
Conference · Dec 31, 2013 · OSTI ID:1143555

Using Topic Modeling and Text Embeddings to Predict Deleted Tweets
Conference · Feb 28, 2016 · OSTI ID:1334882
