Leveraging Paraphrase Labels to Extract Synonyms from Twitter
We present an approach for automatically learning synonyms from a paraphrase corpus of tweets. Substituting the extracted synonyms into the training set improves performance on the task of paraphrase detection. Candidate synonyms and their context windows are built from the chunks of a shallow parse, and the learned synonyms are incorporated into a paraphrase detection system that uses machine translation metrics as features for a classifier. We demonstrate a 2.29% improvement in F1 when we train and test on the paraphrase training set, with better coverage than previous systems; this shows the potential power of synonyms that are representative of a specific topic.
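The candidate-extraction step described in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each tweet has already been split into chunks by a shallow parser, that the context window is one chunk on each side, and that two differing chunks from a labeled paraphrase pair are candidate synonyms when they share the same window. The function name `candidate_synonyms` and the sentence-boundary markers are hypothetical.

```python
from collections import defaultdict


def candidate_synonyms(chunks_a, chunks_b):
    """Return pairs of differing chunks that share a left/right context.

    chunks_a, chunks_b: chunk strings for the two tweets of a paraphrase
    pair (assumed to come from a shallow parser, not computed here).
    """
    def contexts(chunks):
        # Pad with boundary markers so the first and last chunk get a window.
        padded = ["<s>"] + list(chunks) + ["</s>"]
        table = defaultdict(set)
        for i in range(1, len(padded) - 1):
            window = (padded[i - 1], padded[i + 1])  # assumed window size: 1
            table[window].add(padded[i])
        return table

    ctx_a, ctx_b = contexts(chunks_a), contexts(chunks_b)
    pairs = set()
    # Only windows that occur in both tweets can yield a candidate pair.
    for window in ctx_a.keys() & ctx_b.keys():
        for x in ctx_a[window]:
            for y in ctx_b[window]:
                if x != y:
                    pairs.add((x, y))
    return pairs


# Hypothetical paraphrase pair, already chunked.
tweet_a = ["the new phone", "comes out", "next week"]
tweet_b = ["the new phone", "is released", "next week"]
print(candidate_synonyms(tweet_a, tweet_b))
```

In a fuller pipeline, the extracted pairs would then be substituted into the training tweets before the machine translation metrics are computed as classifier features; that step is not shown here.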
- Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 1222097
- Report Number(s): PNNL-SA-106823; 400470000
- Country of Publication: United States
- Language: English