Paraphrase detection using LSTM networks and handcrafted features


Hassan Shahmohammadi¹ · MirHossein Dezfoulian¹ · Muharram Mansoorizadeh¹
Received: 14 March 2020 / Revised: 23 August 2020 / Accepted: 29 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Paraphrase detection is one of the fundamental tasks in natural language processing. A paraphrase is a sentence or phrase that conveys the same meaning as another but uses different wording. Paraphrase detection has many applications, such as machine translation, text summarization, QA systems, and plagiarism detection. In this research, we propose a new deep-learning-based model that generalizes well despite the lack of training data for deep models. After preprocessing, our model can be divided into two separate modules. In the first, we train a single Bi-LSTM neural network to encode the whole input by leveraging pretrained GloVe word vectors. In the second, three sets of handcrafted features, some of which are introduced in this research for the first time, are used to measure the similarity between each pair of sentences. Our final model is formed by combining the handcrafted features with the output of the Bi-LSTM network. Evaluation results on the MSRP and Quora datasets show that it outperforms almost all previous work in terms of F-measure and accuracy on MSRP and achieves comparable results on Quora. In the Quora Question Pairs competition launched by Kaggle, our model ranked among the top 24% of solutions among more than 3000 teams.

Keywords Paraphrase detection · Short text similarity · Deep learning · Feature engineering · Information fusion

¹ Bu-Ali Sina University, Hamedan, Iran

Multimedia Tools and Applications

1 Introduction With the ever-increasing volume of textual data on social media platforms such as Twitter and Facebook, measuring the semantic similarity of short texts is becoming more important, and hence related NLP tasks have been gaining a lot of attention. One such task is paraphrase detection, which tries to measure the semantic equivalence of two pieces of text. It is a critical task in many NLP applications such as machine translation, text summarization, QA systems, and plagiarism detection. In this research, we propose a new model that achieves decent performance despite the lack of sufficient training data for deep-learning-based models. Our model can be divided into three parts. The first part is a preprocessing step that prepares the sentences for the next step. In the second part, terms are mapped to their numerical representations using GloVe word embeddings [31]. The output of the embedding layer is then fed into a Bi-LSTM neural network [16], which encodes the whole sentence from its word vectors. In the third part, three sets of fine-grained handcrafted features are provided to measure the similarity between each pair of s
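The idea of combining handcrafted pairwise features with an encoder output can be illustrated with a minimal sketch. The two features below (Jaccard token overlap and length ratio) are illustrative assumptions and are not claimed to be among the feature sets used in this work; likewise, plain concatenation is assumed here as the fusion step, and the encoder output is represented by a placeholder vector rather than a trained Bi-LSTM. The function names `tokenize`, `pair_features`, and `fuse` are hypothetical.

```python
import re


def tokenize(sentence):
    """Lowercase and keep alphanumeric runs (a simple stand-in for preprocessing)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def pair_features(s1, s2):
    """Return two illustrative handcrafted similarity features for a sentence pair."""
    t1, t2 = tokenize(s1), tokenize(s2)
    set1, set2 = set(t1), set(t2)
    union = set1 | set2
    # Jaccard overlap of the two token sets (0.0 when both sentences are empty).
    jaccard = len(set1 & set2) / len(union) if union else 0.0
    # Ratio of the shorter to the longer sentence length, in tokens.
    longest = max(len(t1), len(t2))
    length_ratio = min(len(t1), len(t2)) / longest if longest else 1.0
    return [jaccard, length_ratio]


def fuse(encoder_output, features):
    """Concatenate an encoder output vector with handcrafted features (assumed fusion)."""
    return list(encoder_output) + list(features)


# Placeholder encoder output; a real system would obtain this from the Bi-LSTM.
placeholder_encoding = [0.1, 0.2, 0.3]
features = pair_features("How do I learn Python?", "How can I learn Python?")
fused = fuse(placeholder_encoding, features)
print(len(fused))
```

In a full model, the fused vector would typically be passed to a classifier layer that outputs the paraphrase decision; that part is omitted here because the available text does not specify it.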
