Knowledge Graphs as Context Models: Improving the Detection of Cross-Language Plagiarism with Paraphrasing


1 Natural Language Engineering Lab - ELiRF, DSIC, Universitat Politècnica de València, Valencia, Spain {mfranco,pgupta,prosso}@dsic.upv.es
2 Linguistic Computing Laboratory (LCL), Sapienza Università di Roma, Roma, Italy [email protected]

Abstract. Cross-language plagiarism detection attempts to automatically identify and extract plagiarism among documents written in different languages. Plagiarized fragments can be translated verbatim copies, or their structure may be altered to hide the copying. This is known as paraphrasing, and it is more difficult to detect. To improve paraphrase detection, we use a knowledge graph-based approach to obtain and compare context models of document fragments in different languages. Experimental results on German-English and Spanish-English cross-language plagiarism detection indicate that our knowledge graph-based approach performs better than other state-of-the-art models.

Keywords: Cross-language plagiarism detection, textual similarity, paraphrasing, knowledge graphs, BabelNet.
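To make "comparing context models" concrete, the following is a minimal sketch under stated assumptions, not the authors' exact similarity measure. Each fragment's knowledge graph is represented as a set of language-independent concept identifiers plus weighted relations. Two graphs are then scored by a Dice-style overlap of their concepts and of their relations, mixed by a weight `c`. The concept IDs (`c:plagiarism`, `c:copy`, ...) are hypothetical placeholders for multilingual synsets such as those in BabelNet. The formula and the default `c = 0.5` are also assumptions. Because both fragments are mapped to the same concept inventory, a German fragment and an English fragment can be compared without translating either one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Context model of one text fragment (illustrative structure, not the paper's).

    Concept IDs are hypothetical, language-independent placeholders standing in
    for multilingual synsets; relations map (source, target) pairs to weights.
    """
    concepts: set[str] = field(default_factory=set)
    relations: dict[tuple[str, str], float] = field(default_factory=dict)


def concept_similarity(g1: KnowledgeGraph, g2: KnowledgeGraph) -> float:
    """Dice coefficient over the two sets of concept nodes."""
    total = len(g1.concepts) + len(g2.concepts)
    if total == 0:
        return 0.0
    return 2 * len(g1.concepts & g2.concepts) / total


def relation_similarity(g1: KnowledgeGraph, g2: KnowledgeGraph) -> float:
    """Weighted Dice overlap: weight of shared relations over total relation weight."""
    total = sum(g1.relations.values()) + sum(g2.relations.values())
    if total == 0:
        return 0.0
    shared = g1.relations.keys() & g2.relations.keys()
    matched = sum(g1.relations[e] + g2.relations[e] for e in shared)
    return matched / total


def graph_similarity(g1: KnowledgeGraph, g2: KnowledgeGraph, c: float = 0.5) -> float:
    """Linear mix of concept and relation overlap; c = 0.5 is an assumed weight."""
    return c * concept_similarity(g1, g2) + (1 - c) * relation_similarity(g1, g2)


# Toy fragments: a German and an English fragment already mapped to shared concept IDs.
de = KnowledgeGraph(
    {"c:plagiarism", "c:copy", "c:text"},
    {("c:plagiarism", "c:copy"): 1.0, ("c:copy", "c:text"): 0.5},
)
en = KnowledgeGraph(
    {"c:plagiarism", "c:copy", "c:translation"},
    {("c:plagiarism", "c:copy"): 1.0, ("c:copy", "c:translation"): 0.5},
)
print(round(graph_similarity(de, en), 3))  # 0.667
```

In the toy example, each fragment shares two of its three concepts and its heavier relation with the other fragment. Both components therefore equal 2/3, and so does the combined score. Scoring relations as well as concepts rewards fragments that share not only the same ideas but also the same links between them. That is the kind of evidence lexical substitution tends to preserve.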

The research has been carried out in the framework of the European Commission WIQ-EI IRSES (no. 269180) and DIANA-APPLICATIONS - Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) projects, as well as the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems. We thank Roberto Navigli for his help in getting familiar with the BabelNet API.

N. Ferro (Ed.): PROMISE Winter School 2013, LNCS 8173, pp. 227–236, 2014. © Springer-Verlag Berlin Heidelberg 2014

M. Franco-Salvador, P. Gupta, and P. Rosso

1 Introduction

One of the biggest problems in literature and science is plagiarism, the unauthorized use of original content. Plagiarism is very difficult to detect, especially when the source of information is the Web, because of its size. Detection is even more difficult when the documents involved are written in different languages. A recent survey of scholarly practices and attitudes [2] also examined cross-language (CL) plagiarism, and it shows that CL plagiarism is a real problem: only 36.25% of students think that translating a text fragment and including it in their report is plagiarism. Plagiarized fragments can be translated verbatim copies, or their authors can hide them by altering their structure, which is known as paraphrasing. A recent study on paraphrasing in plagiarism [1] showed that paraphrase mechanisms make plagiarism detection more difficult. The same study also shows that lexical substitutions are the paraphrase mechanisms most used in plagiarism, and that paraphrase mechanisms shorten the plagiarized text. These findings may be used in the future to develop more effective plagiarism detectors.

In recent years, a few approaches to CL similarity analysis have been proposed that can be used for CL plagiarism detection. A simple yet effective approach is the cross-language character n-gram (CL-CNG) model [9]. It is based on the syntax of documents, uses character n-grams, and offers remarkable performance …
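The CL-CNG idea can be sketched in a few lines. Each text is reduced to a bag of character n-grams, and two texts are compared with cosine similarity, with no translation step. Any overlap therefore comes from n-grams that related languages share, for example cognates such as "detección" and "detection". This is a minimal sketch, not the exact configuration of [9]. The normalization (lowercasing and collapsing every run of non-alphanumeric characters into a single space), the default `n = 3`, and the sample sentences are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Bag of character n-grams after an assumed normalization:
    lowercase, then collapse each run of non-alphanumeric characters into one space."""
    normalized = re.sub(r"[\W_]+", " ", text.lower()).strip()
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cl_cng_similarity(source: str, suspicious: str, n: int = 3) -> float:
    """CL-CNG-style score: compare the two texts directly, without translating either."""
    return cosine(char_ngrams(source, n), char_ngrams(suspicious, n))


# Illustrative sentences (made up for this sketch).
es = "La detección automática de plagio es un problema importante."
en = "Automatic plagiarism detection is an important problem."
unrelated = "Das Wetter ist heute sehr schön."

print(cl_cng_similarity(es, en))
print(cl_cng_similarity(es, unrelated))
```

Here the Spanish and English sentences share many trigrams ("aut", "pla", "imp", ...), while the unrelated German sentence shares almost none with the Spanish one. The first score is therefore clearly higher. Because nothing is translated, the score depends entirely on surface overlap. That makes this kind of model best suited to language pairs with a shared alphabet and related vocabulary.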

