Check similarity/plagiarism between articles in MySQL via Python

Time:12-06

I have a MySQL database where I store news articles. Before adding a new article, I try to compare it with the last 100 articles to see whether it is similar to any of them.

If it is 95% or more similar, I can tag it as the same as an existing article (say, article 122); if it is 70-95% similar, I can tag it as similar to an existing article (say, article 133).

What is the best way to do this?

  1. Is there a built-in MySQL function that can do it?

  2. Or do I need to use Python and compare the new article with each of the other 100 articles in a loop?

From what I have read in forums, Python is the best way. But I tried a library to compare string1 (article1) with string2 (article2), and even when the articles are totally different it tells me they are 70% the same.

I think this is because of common words such as: and, he, she, will, news, text, or, the, I.

CodePudding user response:

If you are using Linux, you can call the diff command from Python and play with its parameters. A teacher did this a few years ago to detect copying in a programming exam, and it worked even after the code had been reformatted.
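As a portable variant of the same idea, Python's standard difflib module can give a diff-style similarity ratio without shelling out to the diff command. This is only a minimal sketch, not the setup the teacher used: the function name line_similarity and the whitespace normalisation are assumptions made for illustration.

```python
import difflib


def line_similarity(text_a: str, text_b: str) -> float:
    """Return a similarity ratio between 0 and 1, comparing the texts line by line."""
    # Collapse runs of whitespace and drop blank lines, so that
    # reformatting alone does not register as a difference.
    lines_a = [" ".join(line.split()) for line in text_a.splitlines() if line.strip()]
    lines_b = [" ".join(line.split()) for line in text_b.splitlines() if line.strip()]
    # ratio() is 2 * (matching lines) / (total lines in both texts).
    return difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False).ratio()
```

Comparing whole lines suits source code better than prose; for news articles, splitting on sentences or words instead of lines would probably be a more useful unit.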

CodePudding user response:

"From what I have read in forums, Python is the best way. But I tried a library to compare string1 (article1) with string2 (article2), and even when the articles are totally different it tells me they are 70% the same.

I think this is because of common words such as: and, he, she, will, news, text, or, the, I."

I would suggest removing stopwords before comparing; that might help.

SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

This query returns the default MySQL (InnoDB) stopwords. For more information, see the MySQL Full-Text Stopwords documentation and Fine-Tuning MySQL Full-Text Search.
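Putting the stopword suggestion together with the thresholds from the question, the comparison could be sketched in Python roughly as follows. The small stopword set, the function names, the choice of cosine similarity over word counts, and the in-memory list of (id, text) pairs standing in for the last 100 rows from MySQL are all assumptions for illustration, not something the original posts specify.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword set (assumption). In practice use a fuller list,
# for example the rows returned by the INNODB_FT_DEFAULT_STOPWORD query above.
STOPWORDS = {"a", "an", "and", "he", "she", "i", "in", "is", "it",
             "of", "or", "the", "to", "will"}


def word_counts(text):
    """Lowercase the text, split it into words, and count the non-stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors, between 0 and 1."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(new_text, recent_articles):
    """Compare new_text with (article_id, text) pairs and tag the best match.

    Returns (label, article_id, score), where label is "same" (score >= 0.95),
    "similar" (0.70 <= score < 0.95) or "new" (anything lower).
    """
    new_counts = word_counts(new_text)
    best_id, best_score = None, 0.0
    for article_id, text in recent_articles:
        score = cosine_similarity(new_counts, word_counts(text))
        if score > best_score:
            best_id, best_score = article_id, score
    if best_score >= 0.95:
        return "same", best_id, best_score
    if best_score >= 0.70:
        return "similar", best_id, best_score
    return "new", None, best_score
```

Here recent_articles would be filled from the database by whatever query selects the 100 most recent articles. Two texts that share only stopwords score 0 with this approach, which addresses the "70% the same" problem described in the question.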
