Remove duplicate lines from a txt file when they contain the same words in a different order (PHP)

Time:04-03

I want to remove duplicate lines from a txt file. Currently, I use this:

$lines = file('input.txt');
$lines = array_unique($lines);
file_put_contents('output.txt', implode($lines));
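A related subtlety in this code, separate from the word-order problem: file() keeps the newline at the end of each line, but a last line without a trailing newline has none. array_unique() compares the strings exactly, so it can miss an exact duplicate on the last line. The sketch below shows this with a temporary file and made-up sample content (both are assumptions for illustration). Passing the FILE_IGNORE_NEW_LINES flag to file() avoids the mismatch.

```php
<?php
// Sketch: why file() + array_unique() can miss an exact duplicate on the last line.
// The temp file and its contents are illustrative assumptions, not the asker's data.

$path = tempnam(sys_get_temp_dir(), 'dedupe');
// Deliberately no "\n" after the final line, as often happens in hand-edited files.
file_put_contents($path, "beef bbq recipe\nbeef easy recipe\nbeef bbq recipe");

// Default file(): every element keeps its "\n" except the last one,
// so "beef bbq recipe\n" and "beef bbq recipe" are different strings.
$raw = array_unique(file($path));

// FILE_IGNORE_NEW_LINES strips the line endings, so equal lines compare equal.
$clean = array_unique(file($path, FILE_IGNORE_NEW_LINES));

unlink($path);

echo count($raw), "\n";   // 3
echo count($clean), "\n"; // 2
```

With FILE_IGNORE_NEW_LINES the line breaks are gone, so they have to be added back when writing, for example with implode("\n", $clean).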

The problem is that my code only removes exact duplicates, such as beef bbq recipe and beef bbq recipe. In my case, if the txt file contains keywords like:

beef bbq recipe
beef easy recipe
beef steak recipe
bbq recipe beef
beef bbq recipe
recipe bbq beef

it returns this result:

beef bbq recipe
beef easy recipe
beef steak recipe
bbq recipe beef
recipe bbq beef

Instead, I want the result to look like this:

beef bbq recipe
beef easy recipe
beef steak recipe

So I want lines such as beef bbq recipe, bbq recipe beef and recipe bbq beef, which contain the same words in a different order, to be treated as duplicates too. Is there a solution for this? Thank you.

Answer:

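You can use array_map, preg_split and sort to bring the keywords of every line into the same order before removing duplicates:

$lines = file('input.txt');

// Sort the keywords in each line so that reordered lines become identical strings.
$lines = array_map(function($line) {
    // trim() drops the trailing newline; splitting on any run of whitespace
    // avoids empty "words" when a line contains double spaces or tabs.
    $keywords = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    sort($keywords, SORT_STRING);
    return implode(" ", $keywords);
}, $lines);

$lines = array_unique($lines);
file_put_contents('output.txt', implode("\n", $lines) . "\n");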

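This iterates over your array and sorts the keywords in each line alphabetically, so lines with the same words in a different order become identical strings. array_unique then removes the duplicated lines. Note that the output contains the sorted form of each line (for example bbq beef recipe rather than beef bbq recipe).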
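If the output should keep the lines exactly as written, matching the expected result in the question, the sorted words can be used only as a comparison key while the first original line of each group is kept. The sketch below does that; the function name dedupeByWordSet is hypothetical, and skipping blank lines is an assumption about what you want.

```php
<?php
// Sketch: lines count as duplicates when they contain the same words in any order,
// but the FIRST original line of each group is kept unchanged.
// dedupeByWordSet is a hypothetical name chosen for this example.

function dedupeByWordSet(array $lines): array
{
    $seen = [];   // canonical key => true
    $result = [];

    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // assumption: blank lines are dropped
        }

        // Split on any run of whitespace and sort the words to build the key.
        // SORT_STRING gives a consistent order even for numeric-looking words.
        $words = preg_split('/\s+/', $line);
        sort($words, SORT_STRING);
        $key = implode(' ', $words);

        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $result[] = $line; // keep the line as originally written
        }
    }

    return $result;
}

$input = [
    "beef bbq recipe",
    "beef easy recipe",
    "beef steak recipe",
    "bbq recipe beef",
    "beef bbq recipe",
    "recipe bbq beef",
];

echo implode("\n", dedupeByWordSet($input)), "\n";
// beef bbq recipe
// beef easy recipe
// beef steak recipe
```

To apply it to the file, pass file('input.txt') to the function and write the result with file_put_contents('output.txt', implode("\n", $result) . "\n"). Because repeated words stay in the key, a line like beef beef recipe is not treated as a duplicate of beef recipe.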
