php Compare two text files and output NON matching records

Time:03-25

I have two text files of data. The first file has 30 lines that match 30 lines in the second file, but the first file also has two additional lines, which are added when the operator uploads the file to the directory. I want to find the non-matching lines and output them so they can be used in the same script for a mailout.

I am trying to use this code, which outputs the contents of the two files to screen.

<?php

if ($file1 = fopen(".data1.txt", "r")) {
    while (!feof($file1)) {
        $textperline = fgets($file1);
        echo $textperline;
        echo "<br>";
    }

    if ($file2 = fopen(".data.txt", "r")) {
        while (!feof($file2)) {
            $textperline1 = fgets($file2);
            echo $textperline1;
            echo "<br>";
        }

        fclose($file1);
        fclose($file2);
    }
}

?>

But it outputs the whole list of data. Can anyone help with listing only the NON-matching lines?

attached output of the two files from my code

I want to output only lines that are in file2 but not in file1

CodePudding user response:

My suggestion would be to read each file into an array (one line = one element) and then use array_diff to compare them. Unless you have millions of lines, this approach is the easiest.

To reuse your code, this is how you can read the two files into two arrays:

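$list1 = [];
$list2 = [];
if ($file1 = fopen(".data1.txt", "r")) {
    // fgets() returns false at end of file, so test its result instead of feof();
    // this avoids appending an empty element after the last line
    while (($line = fgets($file1)) !== false) {
        $list1[] = trim($line);
    }
    fclose($file1);
}

if ($file2 = fopen(".data.txt", "r")) {
    while (($line = fgets($file2)) !== false) {
        $list2[] = trim($line);
    }
    fclose($file2);
}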

If the files are small and you can read them in one go, you can also use a simplified syntax.

// file() returns one array element per line; FILE_IGNORE_NEW_LINES drops the line endings
$list1 = file(".data1.txt", FILE_IGNORE_NEW_LINES);
$list2 = file(".data.txt", FILE_IGNORE_NEW_LINES);

Then, no matter which method you choose, you can compare them as follows:

$comparison = array_diff($list2, $list1);
foreach ($comparison as $line) {
    echo $line."<br />";
}

This will only output the lines of the second array that are not present in the first one.

array_diff returns the values of its first argument that are missing from the other arrays, so make sure the array with the additional lines is the first argument.
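
To make the argument order concrete, here is a minimal sketch with hypothetical sample data standing in for the two files (the values and the variable names $onlyInList2 and $onlyInList1 are assumptions for illustration only). array_diff keeps the original keys, so array_values is used to reindex the result.

```php
<?php
// Hypothetical sample data standing in for the contents of the two files.
$list1 = ['alpha', 'bravo', 'charlie'];
$list2 = ['alpha', 'bravo', 'charlie', 'delta', 'echo'];

// Values of the first argument that do not occur in the second.
// array_diff preserves keys, so reindex with array_values.
$onlyInList2 = array_values(array_diff($list2, $list1));

// Swapping the arguments asks the opposite question.
$onlyInList1 = array_values(array_diff($list1, $list2));

echo implode(PHP_EOL, $onlyInList2), PHP_EOL;
```

With this data, only the first call finds anything, because every line of $list1 also appears in $list2.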

CodePudding user response:

ASSUMPTION

Neither file is huge, so the whole content of each can be read into memory at once. Under that assumption, put the following code at the top:

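$file1 = "./data1.txt";
$file2 = "./data2.txt";
// FILE_IGNORE_NEW_LINES strips the line endings, so a final line without
// a trailing newline still compares equal to the same line elsewhere
$linesOfFile1 = file($file1, FILE_IGNORE_NEW_LINES);
$linesOfFile2 = file($file2, FILE_IGNORE_NEW_LINES);
$newLinesInFile2 = [];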

There are a couple of cases that you did not mention in your question.

CASE 1

New lines are only appended to the end of the second file, file2. This is the easiest case to solve:

$numberOfRowsFile1 = count($linesOfFile1);
$numberOfRowsFile2 = count($linesOfFile2);
if($numberOfRowsFile2 > $numberOfRowsFile1)
{
    $newLinesInFile2 = array_slice($linesOfFile2, $numberOfRowsFile1);
}

CASE 2

Lines with the same content may appear at different positions in each file. Duplicate lines within the same file are ignored.

Case sensitivity may also play a role. Hashing the content of each line gives a simple key to compare on, and the following function builds such a map for both case-sensitive and case-insensitive comparison:

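function buildHashedMap($array, &$hashedMap, $caseSensitive = true)
{
    foreach($array as $line)
    {
        // Hash the (optionally lowercased) text, but keep the original line
        // as the value so the output retains its original casing
        $key = !$caseSensitive ? strtolower($line) : $line;
        $hashedMap[md5($key)] = $line;
    }
}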

Case sensitive comparison

$hashedLinesFile1 = [];
buildHashedMap($linesOfFile1, $hashedLinesFile1);

$hashedLinesFile2 = [];
buildHashedMap($linesOfFile2, $hashedLinesFile2);

$newLinesInFile2 = array_diff_key($hashedLinesFile2, $hashedLinesFile1);

Case INSENSITIVE comparison

$caseSensitive = false;
$hashedLinesFile1 = [];
buildHashedMap($linesOfFile1, $hashedLinesFile1, $caseSensitive);

$hashedLinesFile2 = [];
buildHashedMap($linesOfFile2, $hashedLinesFile2, $caseSensitive);

$newLinesInFile2 = array_diff_key($hashedLinesFile2, $hashedLinesFile1);
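
Since the question mentions using the result in a mailout, here is a rough end-to-end sketch of Case 2. It is only one possible way to do it: the helper findNewLines, the temporary files, and the sample names are all hypothetical, and it uses the normalized line text directly as the array key instead of an md5 hash.

```php
<?php
// Hypothetical helper (not part of the answer above): returns the distinct
// lines of $newLines whose normalized text does not occur in $oldLines.
function findNewLines(array $oldLines, array $newLines, bool $caseSensitive = true): array
{
    $normalize = function (string $line) use ($caseSensitive): string {
        $line = trim($line);
        return $caseSensitive ? $line : strtolower($line);
    };

    // Lookup table keyed by the normalized text of the old lines.
    $seen = array_flip(array_map($normalize, $oldLines));

    $result = [];
    foreach ($newLines as $line) {
        $key = $normalize($line);
        // Skip blank lines, lines already known, and repeats within $newLines.
        if ($key === '' || isset($seen[$key]) || isset($result[$key])) {
            continue;
        }
        $result[$key] = trim($line); // keep the original casing
    }
    return array_values($result);
}

// Sample data written to temporary files so the sketch runs on its own.
$oldPath = tempnam(sys_get_temp_dir(), 'old');
$newPath = tempnam(sys_get_temp_dir(), 'new');
file_put_contents($oldPath, "Alice\nBob\n");
file_put_contents($newPath, "bob\nAlice\nCarol\nDave\n");

$old = file($oldPath, FILE_IGNORE_NEW_LINES);
$new = file($newPath, FILE_IGNORE_NEW_LINES);

$caseInsensitive = findNewLines($old, $new, false);
$caseSensitive   = findNewLines($old, $new);

// Plain-text body that could be handed to whatever sends the mailout.
$mailBody = implode("\n", $caseInsensitive);

unlink($oldPath);
unlink($newPath);

echo $mailBody, PHP_EOL;
```

In case-sensitive mode the line "bob" counts as new because it differs from "Bob"; in case-insensitive mode only the two genuinely new names remain.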