How to get intersection between two strings in PHP

Time:11-13

Problem description:

I have two strings and I need to find the length of their intersection.

Let's assume both strings are plain ASCII and lowercase.

These are the expected results:

$str1 = "lorem ipsum";
$str2 = "rem";
echo str_intersection($str1, $str2); // Expected result: 3

$str2 = "xzy";
echo str_intersection($str1, $str2); // Expected result: 0

My attempt to solve the problem:

I've tried comparing the strings with the array_intersect() function like this:

$str_intersection = function(string $str1, string $str2): int {
   $arr1 = str_split($str1); // ['l','o','r','e','m',' ','i','p','s','u','m']
   $arr2 = str_split($str2); // ['r','e','m']

   return count(array_intersect($arr1, $arr2));
};

echo $str_intersection($str1, $str2); // Result: 4 (because of lo*REM* ipsu*M*)

But this approach is inappropriate because it compares individual characters, not contiguous parts of the strings as I need.

In addition, a str_intersection() function built this way is not only wrong but also very slow when I need to compare thousands of strings.


Example of how I plan to use the function:

As requested, here is a small example of how I plan to use the string intersection function:

$strings = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'consectetur'];
$needle = 'lo';
$intersections = [];
foreach ($strings as $str) {
    $intersections[] = str_intersection($str, $needle);
}
print_r($intersections);

Expected result (intersection "highlighted" in uppercase):

Array (
    [0] => 1 // LOrem
    [1] => 0 // ipsum
    [2] => 1 // doLOr
    [3] => 0 // sit
    [4] => 0 // amet
    [5] => 0 // consectetur
)
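For reference, the values in this expected output match counting whole occurrences of the needle, which the built-in substr_count() does. A longest-common-substring length would instead give 2, 0, 2, 0, 0, 1 here (for example, "consectetur" shares an "o" with "lo"). A minimal sketch, assuming occurrence counting is what's wanted:

```php
<?php
// Assumption: "intersection" means how many times the whole needle occurs in each string.
$strings = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'consectetur'];
$needle = 'lo';

$intersections = [];
foreach ($strings as $str) {
    // substr_count() counts non-overlapping occurrences of $needle in $str.
    $intersections[] = substr_count($str, $needle);
}
print_r($intersections);
```

This prints the 1, 0, 1, 0, 0, 0 array shown above and runs in linear time per string, which also helps when comparing thousands of strings.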

Answer:

Find each occurrence of the snippet's first character, then compare as far as possible from there. In case you want the comparison to be case-insensitive, I used strtolower().

$countIntersections = function (string $source, string $snippet): int {
    $a = strtolower($source);
    $b = strtolower($snippet);

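    if ($b === '') return 0; // nothing to match (also avoids reading $b[0])

    $index = 0;
    $lengths = [];
    while ($index < strlen($a)) {
        // Next occurrence of the snippet's first character.
        $pos = strpos($a, $b[0], $index);
        if (false === $pos) break;
        // Longest prefix of the snippet that matches at $pos.
        $max = strlen($b);
        while ($max) {
            if (substr($a, $pos, $max) === substr($b, 0, $max)) {
                $lengths[] = $max;
                break;
            }
            $max--;
        }
        $index = $pos + 1;
    }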

    return max([0, ...$lengths]);
};

var_dump($countIntersections('Lorem ipsum', 'rem'));
var_dump($countIntersections('Lorem ipsum', 'um'));
var_dump($countIntersections('Lorem ipsum', 'abc'));

Output

int(3)
int(2)
int(0)

Answer:

This is my attempt.

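function str_intersection($str1, $str2)
{
   // Make $short the shorter string and $long the longer one.
   [$long, $short] = strlen($str1) > strlen($str2) ? [$str1, $str2] : [$str2, $str1];
   $shortLength = strlen($short);
   // Try the longest candidate substrings of $short first.
   for ($length = $shortLength; $length > 0; $length--) {
       // Every start offset that still leaves $length characters.
       for ($offset = 0; $offset <= $shortLength - $length; $offset++) {
           if (strpos($long, substr($short, $offset, $length)) !== false) return $length;
       }
   }
   return 0;
}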

$str1 = "lorem ipsum";
$str2 = "rem";
echo str_intersection($str1, $str2) . PHP_EOL; // Expected result: 3

$str2 = "xzy";
echo str_intersection($str1, $str2) . PHP_EOL; // Expected result: 0

This outputs:

3
0

See: https://3v4l.org/7YW0R#v8.0.25

This function first orders the two input strings so we know which one is shorter. It then searches the longer string for the longest possible substring of the shorter one, trying lengths from longest to shortest. This is not very efficient; can anyone improve on it?
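One common improvement is the standard dynamic-programming solution for the longest common substring, which runs in O(n·m) time instead of repeatedly re-scanning the longer string with strpos(). Unlike the first answer, which only extends matches from the snippet's first character (so 'xrem' against 'lorem ipsum' gives 0 there), it finds common substrings anywhere. A minimal sketch; the function name longest_common_substring_length() is hypothetical, and it compares bytes case-sensitively:

```php
<?php
/**
 * Hypothetical helper: length of the longest common substring of $a and $b.
 * $prev[$j] holds the length of the common run ending at $a[$i - 2] and $b[$j - 1]
 * (previous row); only two rows are kept, so memory is O(strlen($b)).
 */
function longest_common_substring_length(string $a, string $b): int
{
    $n = strlen($a);
    $m = strlen($b);
    $best = 0;
    $prev = array_fill(0, $m + 1, 0);

    for ($i = 1; $i <= $n; $i++) {
        $curr = array_fill(0, $m + 1, 0);
        for ($j = 1; $j <= $m; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Extend the common run that ended one character earlier in both strings.
                $curr[$j] = $prev[$j - 1] + 1;
                if ($curr[$j] > $best) {
                    $best = $curr[$j];
                }
            }
        }
        $prev = $curr;
    }

    return $best;
}

echo longest_common_substring_length('lorem ipsum', 'rem') . PHP_EOL;  // 3
echo longest_common_substring_length('lorem ipsum', 'xzy') . PHP_EOL;  // 0
echo longest_common_substring_length('lorem ipsum', 'xrem') . PHP_EOL; // 3
```

For very large inputs, suffix-structure approaches can do better, but this is usually enough for short needles.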

Answer:

You can use the strpos() function:

https://www.php.net/manual/en/function.strpos.php
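If the intersection only has to be the whole second string (as in the "rem" and "xzy" examples in the question), strpos() alone is enough. A minimal sketch under that assumption; whole_needle_intersection() is a hypothetical name, and partial overlaps are not considered (so it returns 2, not 1, for "lorem" with the needle "lo"):

```php
<?php
/**
 * Hypothetical helper: strlen($needle) if the whole needle occurs in $haystack, else 0.
 */
function whole_needle_intersection(string $haystack, string $needle): int
{
    if ($needle === '') {
        return 0; // treat an empty needle as "no intersection"
    }
    // strpos() returns the offset of the first match, or false if there is none.
    return strpos($haystack, $needle) !== false ? strlen($needle) : 0;
}

echo whole_needle_intersection('lorem ipsum', 'rem') . PHP_EOL; // 3
echo whole_needle_intersection('lorem ipsum', 'xzy') . PHP_EOL; // 0
```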
