Find duplicate PARTS of a string in PHP Array

Time:08-15

I searched first to avoid asking a duplicate question, but could not find one that matches what I need.

I would like to know whether it is possible to find duplicate PARTS of strings (in an array) and, if so, what the best way to approach this is.

I already know we can find/count duplicates in a PHP array like this, for example:

$someArray = array('one','two','two');

$count = array_count_values($someArray); print_r($count);

which will return: Array ( [one] => 1 [two] => 2 )

But what if I want parts of a string (instead of single values/words) inside a PHP array to be found or counted?

Example 1: $stringArray = array('I am', 'I am happy', 'I am happy today');

In the above example, I would like to count/get the duplicate parts of those strings, which are "I am" with a count of 3 and "I am happy" with a count of 2.
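A minimal sketch of one way to get those counts, assuming the duplicate part always starts at the first word of each string. The helper name countWordPrefixes() and its $minWords parameter are made up for illustration, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: counts how many strings start with the same run of words.
function countWordPrefixes(array $strings, int $minWords = 2): array
{
    $counts = [];
    foreach ($strings as $string) {
        // Split on whitespace and drop empty pieces.
        $words = preg_split('/\s+/', trim($string), -1, PREG_SPLIT_NO_EMPTY);
        $total = count($words);
        // Build every leading phrase of at least $minWords words.
        for ($length = $minWords; $length <= $total; $length++) {
            $prefix = implode(' ', array_slice($words, 0, $length));
            $counts[$prefix] = ($counts[$prefix] ?? 0) + 1;
        }
    }
    // Keep only the phrases that start at least two strings.
    $duplicates = array_filter($counts, fn ($count) => $count >= 2);
    arsort($duplicates);
    return $duplicates;
}

$stringArray = array('I am', 'I am happy', 'I am happy today');
print_r(countWordPrefixes($stringArray));
```

For Example 1 this yields "I am" with a count of 3 and "I am happy" with a count of 2. Setting $minWords to 1 would also report the single word "I".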

With array_count_values() I could get the duplicate words/values by exploding the strings with a space as the delimiter, storing the words in an array and counting the duplicates. But what if I want to get the "I am" or "I am happy" part that is duplicated across the strings in the array?

Example 2: $secondArray = array('today is a day', 'today is a beautiful day');

In this second example, how would one count the duplicates, which are "today is a" and "today is a day"?

This example is different in the sense that it has duplicate words in different parts of the string.


EDIT (thanks @mickmackusa): I will try to be more specific:

  • The array I have holds around/max 1000 string values.

  • All of the values/strings inside the array are already in lower case.

  • Goal: find duplicate parts of the strings inside the array

  • Example: find "the flowers" as the duplicate in this array of 3 values, where "the flowers" has a fixed position in all 3 strings (the beginning of the string): $arr = array('the flowers','the flowers in the field','the flowers in the field look great');

Consider these 3 values to be news titles which I have fetched and put inside the $arr array. Now I want to extract/find the duplicate part in them, which is "the flowers" and/or "the flowers in the field".

If my goal was to find the most used / hottest keyword in these news titles, I could easily find "flowers", for example, to be one of the most used words in these titles by exploding the 3 strings, storing the individual words in an array, and counting the duplicates among them with array_count_values().
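The single-word count described above can be sketched as follows. The function name countWords() is a placeholder, and the sketch counts every occurrence of a word, so a word that appears twice in one title is counted twice.

```php
<?php
// Hypothetical helper: counts every occurrence of every word across all titles.
function countWords(array $titles): array
{
    $words = [];
    foreach ($titles as $title) {
        foreach (explode(' ', $title) as $word) {
            // Ignore empty pieces caused by repeated spaces.
            if ($word !== '') {
                $words[] = $word;
            }
        }
    }
    $wordCounts = array_count_values($words);
    arsort($wordCounts); // most frequent words first
    return $wordCounts;
}

$arr = array('the flowers', 'the flowers in the field', 'the flowers in the field look great');
print_r(countWords($arr));
```

With these three titles "flowers" is counted 3 times and "the" 5 times, because "the" occurs twice in the second and third titles. Running array_unique() on each title's words before merging would count titles instead of occurrences.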

Now I want "the flowers" or "the flowers in the field" (a phrase, not a single word/value) from the same array, so that I can finally get the similarity between the strings.

Sure, I could also explode the strings here with a space as the delimiter, count the individual words to get "the" with a count of 3 and "flowers" with a count of 3 (among others), and concatenate them to get "the flowers". But this will obviously give inconsistent and unwanted results, because the positions of the words are lost once they are counted individually, and it most certainly cannot reproduce "the flowers" as a result if the array is arranged like this:

$arr = array('the flowers','here are the flowers that look great','the flowers in the field look great');

Here, the human eye can notice that "the flowers" is a duplicate in all 3 strings inside the array, just at different positions inside the strings. But how can one achieve this (or a similar outcome) with PHP?
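One possible approach, sketched under the assumption that a "part" means a run of consecutive words: collect every such run from each string and count in how many strings it appears. The name countSharedPhrases() and the $minWords parameter are hypothetical. The number of phrases grows quadratically with the number of words per string, which should still be manageable for about 1000 short titles.

```php
<?php
// Hypothetical helper: counts in how many strings each run of consecutive words appears.
function countSharedPhrases(array $strings, int $minWords = 2): array
{
    $counts = [];
    foreach ($strings as $string) {
        $words = preg_split('/\s+/', trim($string), -1, PREG_SPLIT_NO_EMPTY);
        $total = count($words);
        $seen = []; // phrases found in this string, so each is counted once per string
        for ($start = 0; $start < $total; $start++) {
            for ($length = $minWords; $start + $length <= $total; $length++) {
                $phrase = implode(' ', array_slice($words, $start, $length));
                $seen[$phrase] = true;
            }
        }
        foreach (array_keys($seen) as $phrase) {
            $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
        }
    }
    // Keep only the phrases that occur in at least two strings.
    $shared = array_filter($counts, fn ($count) => $count >= 2);
    arsort($shared);
    return $shared;
}

$arr = array('the flowers', 'here are the flowers that look great', 'the flowers in the field look great');
print_r(countSharedPhrases($arr));
```

For this array the sketch reports "the flowers" in 3 strings and "look great" in 2, regardless of where the phrase sits in each string.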

I am also more than satisfied with a solution where the position of the duplicate part of the string is fixed, at the beginning of the string for example.

CodePudding user response:

Here is a proposal for three words. You can choose their position and modify their number by appending more parts in the definition of $newArray:

<?php
$stringArray = array('I am', 'I am happy', 'I am happy today');
$secondArray = array('today is a day', 'today is a beautiful day');

findDuplicates($stringArray, 0, 1, 2);
findDuplicates($secondArray, 0, 1, 2);

// Counts how often the words at positions $x, $y and $z form the same phrase.
function findDuplicates($array, $x, $y, $z)
{
    $newArray = array();
    for ($i = 0; $i < count($array); $i++)
    {
        $parts = explode(" ", $array[$i]);
        // Skip strings that are too short to have a word at every position.
        if (!isset($parts[$x], $parts[$y], $parts[$z])) {
            continue;
        }
        $newArray[] = $parts[$x]." ".$parts[$y]." ".$parts[$z];
    }
    print "<br>";
    $count = array_count_values($newArray);
    print_r($count);
}
?>

CodePudding user response:

You will have to loop through the entire array and run preg_match() on every item against your target. See the example below:

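$target = "I am";
$new_array = [];
for ($i = 0; $i < count($stringArray); $i++) {
    $current = $stringArray[$i];

    // preg_quote() escapes regex metacharacters in the target, including the "#" delimiter.
    if (preg_match("#".preg_quote($target, "#")."#", $current)) {
        $new_array[] = $current;
    }
}

// Keep only the strings that contain the target.
$stringArray = $new_array;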
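If the target is a fixed phrase rather than a pattern, a plain substring test may be enough and avoids regex escaping altogether. This is a sketch of that alternative, not the answer's own method. The helper name filterByPhrase() and the extra 'you are' entry are added for illustration.

```php
<?php
// Hypothetical helper: keeps only the strings that contain the literal phrase.
function filterByPhrase(array $strings, string $target): array
{
    return array_values(array_filter(
        $strings,
        fn ($string) => strpos($string, $target) !== false
    ));
}

// 'you are' is an extra entry added here to show a non-matching string.
$stringArray = array('I am', 'I am happy', 'I am happy today', 'you are');
$matches = filterByPhrase($stringArray, 'I am');
echo count($matches), "\n"; // number of strings containing the phrase
```

The comparison with !== false matters because strpos() returns 0 when the phrase is found at the very start of the string.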