Home > Back-end >  array_filter and array_intersect performance improvements on large foreach loop
array_filter and array_intersect performance improvements on large foreach loop

Time:10-11

I'm attempting to compare arrays to get a new array of matched and unmatched items. I have a loop that goes through roughly 55,000 items. The processing of this script can take upwards of 20 minutes to attempt to complete and I've narrowed it down to both usage of array_intersect and array_filter within the foreach. Ideally, I need it to complete much faster. If I limit the foreach to 1000 items it still takes upwards of ~3 minutes to complete which is slow for the client-side experience.

If I remove them, the script completes almost immediately. As such, I will include only these relevant pieces in the code below.

I'm using a custom array_intersect_fixed function as regular array_intersect returned wrong results with duplicate values as per here.

Explanations:

totalidsarray = An array of numbers such as ['11233', '35353, '3432433', '123323']. Could contain thousands of items.

$deck['main_deck'] = An array of numbers to compare against $totalidsarray. Similar structure. Max length is 60 items.

foreach($dbdeckarray as $deck){

         $tmparray = $totalidsarray;

        //Get an array of unmatched cards between the deck and the collection
        //Collection = $tmparray
        $tmpmatch['unmatched'] = array_filter($deck['main_deck'],
            function ($val) use (&$tmparray) {
                $key = array_search($val, $tmparray);
                if ( $key === false ) return true;
                unset($tmparray[$key]);
                return false;
            }
        );   

        //Get an array of matched cards between the deck and the collection
        $tmpmatch['matched'] = array_intersect_fixed($deck['main_deck'], $totalidsarray);

        //Push results to matcharray
        $matcharray[] = $tmpmatch;

}

//Built in array_intersect function returns wrong result when input arrays have duplicate values.
function array_intersect_fixed($array1, $array2) {
    $result = array();
    foreach ($array1 as $val) {
        if (($key = array_search($val, $array2, FALSE))!==false) {
            $result[] = $val;
            unset($array2[$key]);
        }
    }
    return $result;
}

To make matters worse, I have to do 2 further matched/unmatched checks within that same foreach loop against another array extra_deck, further increasing processing time.

Is there a more optimized approach I can take to this?

EDIT: Explanation of what the code needs to achieve.

The script will retrieve the user's card collection of cards that they own from a card game. This is assigned into totalidsarray. It will then query every deck in the database (~55,000) and compare the collection you own against the built deck of cards (main_deck). It then attempted to extract all owned cards (matched) and all un-owned cards (unmatched) into two arrays. Once the full foreach loop is done, the client-side returns a list of each deck alongside the matched cards/unmatched cards of each (with a % match for each).

CodePudding user response:

A couple of optimizations I can suggest:

  • The array_intersect_fixed routine you have is quadratic in nature in terms of getting the result, because it is 2 nested loops under the hood. We can use array_count_values to optimize it to work in linear time(which uses a map).

  • json_decode() doesn't need to be done twice for every deck. If you do it once and use it wherever needed, it should work just fine(unless you make any edits in place which I don't find right now) . It also needs to be decoded to an array and not to an object using the true flag.

  • For your array_filter, the comparison is also quadratic in nature. We will use array_count_values again to optimize it and use a $matched array. We keep counting the frequency of elements and if any of them surpasses count in $tmparray, we return false, else, we return true.

Snippet:

<?php 

$tmparray = array_count_values($totalidsarray);

foreach($dbdeckarray as $deck){
        $matched = [];        
        $deck['main_deck'] = json_decode($deck['main_deck'], true);
        $tmpmatch['unmatched'] = array_filter($deck['main_deck'],
            function ($val) use (&$tmparray, &$matched) {
              $matched[ $val ] = $matched [$val] ?? 0;
              $matched[ $val ]  ;
              return $matched[ $val ] <= $tmparray[ $val ];
            }
        );   

        $tmpmatch['matched'] = array_intersect_fixed($deck['main_deck'], $tmparray);
        $matcharray[] = $tmpmatch;
}

function array_intersect_fixed($array1, $array2) {
    $result = array();
    $matched = [];
    foreach ($array1 as $val) {
        $matched[ $val ] = $matched[ $val ] ?? 0;
        $matched[ $val ]  ;
        if (isset($array2[ $val ]) && $matched[ $val ] <= $array2[ $val ]) {
            $result[] = $val;
        }
    }
    return $result;
}

Note: array_intersect_fixed expects $array2 to be in the Hashmap way by default. If you wish to use it elsewhere, make sure to pass array_count_values of the array as 2nd parameter or use a third parameter to indicate a flag check otherwise.

CodePudding user response:

Beside @nice_dev suggestion your code can be simplified.

The unmatched part is an array diff

array_diff($deck['main_deck'], $tmparray);

The array_intersect_fixed(), if the problem are duplicated value in array, can be avoided by running array_unique() on the array (I guess is $deck['main_deck']) before calling array_intersect()

This will also speed up array_diff() as it will have less array element to compare.

  • Related