Find duplicates from array of objects in php

Time:11-10

I have an array of objects in PHP like:

[
{
    "id": 99961,
    "candidate": {
        "data": {
            "id": 125275,
            "firstName": "Jose",
            "lastName": "Zayas"           
        }
    },
    "dateAdded": 1667995574207
},
{
    "id": 99960,        
    "candidate": {
        "data": {
            "id": 125274,
            "firstName": "CHRISTIAN",
            "lastName": "NEILS"               
        }
    },
    "dateAdded": 1667986477133
},
{
    "id": 99959,
    "candidate": {
        "data": {
            "id": 125273,
            "firstName": "Jose",
            "lastName": "Zayas"           
        }
    },
    "dateAdded": 1667985600420
},
{
    "id": 99958,
    "candidate": {
        "data": {
            "id": 125275,
            "firstName": "Jose",
            "lastName": "Zayas"
        }
    },
    "dateAdded": 1667985600420
}
]

I want to find duplicates based on the same candidate firstName and lastName but different candidate ids. I have tried multiple methods like array_search, array_column, array_filter, etc., but nothing gives the desired result. The output should be an array of the distinct candidate.data.id values that share the same firstName and lastName. Can anyone guide me in building the algorithm?

Answer:

Iterate over the records and use the key data (firstName and lastName) to build a buffer. Then you can check each key against the buffer to find users that already exist:

<?php

$arr = '[
{
    "id": 99961,
    "candidate": {
        "data": {
            "id": 125275,
            "firstName": "Jose",
            "lastName": "Zayas"           
        }
    },
    "dateAdded": 1667995574207
},
{
    "id": 99960,        
    "candidate": {
        "data": {
            "id": 125274,
            "firstName": "CHRISTIAN",
            "lastName": "NEILS"               
        }
    },
    "dateAdded": 1667986477133
},
{
    "id": 99959,
    "candidate": {
        "data": {
            "id": 125273,
            "firstName": "Jose",
            "lastName": "Zayas"           
        }
    },
    "dateAdded": 1667985600420
}
]';
$candidatePersonality = $exists = [];
$json = json_decode($arr, true);

// Buffer every candidate id under a "firstName lastName" key.
// $exists keeps only the first record seen for each name (a de-duplicated list).
array_walk($json, function ($item) use (&$candidatePersonality, &$exists) {
    $userKey = $item['candidate']['data']['firstName'] . ' ' . $item['candidate']['data']['lastName'];

    if (!isset($candidatePersonality[$userKey])) {
        $exists[] = $item;
    }

    $candidatePersonality[$userKey][] = $item['candidate']['data']['id'];
});
echo '<pre>';
print_r($exists);
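The buffer built above ($candidatePersonality) already holds every id seen for each name, so the duplicates can be read straight out of it. A minimal sketch, assuming PHP 7.4+ (for arrow functions) and using a hypothetical $duplicates variable: collapse repeated ids per name, then keep only names with more than one distinct id.

```php
<?php
// Sample rows shaped like json_decode($arr, true) output (only the fields used here).
$json = [
    ['id' => 99961, 'candidate' => ['data' => ['id' => 125275, 'firstName' => 'Jose', 'lastName' => 'Zayas']]],
    ['id' => 99960, 'candidate' => ['data' => ['id' => 125274, 'firstName' => 'CHRISTIAN', 'lastName' => 'NEILS']]],
    ['id' => 99959, 'candidate' => ['data' => ['id' => 125273, 'firstName' => 'Jose', 'lastName' => 'Zayas']]],
    ['id' => 99958, 'candidate' => ['data' => ['id' => 125275, 'firstName' => 'Jose', 'lastName' => 'Zayas']]],
];

// Buffer: "firstName lastName" => every candidate id seen for that name (may repeat).
$candidatePersonality = [];
foreach ($json as $item) {
    $data = $item['candidate']['data'];
    $candidatePersonality[$data['firstName'] . ' ' . $data['lastName']][] = $data['id'];
}

// Drop repeated ids per name, then keep only names with more than one distinct id.
// array_map with a single array preserves the string keys; array_filter does too.
$duplicates = array_filter(
    array_map(fn(array $ids): array => array_values(array_unique($ids)), $candidatePersonality),
    fn(array $ids): bool => count($ids) > 1
);

print_r($duplicates);
```

For the sample data this leaves only the 'Jose Zayas' entry, with ids 125275 and 125273; the repeated 125275 is counted once.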

To return the duplicated IDs instead, try this:

$arr = <<<JSON
[
{
    "id": 99961,
    "candidate": {
        "data": {
            "id": 125275,
            "firstName": "Jose",
            "lastName": "Zayas"           
        }
    },
    "dateAdded": 1667995574207
},
{
    "id": 99960,        
    "candidate": {
        "data": {
            "id": 125274,
            "firstName": "CHRISTIAN",
            "lastName": "NEILS"               
        }
    },
    "dateAdded": 1667986477133
},
{
    "id": 99959,
    "candidate": {
        "data": {
            "id": 125273,
            "firstName": "Jose",
            "lastName": "Zayas"           
        }
    },
    "dateAdded": 1667985600420
},
{
    "id": 99958,
    "candidate": {
        "data": {
            "id": 125275,
            "firstName": "Jose",
            "lastName": "Zayas"           
        }
    },
    "dateAdded": 1667985600420
}
]
JSON;

$candidatePersonality = $exists = [];
$json = json_decode($arr, true);

array_walk($json, function ($item) use (&$candidatePersonality, &$exists) {
    $userKey = $item['candidate']['data']['firstName'] . ' ' . $item['candidate']['data']['lastName'];
    $id = $item['candidate']['data']['id'];

    // Use the id as the key so the same candidate id listed twice is counted only once.
    $candidatePersonality[$userKey][$id] = $id;

    // Once a name has more than one distinct id, record all of them (including the first one).
    if (count($candidatePersonality[$userKey]) > 1) {
        $exists[$userKey] = array_values($candidatePersonality[$userKey]);
    }
});
echo '<pre>';
print_r($exists);

Answer:

Group the rows on a composite key built from the candidate's first name and last name, and store each candidate id in that group using the id as both the key and the value; this keeps the ids unique within each group.

Then filter out the names that map to only one distinct id and re-index the qualifying groups.

Code:

// $array is the decoded input, e.g. json_decode($json, true)
$grouped = [];
foreach ($array as $row) {
    $compositeKey = "{$row['candidate']['data']['firstName']} {$row['candidate']['data']['lastName']}";
    $grouped[$compositeKey][$row['candidate']['data']['id']] = $row['candidate']['data']['id'];
}
$result = [];
foreach ($grouped as $name => $group) {
    if (count($group) > 1) {
        $result[$name] = array_values($group);
    }
}
var_export($result);

Output:

array (
  'Jose Zayas' => 
  array (
    0 => 125275,
    1 => 125273,
  ),
)
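The grouping above compares names exactly, so "Jose Zayas" and "JOSE Zayas " would land in different groups. If your real data may differ only in letter case or surrounding whitespace (an assumption; the question does not say so), one option is to normalize the composite key before grouping. A minimal sketch with hypothetical sample rows, using trim() and strtolower() (ASCII letters only):

```php
<?php
// Hypothetical rows: the same name written with different casing and a trailing space.
$array = [
    ['candidate' => ['data' => ['id' => 125275, 'firstName' => 'Jose', 'lastName' => 'Zayas']]],
    ['candidate' => ['data' => ['id' => 125273, 'firstName' => 'JOSE', 'lastName' => 'Zayas ']]],
    ['candidate' => ['data' => ['id' => 125274, 'firstName' => 'Christian', 'lastName' => 'Neils']]],
];

$grouped = [];
foreach ($array as $row) {
    $data = $row['candidate']['data'];
    // Assumed normalization: trim each part, then lower-case the whole key.
    $key = strtolower(trim($data['firstName']) . ' ' . trim($data['lastName']));
    // Id as key and value keeps ids unique within the group.
    $grouped[$key][$data['id']] = $data['id'];
}

$result = [];
foreach ($grouped as $name => $ids) {
    if (count($ids) > 1) {
        $result[$name] = array_values($ids);
    }
}
var_export($result);
```

The trade-off is that the result is keyed by the normalized name ('jose zayas') rather than by the original spelling; for multibyte names, mb_strtolower() would be the usual substitute if the mbstring extension is available.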

Or, with one loop, collect the unique ids per group and copy all of the group's keys into the result whenever the group holds more than one id:

// $array is the decoded input, e.g. json_decode($json, true)
$grouped = [];
$duplicated = [];
foreach ($array as $row) {
    $compositeKey = "{$row['candidate']['data']['firstName']} {$row['candidate']['data']['lastName']}";
    $id = $row['candidate']['data']['id'];
    $grouped[$compositeKey][$id] = ''; // the id is stored as the key; the value is unused
    if (count($grouped[$compositeKey]) > 1) {
        $duplicated[$compositeKey] = array_keys($grouped[$compositeKey]);
    }
}
var_export($duplicated);
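Since the question mentions trying array_column and array_filter, here is a minimal sketch of the same grouping expressed with those built-ins plus array_reduce, assuming PHP 7.4+ for the arrow function. It is an alternative formulation of the loop above, not a different algorithm: array_column is applied twice to reach the nested candidate.data records, and array_filter drops the single-id groups.

```php
<?php
// Sample rows shaped like json_decode($json, true) output (only the fields used here).
$array = [
    ['id' => 99961, 'candidate' => ['data' => ['id' => 125275, 'firstName' => 'Jose', 'lastName' => 'Zayas']]],
    ['id' => 99960, 'candidate' => ['data' => ['id' => 125274, 'firstName' => 'CHRISTIAN', 'lastName' => 'NEILS']]],
    ['id' => 99959, 'candidate' => ['data' => ['id' => 125273, 'firstName' => 'Jose', 'lastName' => 'Zayas']]],
    ['id' => 99958, 'candidate' => ['data' => ['id' => 125275, 'firstName' => 'Jose', 'lastName' => 'Zayas']]],
];

// Flatten to the inner records: [['id' => ..., 'firstName' => ..., 'lastName' => ...], ...]
$candidates = array_column(array_column($array, 'candidate'), 'data');

// Group distinct ids under a "firstName lastName" key (id used as key and value).
$grouped = array_reduce(
    $candidates,
    function (array $carry, array $c): array {
        $carry[$c['firstName'] . ' ' . $c['lastName']][$c['id']] = $c['id'];
        return $carry;
    },
    []
);

// Keep names with more than one distinct id, then re-index each group as a plain list.
$result = array_map('array_values', array_filter($grouped, fn(array $ids): bool => count($ids) > 1));

var_export($result);
```

Both array_filter and array_map (with a single input array) preserve the string keys, so the names survive as keys in $result.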