Merging 2 json files where not all data matches up

Time:09-23

I have two JSON files that I need to merge, but the situation is a bit unusual.

Let's call the first one movies.json:

[
    {
        "title": "Title of Movie 1",
        "description": "description of Movie 1",
        "link": "CDN_url_to_movie1",
        "filters": "list, of, filters"
    },
    {
        "title": "Title of Movie 2",
        "description": "description of Movie 2",
        "link": "CDN_url_to_movie2",
        "filters": "list, of, filters"
    }
]

And let's call the second one movies2.json:

[
    {
        "title": "Title of Movie 1",
        "description": "description of Movie 1",
        "link": "CDN_url_to_movie1"
    },
    {
        "title": "Title of Movie 2",
        "description": "description of Movie 2",
        "link": "CDN_url_to_movie2",
        "filters": "list, of, filters"
    },
    {
        "title": "Title of Movie 3",
        "description": "description of Movie 3",
        "link": "CDN_url_to_movie3"
    }
]

I need to merge these two files so that the result contains no duplicates, keeping in mind that the "filters" key may be missing from an entry in one file or the other.

So my desired output from the two examples would look like this:

[
    {
        "title": "Title of Movie 1",
        "description": "description of Movie 1",
        "link": "CDN_url_to_movie1",
        "filters": "list, of, filters"
    },
    {
        "title": "Title of Movie 2",
        "description": "description of Movie 2",
        "link": "CDN_url_to_movie2",
        "filters": "list, of, filters"
    },
    {
        "title": "Title of Movie 3",
        "description": "description of Movie 3",
        "link": "CDN_url_to_movie3"
    }
]

What I currently have looks like this:

<?php
    $arr1 = file_get_contents('movies.json');
    $arr2 = json_decode($arr1, true);

    $arr3 = file_get_contents('movies2.json');
    $arr4 = json_decode($arr3, true);

    $arr5 = array_unique(array_merge($arr2, $arr4), SORT_REGULAR);

    $arr = json_encode($arr5, JSON_PRETTY_PRINT);
    file_put_contents('movies3.json', $arr);

The result of this is:

[
    {
        "title": "Title of Movie 1",
        "description": "description of Movie 1",
        "link": "CDN_url_to_movie1",
        "filters": "list, of, filters"
    },
    {
        "title": "Title of Movie 2",
        "description": "description of Movie 2",
        "link": "CDN_url_to_movie2",
        "filters": "list, of, filters"
    },
    {
        "title": "Title of Movie 1",
        "description": "description of Movie 1",
        "link": "CDN_url_to_movie1"
    },
    {
        "title": "Title of Movie 3",
        "description": "description of Movie 3",
        "link": "CDN_url_to_movie3"
    }
]

As you can see, this is not the desired result. It removed the duplicate "movie 2", but it treated the two "movie 1" entries as unique, I assume because one has the "filters" key and the other does not.
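
A minimal sketch that seems to confirm this assumption. The variable names are made up for illustration, and the data just mirrors the two "movie 1" entries above:

```php
<?php
// Hypothetical sample data mirroring the two "movie 1" entries.
$withFilters = [
    'title' => 'Title of Movie 1',
    'description' => 'description of Movie 1',
    'link' => 'CDN_url_to_movie1',
    'filters' => 'list, of, filters',
];
$withoutFilters = [
    'title' => 'Title of Movie 1',
    'description' => 'description of Movie 1',
    'link' => 'CDN_url_to_movie1',
];

// Two arrays are only equal with == when they hold the same key/value pairs,
// so the missing "filters" key is enough to make these entries differ.
$sameEntry = ($withFilters == $withoutFilters);
$sameTitle = ($withFilters['title'] === $withoutFilters['title']);

// array_unique() with SORT_REGULAR compares whole entries, not just titles:
// only the exact repeat of $withoutFilters is dropped.
$unique = array_unique([$withFilters, $withoutFilters, $withoutFilters], SORT_REGULAR);
$count = count($unique);

var_dump($sameEntry, $sameTitle, $count);
```

So the comparison is made on the whole entry, while what I actually want is a comparison on the title alone.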

How can I merge these two files so that I get the desired output?

CodePudding user response:

We need to loop over the arrays and merge each entry with its counterpart in the other array. I have changed the JSON a bit to illustrate the merge better:

<?php

$movies1 = '[
    {
        "title": "Title of Movie 1",
        "description": "description of Movie 1",
        "link": "CDN_url_to_movie1",
        "filters": "list, of, filters"
    },
    {
        "title": "Title of Movie 2",
        "description": "description of Movie 2",
        "link": "CDN_url_to_movie2",
        "filters": "list, of, filters"
    },
    {
        "title": "Title of Movie 3",
        "link": "CDN_url_to_movie2",
        "filters": "list, of, filters"
    }
]';

$movies2 =  '[
    {
        "title": "Title of Movie 1",
        "description": "description of Movie 1",
        "link": "CDN_url_to_movie1"
    },
        {
        "title": "Title of Movie 2",
        "description": "description of Movie 2",
        "link": "CDN_url_to_movie2",
        "filters": "list, of, filters"
    },
    {
        "title": "Title of Movie 3",
        "description": "description of Movie 3",
        "link": "CDN_url_to_movie3"
    }
]';


$movies1A = json_decode($movies1,true);
$movies2A = json_decode($movies2,true);
echo '<pre>';
print_r($movies1A);
echo '<pre>';
print_r($movies2A);
foreach ($movies1A as $key => $m1) {
    // Merge each movie with the entry at the same position in the other array.
    // Both arrays must list the same movies in the same order; values from
    // $movies2A win when both entries define the same key.
    $movies2A[$key] = array_merge($m1, $movies2A[$key] ?? []);
}
echo '<pre>';
print_r($movies2A);

Will return:

Array
(
    [0] => Array
        (
            [title] => Title of Movie 1
            [description] => description of Movie 1
            [link] => CDN_url_to_movie1
            [filters] => list, of, filters
        )

    [1] => Array
        (
            [title] => Title of Movie 2
            [description] => description of Movie 2
            [link] => CDN_url_to_movie2
            [filters] => list, of, filters
        )

    [2] => Array
        (
            [title] => Title of Movie 3
            [link] => CDN_url_to_movie2
            [filters] => list, of, filters
        )

)
Array
(
    [0] => Array
        (
            [title] => Title of Movie 1
            [description] => description of Movie 1
            [link] => CDN_url_to_movie1
        )

    [1] => Array
        (
            [title] => Title of Movie 2
            [description] => description of Movie 2
            [link] => CDN_url_to_movie2
            [filters] => list, of, filters
        )

    [2] => Array
        (
            [title] => Title of Movie 3
            [description] => description of Movie 3
            [link] => CDN_url_to_movie3
        )

)
Array
(
    [0] => Array
        (
            [title] => Title of Movie 1
            [description] => description of Movie 1
            [link] => CDN_url_to_movie1
            [filters] => list, of, filters
        )

    [1] => Array
        (
            [title] => Title of Movie 2
            [description] => description of Movie 2
            [link] => CDN_url_to_movie2
            [filters] => list, of, filters
        )

    [2] => Array
        (
            [title] => Title of Movie 3
            [link] => CDN_url_to_movie3
            [filters] => list, of, filters
            [description] => description of Movie 3
        )

)

Note that I deliberately added Movie 3 to both arrays, with different fields in each, to show how the merge works! :)

Just add $movies2A = json_encode($movies2A); at the end and you have what you want.
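
If the two files do not list the same movies in the same order, the entries would have to be matched by title rather than by position. A rough sketch of that idea, assuming every entry has a unique "title"; the helper name mergeByTitle and the short sample data are made up for illustration:

```php
<?php
// Hypothetical helper: merge two decoded movie lists, matching entries by "title".
// Assumes every entry carries a unique, non-empty "title".
function mergeByTitle(array $first, array $second): array
{
    $merged = [];
    foreach (array_merge($first, $second) as $movie) {
        $title = $movie['title'];
        // A later entry fills in or overwrites the keys of an earlier entry
        // that has the same title.
        $merged[$title] = array_merge($merged[$title] ?? [], $movie);
    }
    // Drop the title keys again so json_encode() emits a plain list.
    return array_values($merged);
}

// Made-up sample data: same titles in a different order, plus one extra movie.
$movies1A = json_decode('[{"title":"A","link":"a1","filters":"x, y"},{"title":"B","link":"b1"}]', true);
$movies2A = json_decode('[{"title":"B","link":"b1","filters":"z"},{"title":"A","link":"a1"},{"title":"C","link":"c1"}]', true);

$result = mergeByTitle($movies1A, $movies2A);
echo json_encode($result, JSON_PRETTY_PRINT), PHP_EOL;
```

Each title ends up in the result exactly once, and a "filters" value is kept no matter which of the two files supplied it.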

CodePudding user response:

If the title is unique within each array, you can use array_column to re-index each array by title, and then merge the two with array_replace_recursive.

$arr1 = json_decode($json1, true);
$arr2 = json_decode($json2, true);

$result = array_values(array_replace_recursive(
    array_column($arr1, null, 'title'),
    array_column($arr2, null, 'title')
));    
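
For reference, here is a self-contained sketch of the same approach run on the data from the question. The contents of $json1 and $json2 are assumed to be the two files, with the missing commas added, and the intermediate names $byTitle1 and $byTitle2 are only there for illustration:

```php
<?php
// Sample data taken from the question (assumed file contents).
$json1 = '[
    {"title": "Title of Movie 1", "description": "description of Movie 1",
     "link": "CDN_url_to_movie1", "filters": "list, of, filters"},
    {"title": "Title of Movie 2", "description": "description of Movie 2",
     "link": "CDN_url_to_movie2", "filters": "list, of, filters"}
]';
$json2 = '[
    {"title": "Title of Movie 1", "description": "description of Movie 1",
     "link": "CDN_url_to_movie1"},
    {"title": "Title of Movie 2", "description": "description of Movie 2",
     "link": "CDN_url_to_movie2", "filters": "list, of, filters"},
    {"title": "Title of Movie 3", "description": "description of Movie 3",
     "link": "CDN_url_to_movie3"}
]';

$arr1 = json_decode($json1, true);
$arr2 = json_decode($json2, true);

// With null as the column key, array_column() keeps each whole row
// and uses its "title" as the array key.
$byTitle1 = array_column($arr1, null, 'title');
$byTitle2 = array_column($arr2, null, 'title');

// Rows sharing a title are combined key by key (values from the second
// array win); array_values() turns the result back into a plain list.
$result = array_values(array_replace_recursive($byTitle1, $byTitle2));

echo json_encode($result, JSON_PRETTY_PRINT), PHP_EOL;
```

Movie 1 keeps its "filters" value from the first file because the second file's entry does not define that key, so there is nothing to overwrite it with.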


CodePudding user response:

This is actually a bit harder to do than it first seems. I've written some very explicit code, so that it is easy to see what it does and to check that it is correct.

Basically, I detect all duplicates and store the preferred version of each (the one with filters, when there is one) in a separate array, while deleting both copies from the original arrays. Then I merge what is left of the original arrays with the stored duplicates.

I do not sort the result, and this will only work if neither of the original arrays contains duplicates itself.

<?php

// $moviesJson1 and $moviesJson2 hold the contents of the two JSON files (loading left out)
$movies1 = json_decode($moviesJson1);
$movies2 = json_decode($moviesJson2);

// first we find the duplicated movies, and choose the one with filters
$duplicates = [];
$removeKeys1 = [];
$removeKeys2 = [];
foreach ($movies1 as $key1 => $movie1) {
    foreach ($movies2 as $key2 => $movie2) {
        if ($movie1->title == $movie2->title) {
            $duplicates[] = property_exists($movie1, "filters") ? $movie1 : $movie2;
            $removeKeys1[] = $key1;
            $removeKeys2[] = $key2;
        }
    }    
}

// then we remove all duplicated movies from the original arrays
foreach ($removeKeys1 as $key) {
    unset($movies1[$key]);
}
foreach ($removeKeys2 as $key) {
    unset($movies2[$key]);
}

// finally we merge everything that's left
$movies = array_merge($movies1, $movies2, $duplicates);
$moviesJson = json_encode($movies, JSON_PRETTY_PRINT);

echo $moviesJson;

This returns:

[
    {
        "title": "Title of Movie 3",
        "description": "description of Movie 3",
        "link": "CDN_url_to_movie3"
    },
    {
        "title": "Title of Movie 1",
        "description": "description of Movie 1",
        "link": "CDN_url_to_movie1",
        "filters": "list, of, filters"
    },
    {
        "title": "Title of Movie 2",
        "description": "description of Movie 2",
        "link": "CDN_url_to_movie2",
        "filters": "list, of, filters"
    }
]


As said, this code was not written to be the cleverest solution, just code that actually works. However, because I can, I added a version without the key arrays:

$movies1 = json_decode($moviesJson1);
$movies2 = json_decode($moviesJson2);

// first we find the duplicated movies, and choose the one with filters
$duplicates = [];
foreach ($movies1 as $key1 => $movie1) {
    foreach ($movies2 as $key2 => $movie2) {
        if ($movie1->title == $movie2->title) {
            $duplicates[] = property_exists($movie1, "filters") ? $movie1 : $movie2;
        }
    }    
}

// then we remove all duplicated movies from the original arrays by title
$duploTitles = array_column($duplicates, "title");
foreach (["movies1", "movies2"] as $arrayName) {
    foreach (array_column(${$arrayName}, "title") as $key => $title) {
        if (in_array($title, $duploTitles)) {
            unset(${$arrayName}[$key]);
        }    
    }
}

// finally we merge everything that's left
$movies = array_merge($movies1, $movies2, $duplicates);
$moviesJson = json_encode($movies, JSON_PRETTY_PRINT);


This does exactly the same thing. You could call this code a bit smarter, but, to be honest, it is probably slightly slower, and definitely harder to understand. I would use the first solution.
