Groovy: compare two lazy maps/jsons

Time:11-05

I have two JSONs/lazy maps in the format shown below, and I need to compare them to find out whether there is any difference between them. I combine each set of values into a single value so that the comparison becomes faster, as my actual inputs (i.e. JSON messages) are going to be really large.

reqJson:

[["B1": 100, "B2": 200, "B3": 300, "B4": 400],["B1": 500, "B2": 600, "B3": 700, "B4": 800], ["B1": 900, "B2": 1000, "B3": 2000, "B4": 3000], ["B1": 4000, "B2": 5000, "B3": 6000, "B4": 7000]]

respJson:

[["B1": 100, "B2": 200, "B3": 300, "B4": 400],["B1": 500, "B2": 600, "B3": 700, "B4": 800], ["B1": 900, "B2": 1000, "B3": 2000, "B4": 3000], ["B1": 4000, "B2": 5000, "B3": 6000, "B4": 7000], ["B1": 8000, "B2": 9000, "B3": 10000, "B4": 11000]]

My code is shown below, but it does not produce the desired result and I cannot figure out what is going wrong. I take each entry from the response JSON and compare it with the entries in the request JSON to find out whether there is a difference.

def diffCounter = 0
Set diffSet = []

respJson.each { respJ ->
    reqJson.any { reqJ ->
        if (respJ.B1 + respJ.B2 + respJ.B3 + respJ.B4 != reqJ.B1 + reqJ.B2 + reqJ.B3 + reqJ.B4) {
            diffCounter += 1
            diffSet << [
                "B1" : respJ.B1,
                "B2" : respJ.B2,
                "B3" : respJ.B3,
                "B4" : respJ.B4
            ]
        }
    }
}
println ("Difference Count: " + diffCounter)
println ("Difference Set: " + diffSet)

Actual Output:

Difference Count: 5
Difference Set: [[B1:100, B2:200, B3:300, B4:400], [B1:500, B2:600, B3:700, B4:800], [B1:900, B2:1000, B3:2000, B4:3000], [B1:4000, B2:5000, B3:6000, B4:7000], [B1:8000, B2:9000, B3:10000, B4:11000]]

Expected Output:

Difference Count: 1
Difference Set: [["B1": 8000, "B2": 9000, "B3": 10000, "B4": 11000]]

NOTE: It can also happen that the request-json is bigger than the response-json so in that case I need to store the difference obtained from request-json into the diffSet.

Any inputs/suggestions in this regard will be helpful.

CodePudding user response:

As @daggett mentioned, if your JSONs become more nested/complicated, you will want to use a library to do this job for you.

In your use case of pure Lists of elements (with values that can be concatenated/added to form a unique key for that element) there is no problem with doing it 'manually'.

The problem with your code is that you check whether any reqJson entry has a different combined value, which is always true as soon as reqJson contains two different entries.

What you really want to check is whether there is any matching reqJson entry that has the same combined value. If you can't find a matching entry, then you know that the entry only exists in respJson.

def diffCounter = 0
Set diffSet = []

respJson.each { respJ ->
    def foundMatching = reqJson.any { reqJ ->
        respJ.B1 + respJ.B2 + respJ.B3 + respJ.B4 == reqJ.B1 + reqJ.B2 + reqJ.B3 + reqJ.B4
    }
    if (!foundMatching) {
        diffCounter += 1
        diffSet << [
                "B1" : respJ.B1,
                "B2" : respJ.B2,
                "B3" : respJ.B3,
                "B4" : respJ.B4
        ]
    }
}
println ("Difference Count: " + diffCounter)
println ("Difference Set: " + diffSet)

You mention that reqJson can become bigger than respJson and that in that case you want to switch the roles of the two arrays in the comparison, so that you always get the unmatched elements from the larger array. A trick to do this is to start by swapping the two variables around.

if (reqJson.size() > respJson.size()) {
    (reqJson, respJson) = [respJson, reqJson]
}

Note that the time complexity of this algorithm is O(m * n * 2i): it grows with the product of the sizes of the two arrays (m and n, here 5 and 4), times the number of property accesses made per comparison on both elements (2i, here i = 4 because there are 4 Bs). That is because, in the worst case, each element of the bigger array is checked against every element of the smaller array.
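To make that bound concrete, here is a small sketch that counts how often the inner closure runs for the sample data from the question. The variable name `comparisons` is made up for illustration. Because `any` stops at the first match, the actual count stays below m * n unless nothing matches.

```groovy
// Sample data from the question: respJson has one extra entry.
def reqJson = [[B1: 100, B2: 200, B3: 300, B4: 400],
               [B1: 500, B2: 600, B3: 700, B4: 800],
               [B1: 900, B2: 1000, B3: 2000, B4: 3000],
               [B1: 4000, B2: 5000, B3: 6000, B4: 7000]]
def respJson = reqJson + [[B1: 8000, B2: 9000, B3: 10000, B4: 11000]]

// Hypothetical counter: how many times the inner comparison is evaluated.
int comparisons = 0
respJson.each { respJ ->
    reqJson.any { reqJ ->
        comparisons++
        respJ.B1 + respJ.B2 + respJ.B3 + respJ.B4 == reqJ.B1 + reqJ.B2 + reqJ.B3 + reqJ.B4
    }
}

int m = respJson.size()   // 5
int n = reqJson.size()    // 4
int i = 4                 // properties read per element (B1..B4)
println "comparisons made: ${comparisons}, worst case m * n: ${m * n}"
println "worst-case property accesses m * n * 2i: ${m * n * 2 * i}"
```

The gap between the measured count and m * n shrinks as fewer entries match, and the unmatched entries always cost a full pass over the other array.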

So if the arrays are tens of thousands of elements long, this will become very slow. A simple way to speed it up to O(m * i + n * i) would be to:

  1. make a Set smallArrayKeys out of the concatenated messages/added values of the smaller array
  2. iterate through the bigger array and check whether each element's concatenated message is contained in the smallArrayKeys Set; if it is not, that element only exists in the bigger array.
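The two steps above could be sketched roughly like this. It is a minimal sketch, assuming every element is a flat map with the keys B1 to B4. The helper names `keyOf` and `diffLists` are made up for illustration, and the key is built by joining the values with a `|` separator instead of adding them, so that two different rows with the same sum do not collide.

```groovy
// Hypothetical helper: build a lookup key from one element's values.
// Joining with a separator (rather than adding the numbers) keeps rows such as
// [B1: 1, B2: 2, ...] and [B1: 2, B2: 1, ...] distinct.
def keyOf = { Map row -> [row.B1, row.B2, row.B3, row.B4].join('|') }

// Hypothetical helper: return the elements of the bigger list that have no
// counterpart in the smaller list, whichever argument happens to be bigger.
def diffLists = { List a, List b ->
    def (smaller, bigger) = a.size() <= b.size() ? [a, b] : [b, a]
    // Step 1: one pass over the smaller list to collect its keys.
    Set smallArrayKeys = smaller.collect(keyOf) as Set
    // Step 2: one pass over the bigger list, keeping rows whose key is unknown.
    bigger.findAll { row -> !smallArrayKeys.contains(keyOf(row)) }
}

def reqJson = [[B1: 100, B2: 200, B3: 300, B4: 400],
               [B1: 500, B2: 600, B3: 700, B4: 800]]
def respJson = reqJson + [[B1: 8000, B2: 9000, B3: 10000, B4: 11000]]

def diff = diffLists(reqJson, respJson)
println "Difference Count: ${diff.size()}"
println "Difference Set: ${diff}"
```

Each list is walked once and every Set lookup takes roughly constant time, which is where the O(m * i + n * i) estimate comes from. Like the loop version, this only reports entries of the bigger list that are missing from the smaller one.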