How to find differences between two json array columns in aws redshift?-CodePudding

I have a tableA in redshift as below

id	json1	json2	json3
1	[{"p":1,"p":2}]	[{"p":1}]
2	[{"p":3,"p":4}]	[{"p":3}]

I want to fill the json3 column with the difference of json1 and json2 ie for id1, json3 column is [{"p":2}] and for id2, json3 column is [{"p":4}]

Need help in this.

CodePudding user response：

You can use ObjctMapper and convert your Json's to string to find the differences,

ObjectMapper mapper = new ObjectMapper();
TypeReference<HashMap<String, Object>> type = 
    new TypeReference<HashMap<String, Object>>() {};

Map<String, Object> firstJsonMap = mapper.readValue(firstJsonElement, type);
Map<String, Object> secondJsonMap = mapper.readValue(secondJsonElement, type);

MapDifference<String, Object> difference = Maps.difference(firstJsonMap, secondJsonMap);

If you wish to get flat the extracted difference map for getting more meaningful result see here .

On the second way you can use JsonNode to finding the difference as follow (below example is just checking if they are same or not indeed)

JsonNode actualObj1 = mapper.readTree("your firstJson string");
JsonNode actualObj2 = mapper.readTree("your secondJson string");

TextNodeComparator cmp = new TextNodeComparator();


public class TextNodeComparator implements Comparator<JsonNode> 
{
    @Override
    public int compare(JsonNode o1, JsonNode o2) {
        if (o1.equals(o2)) {
            return 0;
        }
        if ((o1 instanceof TextNode) && (o2 instanceof TextNode)) {
            String s1 = ((TextNode) o1).asText();
            String s2 = ((TextNode) o2).asText();
            if (s1.equalsIgnoreCase(s2)) {
                return 0;
            }
        }
        return 1;
    }
}

CodePudding user response：

This is doable in SQL though a little complex. SQL operates on rows and json arrays are like virtual rows that have been folded up into one row. You'll need to expand these out and use SQL operators to identify the differences. Either JOIN or EXCEPT can find the extra elements (now rows). If you need these differences combined back into a json array that would be the last step.

1 - You'll need a set of sequential numbers (cte or table) that has as many values as you have elements in the max array. You can use this to expand the array into row similar to this - Extract value based on specific key from array of jsons in Amazon Redshift

2 - You'll need to do #1 for json1 and json2

3 - Outer Join these results on the individual values for the expanded json1 and json2 but only keep (where clause) when there is a mismatch

4 - If you need multiple differences to be combined back into json format you can use the listagg() to combine them