Bash Diff of Nested Files

Time:08-02

Premise:

I have five files in two directories.

folder/
├─ old/
│  ├─ a.json
│  ├─ b.xml
├─ new/
│  ├─ a.json
│  ├─ b.xml
│  ├─ c.html

old/a.json

{
   "hello": {
   },
   "world": ""
}

new/a.json

{
   "hello": {},
   "world": ""
}

old/b.xml and new/b.xml are the same.

When I run diff on the two folders I get:


2,3c2
<       "hello": {
<       }, 
---
>       "hello": {},

I also get a line reporting the new file, c.html.


Goal:

I only want to see that c.html is the newly added file. I want to ignore the newline/space differences between the two a.json files.

Ideally I'd like to run diff -I '${REGEX_HERE}' folder/old folder/new to accomplish this. Is this possible? I also have other command-line utilities at my disposal. This is meant to run in a Dockerfile.

Answer:

Run jq on each JSON file to produce a common output format, then diff THAT:

diff <(jq . folder/old/a.json) <(jq . folder/new/a.json)

For example:

$ head *.json
==> x.json <==
{
   "hello": {
   },
   "world": ""
}

==> y.json <==
{
   "hello": {},
   "world": ""
}

$ jq . x.json
{
  "hello": {},
  "world": ""
}

$ jq . y.json
{
  "hello": {},
  "world": ""
}

$ diff <(jq . x.json) <(jq . y.json)
$
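The question asked whether diff on its own can do this. A minimal sketch, assuming GNU diff, that recreates the two a.json files in a temporary directory (the temp-dir location is just for illustration) and shows that diff's whitespace options are not enough: -w ignores white space within a line and -B ignores blank lines, but here the change moves a line break, so the lines themselves still differ. -I RE has a similar limit, since it only ignores a changed hunk when every changed line in it matches RE.

```shell
#!/usr/bin/env bash
# Recreate the question's two a.json files in a throwaway directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/old" "$tmp/new"
printf '{\n   "hello": {\n   },\n   "world": ""\n}\n' > "$tmp/old/a.json"
printf '{\n   "hello": {},\n   "world": ""\n}\n' > "$tmp/new/a.json"

# -w: ignore white space within lines; -B: ignore blank lines.
# The old file splits `"hello": {` and `},` across two lines while the new
# file has `"hello": {},` on one line, so the lines still differ and diff
# reports a change (exit status 1).
rc=0
diff -wB "$tmp/old/a.json" "$tmp/new/a.json" || rc=$?
echo "diff -wB exit status: $rc"

rm -rf "$tmp"
```

It prints the same 2,3c2 hunk shown in the question, followed by an exit status of 1.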

To do literally what you asked for, i.e. ignore the newlines/spaces in the two a.json files, would be:

$ diff <(tr -d '[:space:]' < x.json) <(tr -d '[:space:]' < y.json)
$

but that assumes your version of diff can handle input that lacks a terminating newline (such input isn't a valid text file per POSIX), that you're OK with ALL white space being removed, even inside quoted strings, and that you really don't care about any layout differences between the two files. Also, since tr turns each file into a single line, any real difference will be reported as the whole file having changed.
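A minimal sketch of the "even inside quotes" caveat. The two sample documents are hypothetical and differ only in a key name, yet once tr strips all white space they compare as identical. Note the tr set is written '[:space:]': in tr, brackets outside the class name are literal characters, so '[[:space:]]' would also delete every [ and ] (JSON array brackets included).

```shell
#!/usr/bin/env bash
# Hypothetical sample files: the keys "first name" and "firstname" differ.
tmp=$(mktemp -d)
printf '{ "first name": "Ann" }\n' > "$tmp/x.json"
printf '{ "firstname": "Ann" }\n'  > "$tmp/y.json"

# tr deletes every white-space character, including the space inside the
# quoted key and the trailing newline, so both inputs become
# {"firstname":"Ann"} and diff finds no difference (exit status 0).
rc=0
diff <(tr -d '[:space:]' < "$tmp/x.json") <(tr -d '[:space:]' < "$tmp/y.json") || rc=$?
echo "tr-based diff exit status: $rc"

rm -rf "$tmp"
```

It prints an exit status of 0 even though the two documents have different keys; the jq-based comparison would report that difference.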

I expect you'll hit the same problem with XML and other file types, where you'll want to ignore some white space and/or other formatting differences, so to diff the two directories the way you appear to want you'd have to write a tool something like this (untested):

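#!/usr/bin/env bash
# Needs bash 4.4+ (readarray -d), GNU find (-printf) and GNU sort (-z),
# plus jq and xmlstarlet for the type-specific normalization below.

# List every file found under old/ or new/, relative to that directory.
# %P strips the starting point, so folder/old/a.json and folder/new/a.json
# both become "a.json"; sort -zu then drops the duplicates. NUL separators
# keep unusual file names intact.
readarray -d '' files < <(find folder/old folder/new -type f -printf '%P\0' | sort -zu)

# Diff two files after normalizing their formatting according to extension.
# Returns diff's exit status: 0 = equivalent, non-zero = different (or error).
diffByType() {
    case $1 in
        *.json ) diff <(jq . "$1") <(jq . "$2") >&2 ;;
        *.xml )  diff <(xmlstarlet fo "$1") <(xmlstarlet fo "$2") >&2 ;;
        * )      diff "$1" "$2" >&2 ;;
    esac
    return
}

# Report files that differ or that exist on only one side.
for file in "${files[@]}"; do
    if [[ -f "folder/old/$file" ]]; then
        if [[ -f "folder/new/$file" ]]; then
            if ! diffByType "folder/old/$file" "folder/new/$file"; then
                printf '%s is different\n' "$file" >&2
            fi
        else
            printf '%s is only in old\n' "$file" >&2
        fi
    else
        printf '%s is only in new\n' "$file" >&2
    fi
done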
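A minimal sketch of the file-listing step in isolation, which runs without jq or xmlstarlet. It rebuilds the question's layout with empty placeholder files in a temporary directory (the temp-dir location is just for illustration) and passes folder/old and folder/new as the two find starting points, so %P yields names relative to each directory. It assumes bash 4.4+, GNU find and GNU sort.

```shell
#!/usr/bin/env bash
# Rebuild the question's directory layout with empty placeholder files.
tmp=$(mktemp -d)
mkdir -p "$tmp/folder/old" "$tmp/folder/new"
touch "$tmp/folder/old/a.json" "$tmp/folder/old/b.xml"
touch "$tmp/folder/new/a.json" "$tmp/folder/new/b.xml" "$tmp/folder/new/c.html"

# %P removes the starting point from each path, so a file present in both
# directories yields the same relative name twice; sort -zu keeps one copy
# of each NUL-terminated name.
readarray -d '' files < <(find "$tmp/folder/old" "$tmp/folder/new" -type f -printf '%P\0' | sort -zu)

printf '%s\n' "${files[@]}"
rm -rf "$tmp"
```

It prints a.json, b.xml and c.html once each; the loop in the script then checks each name against both directories.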