Bash script to run diff command on files in two directories and provide custom side by side output?

Time:01-01

EDIT - Reproducible Code, Output, and Updated Question

#!/bin/bash

# Input Directory
inpd="/home/space user/space test/space dir1"

# Line To Parse
line="/home/space user/space test/space dir1: space file 1.txt"

# Split Line awk -F[:]
echo ""
var1=$(echo "$line" | awk -F[:] '{print $1}')
echo "  echo var1"
echo "  $var1"
echo ""
echo "  printf var1"
printf "%-2s%s" "" $var1

# var1 == inpd
echo ""
echo ""
echo "  var1 == inpd"
if [ var1 == inpd ]; then
  printf "  Match."
else
  printf "  No match."
fi
echo ""

$ scriptname

  echo var1
  /home/space user/space test/space dir1

  printf var1
  /home/spaceuser/spacetest/spacedir1

  var1 == inpd
  No match.

Updated Question - How to define, cast, or properly compare var1 to inpd so it produces a match when the input has spaces? If there is a better way to find the match without calling awk it would also solve my problem.
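A minimal sketch of why the first script prints "No match.", reusing the same values. Two separate things go wrong there: `[ var1 == inpd ]` compares the literal words var1 and inpd because the `$` is missing, and the unquoted `$var1` handed to printf is split on spaces into several arguments, so the format string is reused for each pair and the spaces disappear. The helper name paths_match is hypothetical and only used for illustration.

```shell
#!/bin/bash
inpd="/home/space user/space test/space dir1"
line="/home/space user/space test/space dir1: space file 1.txt"
var1=$(echo "$line" | awk -F: '{print $1}')

# Hypothetical helper: compare two strings with both expansions quoted.
paths_match() {
  if [ "$1" = "$2" ]; then
    echo "Match."
  else
    echo "No match."
  fi
}

# Quoted: printf receives a single argument, so the spaces survive.
printf '%-2s%s\n' "" "$var1"

# The literal words "var1" and "inpd" are never equal.
paths_match var1 inpd

# The values of the two variables are equal.
paths_match "$var1" "$inpd"
```

With quoting in place the awk split itself is not the problem; the comparison and the printf call are.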

I found the clue to solve my question here:

How can I remove all text after a character in bash?

$ script - this gives a Match!

#!/bin/bash

# Input Directory
inpd="/home/space user/space test/space dir1"

# Line To Parse
line="/home/space user/space test/space dir1: space file 1.txt"

# var1 keeps everything in 'line' before :
var1=${line%:*}
echo ""
echo "$line"
echo "$var1"
printf '%s' "$var1"

# "$var1" == "$inpd"
echo ""
if [ "$var1" == "$inpd" ]; then
  printf "  Match."
else
  printf "  No match."
fi
echo ""

EDIT - Why the Long Post?

I made a long post to show my script development effort, but the question now reduces to this: how to match any path, with or without spaces (e.g. /path with spaces/dir1), against the same path string or variable extracted from the output lines of the diff command. I am using awk with -F[:] as the separator, but there may be an alternative way to do it. I have embedded reproducible code above and below under headings containing "Reproducible Code". The updated question is based on the edit above; the rest of the long post preserves the new context and the original post.

For my use cases the custom script is non-recursive; it would handle spaces in the path or filenames; but as of now it would generate errors for any path or filename containing a colon : character and also for any filename containing a slash /. I am not sure what other characters or sequences would produce an error and I don't need a more robust script for my present purposes.
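One possible way around the colon limitation for the "Only in" lines, sketched under the assumption that diff prints the directory exactly as it was given on the command line (as it does in the sample output in this post): strip the known prefix "Only in DIR: " instead of splitting on the colon. The helper name only_name and the /tmp paths are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the entry name from an "Only in DIR: NAME" line
# if the line refers to directory $1; return 1 otherwise.
only_name() {
  local dir=$1 line=$2
  local prefix="Only in $dir: "
  if [[ "$line" == "$prefix"* ]]; then
    # The quoted pattern is matched literally, so colons in $dir are harmless.
    printf '%s\n' "${line#"$prefix"}"
  else
    return 1
  fi
}

# Both the directory and the file name contain a colon here.
only_name "/tmp/a:b" "Only in /tmp/a:b: c:d.txt"
```

Because the prefix is removed as a whole, whatever follows it is taken as the name, colons included.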

Any input path that contains spaces must be enclosed in quotes, e.g. dirt "/path with spaces/dir1".

So far I think if subdirectories appear in only one directory, as shown in my test directory structure, then in the absence of file extensions there is no way to determine whether the name refers to a file or subdirectory. I intend to use tree to list directories with color to show files and subdirectories and also use the new script dirt to compare files that are the same or different. This will probably work best for directories with few files and not many subdirectories which is my intended use case.
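The diff output alone does not say whether an "Only in" name is a file or a subdirectory, but the script could ask the filesystem. A minimal sketch, assuming the directory and name have already been extracted from the line; the helper name entry_kind is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report whether NAME inside DIR is a directory or a file.
entry_kind() {
  local dir=$1 name=$2
  if [[ -d "$dir/$name" ]]; then
    echo "dir"
  elif [[ -e "$dir/$name" ]]; then
    echo "file"
  else
    echo "missing"
  fi
}

# Demo on a throwaway directory tree.
tmp=$(mktemp -d)
mkdir "$tmp/subdir1"
touch "$tmp/only1.txt"
entry_kind "$tmp" "subdir1"
entry_kind "$tmp" "only1.txt"
rm -rf "$tmp"
```

The result could be used, for example, to append a trailing slash to directory names in the formatted output.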

EDIT - Desired Output Format (Script Name dirt Using Test Directories Below)

$ dirt "/home/joe/test dirdiff/dir1" "/home/joe/test dirdiff/dir2"

BOTH    /home/joe/test dirdiff/dir1               /home/joe/test dirdiff/dir2
diff    diff.txt                                  diff.txt
        diffout.txt
        only1.txt
                                                  only2.txt
same    same space.txt                            same space.txt
same    same.txt                                  same.txt
        space 1.txt
                                                  space 2.txt
        subdir1
                                                  subdir2
comd    subdirC                                   subdirC

EDIT - Directory Structure With Spaces (Without :) To Test Script

/home/joe/test dirdiff
├── dir1
│   ├── diff.txt
│   ├── diffout.txt
│   ├── only1.txt
│   ├── same space.txt
│   ├── same.txt
│   ├── space 1.txt
│   ├── subdir1
│   └── subdirC
└── dir2
    ├── diff.txt
    ├── only2.txt
    ├── same space.txt
    ├── same.txt
    ├── space 2.txt
    ├── subdir2
    └── subdirC

EDIT - Output from running diff

$ diff -qs "/home/joe/test dirdiff/dir1" "/home/joe/test dirdiff/dir2"

Files /home/joe/test dirdiff/dir1/diff.txt and /home/joe/test dirdiff/dir2/diff.txt differ
Only in /home/joe/test dirdiff/dir1: diffout.txt
Only in /home/joe/test dirdiff/dir1: only1.txt
Only in /home/joe/test dirdiff/dir2: only2.txt
Files /home/joe/test dirdiff/dir1/same space.txt and /home/joe/test dirdiff/dir2/same space.txt are identical
Files /home/joe/test dirdiff/dir1/same.txt and /home/joe/test dirdiff/dir2/same.txt are identical
Only in /home/joe/test dirdiff/dir1: space 1.txt
Only in /home/joe/test dirdiff/dir2: space 2.txt
Only in /home/joe/test dirdiff/dir1: subdir1
Only in /home/joe/test dirdiff/dir2: subdir2
Common subdirectories: /home/joe/test dirdiff/dir1/subdirC and /home/joe/test dirdiff/dir2/subdirC

EDIT - Script Fragment dirt00 Stores diff Output in $diffout

  #!/bin/bash
  if [[ -z "$1" || -z "$2" ]]; then
    printf "\n  Type $ dirt00 Dir1 Dir2\n"
  else
    input1="$1"
    input2="$2"
    diffout=$(diff -qs "$1" "$2")
    # printf '%s\n' "$var" is necessary: with printf '%s' "$var", a variable
    # that doesn't end with a newline makes the while loop completely miss
    # the last line of the variable.
    while IFS= read -r line
      do
        echo "$line"
      done < <(printf '%s\n' "$diffout")
  fi
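The comment in the script about the trailing newline can be checked with a small experiment. This is a sketch only; the helper name count_lines is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the lines a plain `while read` loop sees on stdin.
count_lines() {
  local n=0 line
  while IFS= read -r line; do
    n=$((n + 1))
  done
  echo "$n"
}

# No trailing newline, like the result of a $(...) command substitution.
text=$'first\nsecond'

count_lines < <(printf '%s' "$text")     # the unterminated last line is skipped
count_lines < <(printf '%s\n' "$text")   # both lines are seen
```

read returns a non-zero status when it hits end of input before a newline, so the loop body never runs for the unterminated last line. A here-string (`<<< "$text"`) also appends a newline and would avoid the problem.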

EDIT - Output from running dirt00

$ dirt00 "/home/joe/test dirdiff/dir1" "/home/joe/test dirdiff/dir2"

Files /home/joe/test dirdiff/dir1/diff.txt and /home/joe/test dirdiff/dir2/diff.txt differ
Only in /home/joe/test dirdiff/dir1: diffout.txt
Only in /home/joe/test dirdiff/dir1: only1.txt
Only in /home/joe/test dirdiff/dir2: only2.txt
Files /home/joe/test dirdiff/dir1/same space.txt and /home/joe/test dirdiff/dir2/same space.txt are identical
Files /home/joe/test dirdiff/dir1/same.txt and /home/joe/test dirdiff/dir2/same.txt are identical
Only in /home/joe/test dirdiff/dir1: space 1.txt
Only in /home/joe/test dirdiff/dir2: space 2.txt
Only in /home/joe/test dirdiff/dir1: subdir1
Only in /home/joe/test dirdiff/dir2: subdir2
Common subdirectories: /home/joe/test dirdiff/dir1/subdirC and /home/joe/test dirdiff/dir2/subdirC

EDIT - Reproducible Code Script dirt01

#!/bin/bash
input1="/home/joe/test dirdiff/dir1"
input2="/home/joe/test dirdiff/dir2"
diffout="Files /home/joe/test dirdiff/dir1/diff.txt and /home/joe/test dirdiff/dir2/diff.txt differ
Only in /home/joe/test dirdiff/dir1: diffout.txt
Only in /home/joe/test dirdiff/dir1: only1.txt
Only in /home/joe/test dirdiff/dir2: only2.txt
Files /home/joe/test dirdiff/dir1/same space.txt and /home/joe/test dirdiff/dir2/same space.txt are identical
Files /home/joe/test dirdiff/dir1/same.txt and /home/joe/test dirdiff/dir2/same.txt are identical
Only in /home/joe/test dirdiff/dir1: space 1.txt
Only in /home/joe/test dirdiff/dir2: space 2.txt
Only in /home/joe/test dirdiff/dir1: subdir1
Only in /home/joe/test dirdiff/dir2: subdir2
Common subdirectories: /home/joe/test dirdiff/dir1/subdirC and /home/joe/test dirdiff/dir2/subdirC"
# printf '%s\n' "$var" is necessary: with printf '%s' "$var", a variable
# that doesn't end with a newline makes the while loop completely miss
# the last line of the variable.
printf "\n  %-8s%-40s%-40s\n" "BOTH" "$input1" "$input2"
while IFS= read -r line
  do
    #echo $line
    firstword=$(echo "$line" | awk '{print $1}')
    finalword=$(echo "$line" | awk '{print $NF}')
    if   [ $finalword == "differ" ]; then
      snip=${line%" differ"}
      echo "$snip" | awk -F[/] '{printf "  %-8s%-40s%-40s\n","diff",$NF,$NF}'
    elif [ $finalword == "identical" ]; then
      snip=${line%" are identical"}
      echo "$snip" | awk -F[/] '{printf "  %-8s%-40s%-40s\n","same",$NF,$NF}'
    elif [ $firstword == "Common" ]; then
      echo "$line" | awk -F[/] '{printf "  %-8s%-40s%-40s\n","comd",$NF,$NF}'
    else
      echo ""
    fi
  done < <(printf '%s\n' "$diffout")

EDIT - Output from running dirt01

$ dirt01

  BOTH    /home/joe/test dirdiff/dir1             /home/joe/test dirdiff/dir2
  diff    diff.txt                                diff.txt



  same    same space.txt                          same space.txt
  same    same.txt                                same.txt




  comd    subdirC                                 subdirC

I cannot write dirt02, to complete the script, without an answer to the updated question at the top of this post.

I left the original question and post below to preserve the context for the existing answer and comments which are greatly appreciated!

NOTE - Original Question and Post Below

In the two lines starting $NF=="differ" and $NF=="identical":

(1) How do I split the file name and extension from the directory, using either of the awk fields shown below as $2 and $4 (both end in the same file name), and then output the filename.ext in the printf command?

dirdiff - bash script

  #!/bin/bash
  if [[ -z $1 || -z $2 ]]; then
    printf "\n  Type $ dirdiff Dir1 Dir2\n"
  else
    LEFT=$1
    LEFT+=:
    RGHT=$2
    RGHT+=:
    printf "\n  %-8s%-40s%-40s\n" "" "$1" "$2"
    printf "  %-8s%-40s%-40s\n\n" "" "$LEFT" "$RGHT"
    diff -qs $1 $2
    echo ""
    printf "\n%-8s%-40s%-40s\n" "INFO" "$1" "$2"
    diff -qs $1 $2 | awk -v L=$LEFT -v R=$RGHT \
                     '$NF=="differ" {printf "%-8s%-40s%-40s\n","diff", $2, $4} \
                      $NF=="identical" {printf "%-8s%-40s%-40s\n","same", $2, $4} \
                      $3==L {printf "%-8s%-40s\n","", $4} \
                      $3==R {printf "%-8s%-40s%-40s\n","", "", $4}'
  fi

This is the debug and develop script which runs command $ diff -qs $1 $2 twice. The first time shows the raw output. The second time pipes output to awk where I am trying to parse lines and format output on the command line. My questions relate to the final five lines in the script. EDIT: I solved the printf syntax problem in awk as shown in the code.

Run dirdiff on command line gives the following command line output

$ dirdiff /usr/local/adm/sys /mnt/ssdroot/home/joe/admin/sys

          /usr/local/adm/sys                      /mnt/ssdroot/home/joe/admin/sys
          /usr/local/adm/sys:                     /mnt/ssdroot/home/joe/admin/sys:

Only in /mnt/ssdroot/home/joe/admin/sys: bashrc.txt
Only in /usr/local/adm/sys: debpkgs.txt
Files /usr/local/adm/sys/direnv.txt and /mnt/ssdroot/home/joe/admin/sys/direnv.txt differ
Only in /usr/local/adm/sys: dpiDec2022.txt
Only in /mnt/ssdroot/home/joe/admin/sys: mypkgs.txt
Only in /mnt/ssdroot/home/joe/admin/sys: pyenv.txt
Files /usr/local/adm/sys/ssh.txt and /mnt/ssdroot/home/joe/admin/sys/ssh.txt are identical
Files /usr/local/adm/sys/usbquirks.txt and /mnt/ssdroot/home/joe/admin/sys/usbquirks.txt differ


INFO    /usr/local/adm/sys                      /mnt/ssdroot/home/joe/admin/sys
                                                bashrc.txt
        debpkgs.txt
diff    /usr/local/adm/sys/direnv.txt           /mnt/ssdroot/home/joe/admin/sys/direnv.txt
        dpiDec2022.txt
                                                mypkgs.txt
                                                pyenv.txt
same    /usr/local/adm/sys/ssh.txt              /mnt/ssdroot/home/joe/admin/sys/ssh.txt
diff    /usr/local/adm/sys/usbquirks.txt        /mnt/ssdroot/home/joe/admin/sys/usbquirks.txt

Desired Command Line Output Format (Duplicated at Top)

$ dirdiff /usr/local/adm/sys /mnt/ssdroot/home/joe/admin/sys

INFO    /usr/local/adm/sys                        /mnt/ssdroot/home/joe/admin/sys
                                                  bashrc.txt
        debpkgs.txt
diff    direnv.txt                                direnv.txt
        dpiDec2022.txt
                                                  mypkgs.txt
                                                  pyenv.txt
same    ssh.txt                                   ssh.txt
diff    usbquirks.txt                             usbquirks.txt

CodePudding user response:

Hope this helps. I think awk's sub function is what you are looking for to get the equivalent of basename.

Good luck!

    diff -qs $1 $2 | gawk -v L=$1 -v R=$2 \
      'BEGIN { printf "\n%-8s%-40s%-40s\n", "INFO", L, R } \
         $NF=="differ" { sub( /.*\//,"",$4) ; printf "%-8s%-40s%-40s\n", "diff", $4, $4 } \
         $NF=="identical" { sub( /.*\//,"",$4) ; printf "%-8s%-40s%-40s\n", "same", $4, $4 } \
         $3==L":" { sub( /.*\//,"",$4) ; printf "%-8s%-40s%-40s\n", "only", $4, "" } \
         $3==R":" { sub( /.*\//,"",$4) ; printf "%-8s%-40s%-40s\n", "only", "", $4 } '
INFO    dir1                                    dir2                                    
only                                            bashrc.txt                              
only    debpkgs.txt                                                                     
diff    direnv.txt                              direnv.txt                              
only    dpiDec2022.txt                                                                  
only                                            mypkgs.txt                              
only                                            pyenv.txt                               
same    ssh.txt                                 ssh.txt                                 
diff    usbquirks.txt                           usbquirks.txt 
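The sub( /.*\//,"",$4) call above deletes everything up to and including the last slash. The same effect is available in the shell itself through parameter expansion, which also copes with spaces because no field splitting is involved. A minimal sketch; the helper name base_name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: strip everything up to and including the last slash.
base_name() {
  printf '%s\n' "${1##*/}"
}

base_name "/usr/local/adm/sys/direnv.txt"
base_name "/home/joe/test dirdiff/dir1/same space.txt"
```

Unlike the basename command, this expansion returns an empty string for a path that ends in a slash.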

CodePudding user response:

Directory Structure With Spaces (Without :) To Test Script

/home/joe/test dirdiff
├── dir1
│   ├── diff.txt
│   ├── diffout.txt
│   ├── only1.txt
│   ├── same space.txt
│   ├── same.txt
│   ├── space 1.txt
│   ├── subdir1
│   └── subdirC
└── dir2
    ├── diff.txt
    ├── only2.txt
    ├── same space.txt
    ├── same.txt
    ├── space 2.txt
    ├── subdir2
    └── subdirC

Reproducible Script Works for Paths & Names Containing Spaces but Not Colons

#!/bin/bash
input1="/home/joe/test dirdiff/dir1"
input2="/home/joe/test dirdiff/dir2"
diffout="Files /home/joe/test dirdiff/dir1/diff.txt and /home/joe/test dirdiff/dir2/diff.txt differ
Only in /home/joe/test dirdiff/dir1: diffout.txt
Only in /home/joe/test dirdiff/dir1: only1.txt
Only in /home/joe/test dirdiff/dir2: only2.txt
Files /home/joe/test dirdiff/dir1/same space.txt and /home/joe/test dirdiff/dir2/same space.txt are identical
Files /home/joe/test dirdiff/dir1/same.txt and /home/joe/test dirdiff/dir2/same.txt are identical
Only in /home/joe/test dirdiff/dir1: space 1.txt
Only in /home/joe/test dirdiff/dir2: space 2.txt
Only in /home/joe/test dirdiff/dir1: subdir1
Only in /home/joe/test dirdiff/dir2: subdir2
Common subdirectories: /home/joe/test dirdiff/dir1/subdirC and /home/joe/test dirdiff/dir2/subdirC"
printf "\n  %-8s%-40s%-40s\n" "BOTH" "$input1" "$input2"
# printf '%s\n' "$var" is necessary: with printf '%s' "$var", a variable
# that doesn't end with a newline makes the while loop completely miss
# the last line of the variable.
while IFS= read -r line
  do
    #echo $line
    firstword=$(echo "$line" | awk '{print $1}')
    finalword=$(echo "$line" | awk '{print $NF}')
    if   [[ "$finalword" == "differ" ]]; then
      snip=${line%" differ"}
      echo "$snip" | awk -F[/] '{printf "  %-8s%-40s%-40s\n","diff",$NF,$NF}'
    elif [[ "$finalword" == "identical" ]]; then
      snip=${line%" are identical"}
      echo "$snip" | awk -F[/] '{printf "  %-8s%-40s%-40s\n","same",$NF,$NF}'
    elif [[ "$firstword" == "Common" ]]; then
      echo "$line" | awk -F[/] '{printf "  %-8s%-40s%-40s\n","comd",$NF,$NF}'
    elif [[ "$firstword" == "Only" ]]; then
      snip=${line#"Only in "}
      mdir=${snip%:*}
      name=${snip#*:}
      name=${name# *}
      if [[ "$mdir" == "$input1" ]]; then
        printf "  %-8s%-40s\n" "" "$name"
      else
        printf "  %-8s%-40s%-40s\n" "" "" "$name"
      fi
    else
      echo ""
    fi
  done < <(printf '%s\n' "$diffout")

$ scriptname

  BOTH    /home/joe/test dirdiff/dir1             /home/joe/test dirdiff/dir2
  diff    diff.txt                                diff.txt
          diffout.txt
          only1.txt
                                                  only2.txt
  same    same space.txt                          same space.txt
  same    same.txt                                same.txt
          space 1.txt
                                                  space 2.txt
          subdir1
                                                  subdir2
  comd    subdirC                                 subdirC
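The loop above starts awk several times for every line. As a possible variation, the same classification can be sketched with a case statement and parameter expansion only. It assumes the English diff messages shown in this post and that diff echoes the directories exactly as given; the function name format_line is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: format one line of `diff -qs` output for directories
# $1 (left) and $2 (right) using only shell pattern matching.
format_line() {
  local left=$1 right=$2 line=$3 name
  case $line in
    "Files "*" differ")
      name=${line%" differ"}; name=${name##*/}
      printf '  %-8s%-40s%-40s\n' "diff" "$name" "$name" ;;
    "Files "*" are identical")
      name=${line%" are identical"}; name=${name##*/}
      printf '  %-8s%-40s%-40s\n' "same" "$name" "$name" ;;
    "Common subdirectories: "*)
      name=${line##*/}
      printf '  %-8s%-40s%-40s\n' "comd" "$name" "$name" ;;
    "Only in $left: "*)
      printf '  %-8s%-40s\n' "" "${line#"Only in $left: "}" ;;
    "Only in $right: "*)
      printf '  %-8s%-40s%-40s\n' "" "" "${line#"Only in $right: "}" ;;
  esac
}

d1="/home/joe/test dirdiff/dir1"
d2="/home/joe/test dirdiff/dir2"
format_line "$d1" "$d2" "Files $d1/same space.txt and $d2/same space.txt are identical"
format_line "$d1" "$d2" "Only in $d2: space 2.txt"
```

Inside the while loop this would be called as format_line "$input1" "$input2" "$line".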