Printing lines between start & end variables in Bash

I'm working on a script to organize some log files into a human readable format. Here's what my code does so far:

-takes in external log data 
-takes in an IP address as an argument

-loops through each log
  -if it reaches a log containing the specified IP {
    -find the MSD number within that log
    -check if MSD is in "collected" array
  
    -if MSD is already in array {
      -resume the loop
    } else {
      -add MSD to the "collected" array
      ****-search all logs for corresponding MSD and echo them to output.txt
      -does not affect the order in which the logs were generated
    }
  -add a "--------" visual separator
  -resume loop, repeating the process each time it finds the IP with a new MSD

Here's my code:

OUTPUT_FILE=./output.txt
ARR=()
while IFS= read -r line; do
  if grep -q "$1" <<< "$line"; then
    MSD="$(echo "$line" | cut --complement -d '[' -f 1 | cut -d ']' -f 1)"
    if [[ ! " ${ARR[*]} " =~ " ${MSD} " ]]; then
      ARR+=("$MSD")
      cat ./example_logs | grep $MSD >> $OUTPUT_FILE
      echo "------------------" >> $OUTPUT_FILE
    fi
  fi
done < ./example_logs

Here is a snippet of the current output:

#EXAMPLE LOGS (target IP is 3x.x.xx.xx)
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx 
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406939]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406939]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx 
Apr 12 04:04:34 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406939]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406939]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406939]: Exiting (bytes in: 149 out: 389)

My issue is where I marked **** (the cat line)... From the output, you can see that there are two IPs that share the same MSD. I don't know how to go about removing the logs that I don't need. Any suggestions? So far I've tried creating $START and $END variables, but I'm sure I'm not doing this properly...

TARGET="connection from $1"
START="$(grep "$MSD" example_logs | grep "$TARGET")"
END="$(grep "$MSD" example_logs | grep -i "exiting")"

CodePudding user response：

Here is the janky solution I came up with after 3 cups of coffee & 15 hours of coding. I hope it might be useful to someone else out there!

rm -f ./output.txt
OUTPUT_FILE=./output.txt
#
#
# ================= PART 1: =================
# An empty array for values later
MSD_ARR=()
#
# Loop through each log in ./example_logs
#   -IF it reaches a log containing the target IP {
#       -find msd# within that log & save to variable
#       -check if current msd is in MSD_ARR
#
#       -IF current msd is NOT in the array {
#           -add msd to MSD_ARR
#           -grep all logs containing the current msd
#           -output logs to new file. Does not affect the order in which logs are listed.
#       } else {
#           -resume the loop.  
#       }
#    }
#
while IFS= read -r line; do
  if grep -q "$1" <<< "$line"; then
    MSD="$(echo "$line" | cut --complement -d '[' -f 1 | cut -d ']' -f 1)"
    if [[ ! " ${MSD_ARR[*]} " =~ " ${MSD} " ]]; then
      MSD_ARR+=("$MSD")
      grep -- "$MSD" ./example_logs >> "$OUTPUT_FILE"
      echo "------------------------" >> $OUTPUT_FILE
    fi
  fi
done < ./example_logs
#
#
# ================= PART 2: =================
#
# Loop through each log in ./output.txt
#   -IF it reaches a log containing the phrase "connection from" {
#       -find msd# within that log & save to variable
#       -IF the current log does NOT contain the target IP {
#           -grab the current line's position & save as a variable called "START"
#           -grab all positions of lines containing the phrase "Exiting" & split the values into an array called "EndArr"
#
#           -FOR LOOP over EndArr
#           -IF our START position is less than (comes before) our current END position (EndVal) {
#               use "sed" to delete logs, using START and EndVal as the range. Deletes inclusively
#               BREAK out of this loop to repeat the process with the next START value
#           }
#       }
#   }
#
while IFS= read -r line; do
  if grep -q "connection from" <<< "$line"; then
    MSD="$(echo "$line" | cut --complement -d '[' -f 1 | cut -d ']' -f 1)"
    if [[ ! " ${line} " =~ " ${1} " ]]; then
      START="$(awk -v var="$line" 's=index($0,var){print NR}' output.txt)"
      END="$(awk -v var="Exiting" 'e=index($0,var){print NR}' output.txt)"
      EndArr=($END)
      for EndVal in "${EndArr[@]}"; do
        if [ "$START" -lt "$EndVal" ]; then
          echo "$START"
          echo "$EndVal"
          sed -i "$START,$EndVal d" ./output.txt
          break
        fi
      done
      echo "-----"
    fi
  fi
done < ./output.txt
#
# ===========================================
# EXAMPLE USE:
# >bash filename [IP Address]

CodePudding user response：

Text processing with bash isn't exactly good practice, and calling external tools inside shell loops should be avoided when possible.
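If you do stay in bash, here is a minimal sketch of what "no external tools inside the loop" could look like for the two per-line steps in the question (the IP test and the MSD extraction). The function names `extract_msd` and `line_has_ip` are made up for illustration, and the sketch assumes the MSD is the text between the first "[" and the first "]" on the line.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the text between the first "[" and the first "]".
# Uses parameter expansion only, so no echo | cut | cut pipeline is spawned.
extract_msd() {
  local line=$1 rest
  rest=${line#*\[}              # drop everything up to and including the first "["
  printf '%s\n' "${rest%%\]*}"  # drop everything from the first "]" onward
}

# Hypothetical helper: succeed if the line ($2) contains the IP ($1) as a
# literal substring. Replaces `grep -q "$ip" <<< "$line"`, which also treated
# the dots in the IP as regex wildcards.
line_has_ip() {
  [[ $2 == *"$1"* ]]
}
```

Inside the loop this would read `if line_has_ip "$1" "$line"; then MSD=$(extract_msd "$line")`. The command substitution still forks a subshell, but no external process is started per line.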

Check this little awk script; it might do what you're looking for (I'm not entirely sure though, because there's no input sample in your question):

awk -F'[[\\]]' -v ip="$1" '
    $0 ~ "from "ip" " { hash[$2] }  # add MSD to hash
    $2 in hash;                     # print line if MSD in hash
    /: Exiting/ { delete hash[$2] } # delete MSD from hash
' ./example_logs
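A variant of the same idea, as a sketch rather than a tested drop-in: it compares the address as a literal string instead of a regex (so the dots in the IP are not wildcards) and prints the separator from the question after each tracked session ends. The function name `filter_sessions` is made up, and it assumes the address is the last thing on the "connection from" line, as in the sample logs.

```shell
#!/usr/bin/env bash

# filter_sessions IP LOGFILE   (hypothetical name)
# Prints only the sessions that were opened by IP, one separator per session.
filter_sessions() {
  awk -F'[][]' -v ip="$1" '
    # A session starts on a "connection from" line. Take the text after that
    # phrase, trim trailing whitespace, and compare it literally with ip.
    /connection from / {
      addr = $0
      sub(/.*connection from /, "", addr)
      sub(/[[:space:]]+$/, "", addr)
      if (addr == ip) want[$2]      # start tracking this MSD
    }
    $2 in want                      # print every line of a tracked session
    /: Exiting/ && ($2 in want) {   # session is over
      print "------------------"
      delete want[$2]               # stop tracking until the next match
    }
  ' "$2"
}
```

Called as `filter_sessions "$1" ./example_logs > "$OUTPUT_FILE"`, this keeps the logs in their original order and skips the session from the other IP that reuses the same MSD.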

CodePudding user response：

What you are trying to do can be done with a single call to awk and three rules. Each rule is specified as condition { commands }; without a condition, the rule runs for every record (input line). Essentially, all you need to do is:

get the 7-digit msd from the 5th field (e.g. msd[xxxxxxx]);
if not the first line and the msd isn't the same as the last, output your "------------------" separator
output the current line, and update your last variable with the current msd

To save to your "$OUTPUT_FILE", just redirect the output of the command.

If you put that together in a short awk script, you have:

awk '
  { # separate digits from msd[xxx] and save as msd
    # set RSTART RLENGTH (index and length of digits)
    match ($5,/[[:digit:]]+/)
    msd = substr($5,RSTART,RLENGTH)   # assign substring of digits to msd
  }
  FNR > 1 && last != msd {  # if line > 1 and msd has changed
    print "------------------"
  }
  {
    print           # output line
    last = msd      # update last with msd
  }
' file > "$OUTPUT_FILE"

Example Use/Output

Since you only have one msd in your example data, there is no change in msd to catch. For example purposes the data has been duplicated and the msd changed, e.g.

$ cat file
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406939]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406939]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406939]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406939]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406939]: Exiting (bytes in: 149 out: 389)
Apr 12 01:04:20 fe1 msd[2406940]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406940]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406940]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406940]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406940]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406940]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406940]: Exiting (bytes in: 149 out: 389)

Now running the script produces the following with your separator at the msd transition:

$ awk '
>   { # separate digits from msd[xxx] and save as msd
>     # set RSTART RLENGTH (index and length of digits)
>     match ($5,/[[:digit:]]+/)
>     msd = substr($5,RSTART,RLENGTH)   # assign substring of digits to msd
>   }
>   FNR > 1 && last != msd {  # if line > 1 and msd has changed
>     print "------------------"
>   }
>   {
>     print           # output line
>     last = msd      # update last with msd
>   }
> ' file
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406939]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406939]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406939]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406939]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406939]: Exiting (bytes in: 149 out: 389)
------------------
Apr 12 01:04:20 fe1 msd[2406940]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406940]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406940]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406940]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406940]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406940]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406940]: Exiting (bytes in: 149 out: 389)

It's not entirely clear from your question what you want to use TARGET, START and END for, so if I've misinterpreted what you were trying to do, update your question with further explanation and drop a comment below.

Using awk for this instead of a shell script and loop will be orders of magnitude more efficient. For large logs, the difference can be as big as a few seconds with awk compared to a few hours of runtime with the shell script.