I'm working on a script to organize some log files into a human-readable format. Here's what my code does so far:
-takes in external log data
-takes in an IP address as an argument
-loops through each log
-if it reaches a log containing the specified IP {
-find the MSD number within that log
-check if MSD is in "collected" array
-if MSD is already in array {
-resume the loop
} else {
-add MSD to the "collected" array
****-search all logs for corresponding MSD and echo them to output.txt
-does not affect the order in which the logs were generated
}
-add a "--------" visual separator
-resume loop, repeating the process each time it finds the IP with a new MSD
Here's my code:
OUTPUT_FILE=./output.txt
ARR=()
while IFS= read -r line; do
    if grep -q "$1" <<< "$line"; then
        # pull the number out of msd[NNNNNNN]
        MSD="$(echo "$line" | cut --complement -d '[' -f 1 | cut -d ']' -f 1)"
        if [[ ! " ${ARR[*]} " =~ " ${MSD} " ]]; then
            ARR+=("$MSD")
            cat ./example_logs | grep "$MSD" >> "$OUTPUT_FILE"
            echo "------------------" >> "$OUTPUT_FILE"
        fi
    fi
done < ./example_logs
Here is a snippet of the current output:
#EXAMPLE LOGS (target IP is 3x.x.xx.xx)
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406939]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406939]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406939]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406939]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406939]: Exiting (bytes in: 149 out: 389)
My issue is where I marked **** (the cat line)... From the output, you can see that there are two IPs that share the same MSD. I don't know how to go about removing the logs that I don't need. Any suggestions? So far I've tried creating $START and $END variables, but I'm sure I'm not doing this properly...
TARGET="connection from $1"
START="$(grep "$MSD" example_logs | grep "$TARGET")"
END="$(grep "$MSD" example_logs | grep -i "exiting")"
CodePudding user response:
Here is the janky solution I came up with after 3 cups of coffee & 15 hours of coding. I hope it might be useful to someone else out there!
rm -f ./output.txt
OUTPUT_FILE=./output.txt
#
#
# ================= PART 1: =================
# An empty array for values later
MSD_ARR=()
#
# Loop through each log in ./example_logs
# -IF it reaches a log containing the target IP {
# -find msd# within that log & save to variable
# -check if current msd is in MSD_ARR
#
# -IF current msd is NOT in the array {
# -add msd to MSD_ARR
# -grep all logs containing the current msd
# -output logs to new file. Does not affect the order in which logs are listed.
# } else {
# -resume the loop.
# }
# }
#
while IFS= read -r line; do
    if grep -q "$1" <<< "$line"; then
        MSD="$(echo "$line" | cut --complement -d '[' -f 1 | cut -d ']' -f 1)"
        if [[ ! " ${MSD_ARR[*]} " =~ " ${MSD} " ]]; then
            MSD_ARR+=("$MSD")
            cat ./example_logs | grep "$MSD" >> "$OUTPUT_FILE"
            echo "------------------------" >> "$OUTPUT_FILE"
        fi
    fi
done < ./example_logs
#
#
# ================= PART 2: =================
#
# Loop through each log in ./output.txt
# -IF it reaches a log containing the phrase "connection from" {
# -find msd# within that log & save to variable
# -IF the current log does NOT contain the target IP {
# -grab the current line's position & save as a variable called "START"
# -grab all positions of lines containing the phrase "Exiting" & split the values into an array called "EndArr"
#
# -FOR LOOP over EndArr
# -IF our START position is less than (comes before) our current END position (EndVal) {
# use "sed" to delete logs, using START and EndVal as the range. Deletes inclusively
# BREAK out of this loop to repeat the process with the next START value
# }
# }
# }
#
while IFS= read -r line; do
    if grep -q "connection from" <<< "$line"; then
        MSD="$(echo "$line" | cut --complement -d '[' -f 1 | cut -d ']' -f 1)"
        if [[ ! " ${line} " =~ " ${1} " ]]; then
            START="$(awk -v var="$line" 's=index($0,var){print NR}' output.txt)"
            END="$(awk -v var="Exiting" 'e=index($0,var){print NR}' output.txt)"
            EndArr=($END)
            for EndVal in "${EndArr[@]}"; do
                if [ "$START" -lt "$EndVal" ]; then
                    echo "$START"
                    echo "$EndVal"
                    sed -i "$START,$EndVal d" ./output.txt
                    break
                fi
            done
            echo "-----"
        fi
    fi
done < ./output.txt
#
# ===========================================
# EXAMPLE USE:
# >bash filename [IP Address]
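For anyone who wants to stay in bash, the two parts above could probably be collapsed into a single pass by remembering which msd currently has an open session from the target IP. The following is only a rough sketch under some assumptions: bash 4+ (for associative arrays), log lines that end exactly with "connection from <IP>" with no trailing whitespace, and a function name (extract_sessions) that is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: single pass over the log, no second clean-up pass.
# Assumes bash 4+ and the msd[NNNNNNN] line format shown in the question.
extract_sessions() {   # hypothetical helper: extract_sessions IP LOGFILE
  local ip=$1 file=$2 line msd
  local -A active=()                 # msd numbers with an open session from $ip
  while IFS= read -r line; do
    # skip lines that carry no msd[...] tag
    [[ $line =~ msd\[([0-9]+)\] ]] || continue
    msd=${BASH_REMATCH[1]}
    # a session starts when the line ends with "connection from <ip>"
    # (the quoted part is matched literally, so dots are not wildcards)
    if [[ $line == *"connection from $ip" ]]; then
      active[$msd]=1
    fi
    if [[ -n ${active[$msd]:-} ]]; then
      printf '%s\n' "$line"
      # a session ends at the "Exiting" line: add separator, forget the msd
      if [[ $line == *": Exiting"* ]]; then
        echo "------------------------"
        unset "active[$msd]"
      fi
    fi
  done < "$file"
}
```

It would be called as extract_sessions "$1" ./example_logs > ./output.txt. Using an associative array for the membership check also avoids the substring matches that the " ${MSD_ARR[*]} " =~ test can produce.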
CodePudding user response:
Text processing with bash isn't exactly good practice, and calling external tools inside shell loops should be avoided when possible.
Check this little awk script; it might do what you're looking for (I'm not entirely sure though, because I couldn't test it against a full input sample):
awk -F'[[\\]]' -v ip="$1" '
$0 ~ "from "ip" " { hash[$2] } # add MSD to hash
$2 in hash; # print line if MSD in hash
/: Exiting/ { delete hash[$2] } # delete MSD from hash
' ./example_logs
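One caveat worth sketching: in the sample lines from the question the IP is the last thing on the "connection from" line, and an IP contains dots, which a regex match treats as wildcards. A variation that compares the last word of the line as a plain string, and prints the separator when a session ends, might look like the following. This is an untested-on-real-logs sketch; the wrapper name filter_sessions and the array name active are made up for illustration, and it assumes every session ends with a ": Exiting" line.

```shell
#!/usr/bin/env bash
# Sketch only: print the sessions opened by one IP, with a separator after each.
# Assumes "connection from <ip>" ends the line and ": Exiting" closes a session.
filter_sessions() {   # hypothetical helper: filter_sessions IP LOGFILE
  awk -F'[][]' -v ip="$1" '
    /connection from/ {
      # compare the last whitespace-separated word as a plain string
      n = split($0, w, " ")
      if (w[n] == ip) active[$2]      # remember this MSD as active
    }
    $2 in active                      # print line if its MSD is active
    /: Exiting/ && ($2 in active) {
      print "------------------"
      delete active[$2]               # session finished
    }
  ' "$2"
}
```

Usage would be filter_sessions "$1" ./example_logs > ./output.txt.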
CodePudding user response:
What you are trying to do can be done with a simple call to awk and three rules. (Each rule is specified as condition { commands }.) Without a condition, the rule is run for every record (input line). Essentially all you need to do is:
- get the 7-digit msd from the 5th field (e.g. msd[xxxxxxx]);
- if not the first line and the msd isn't the same as the last, output your "------------------" separator;
- output the current line, and update your last variable with the current msd.
To save to your "$OUTPUT_FILE", just redirect the output of the command.
If you put that together in a short awk script, you have:
awk '
{   # separate digits from msd[xxx] and save as msd
    # match() sets RSTART and RLENGTH (index and length of digits)
    match($5, /[[:digit:]]+/)
    msd = substr($5, RSTART, RLENGTH)   # assign substring of digits to msd
}
FNR > 1 && last != msd {    # if line > 1 and msd has changed
    print "------------------"
}
{
    print           # output line
    last = msd      # update last with msd
}
' file > "$OUTPUT_FILE"
Example Use/Output
Since you only have one msd in your example data, there is no change in msd to catch. For example purposes the data has been duplicated and the msd changed, e.g.
$ cat file
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406939]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406939]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406939]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406939]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406939]: Exiting (bytes in: 149 out: 389)
Apr 12 01:04:20 fe1 msd[2406940]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406940]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406940]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406940]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406940]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406940]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406940]: Exiting (bytes in: 149 out: 389)
Now running the script produces the following with your separator at the msd transition:
$ awk '
> {   # separate digits from msd[xxx] and save as msd
>     # match() sets RSTART and RLENGTH (index and length of digits)
>     match($5, /[[:digit:]]+/)
>     msd = substr($5, RSTART, RLENGTH)   # assign substring of digits to msd
> }
> FNR > 1 && last != msd {    # if line > 1 and msd has changed
>     print "------------------"
> }
> {
>     print           # output line
>     last = msd      # update last with msd
> }
> ' file
Apr 12 01:04:20 fe1 msd[2406939]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406939]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406939]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406939]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406939]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406939]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406939]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406939]: Exiting (bytes in: 149 out: 389)
------------------
Apr 12 01:04:20 fe1 msd[2406940]: SMTPD started: connection from 173.xxx.xxx.xx
Apr 12 01:04:20 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 01:04:20 fe1 msd[2406940]: Session ending: Client was hardblocked
Apr 12 01:04:20 fe1 msd[2406940]: Exiting (bytes in: 14 out: 90)
Apr 12 04:04:34 fe1 msd[2406940]: SMTPD started: connection from 3x.x.xx.xx
Apr 12 04:04:34 fe1 msd[2406940]: Created UUID ....... for message
Apr 12 04:04:35 fe1 msd[2406940]: CONNECTED using SSL
Apr 12 04:04:37 fe1 msd[2406940]: Session ending: Client issued QUIT
Apr 12 04:04:37 fe1 msd[2406940]: Exiting (bytes in: 149 out: 389)
It's not entirely clear from your question what you want to use TARGET, START and END for, so if I've misinterpreted what you were trying to do, update your question with further explanation and drop a comment below.
Using awk for this instead of a shell script and loop will be orders of magnitude more efficient (with differences as big as a few seconds with awk compared to a few hours of runtime with the shell script for large logs).