I am trying to take a log file containing chat logs and manipulate the data into a tabular format/csv for inserting into a database in a scheduled job. This is simple to do manually in something like Excel with formulas but I can't translate this to a Linux shell script that I can set on a schedule and leave alone.
My primary challenge is adding values to custom columns based on the contents of the log. I've looked into Awk but it just isn't registering how to get the desired output. If anyone could provide some guidance on the easiest way to go about this, it would be greatly appreciated!
Log example:
[05-05-2022 @ 00:22:41.533 GMT] autouser9218: @usernickname3291, Current commands in use are !upd, !aggregate, !grant
[05-05-2022 @ 00:22:41.545 GMT] usernickname3291: !commands
[05-05-2022 @ 12:49:44.067 GMT] usernickname3291: /me !upd aggregate
[05-05-2022 @ 12:49:49.534 GMT] autouser9218: /me - Total: 4
[05-05-2022 @ 12:49:49.537 GMT] usernickname3291: !upd aggregate
[05-05-2022 @ 13:33:17.203 GMT] autouser9218: !upd aggregate
[05-05-2022 @ 13:33:29.060 GMT] autouser9218: /me - Total: 4
[05-05-2022 @ 13:33:29.062 GMT] usernickname3291: !upd aggregate
[05-05-2022 @ 14:24:41.771 GMT] autouser9218: /me - Total: 4
[05-05-2022 @ 14:24:52.956 GMT] autouser9218: /me - Users: 3
[05-05-2022 @ 14:49:31.627 GMT] autouser9218: Currently we're operating at 18% capacity.
[05-05-2022 @ 14:49:31.644 GMT] usernickname3291: !grant
My desired tabular output is as follows:
| Date | Time | User | Message | Used Command | Command |
| ---------- | ------------ | ---------------- | ----------------------------------------------------------------------- | ------------ | -------------- |
| 05-05-2022 | 00:22:41.533 | autouser9218 | @usernickname3291, Current commands in use are !upd, !aggregate, !grant | FALSE | |
| 05-05-2022 | 00:22:41.545 | usernickname3291 | !commands | TRUE | !commands |
| 05-05-2022 | 12:49:44.067 | usernickname3291 | /me !upd aggregate | FALSE | |
| 05-05-2022 | 12:49:49.534 | autouser9218 | /me - Total: 4 | FALSE | |
| 05-05-2022 | 12:49:49.537 | usernickname3291 | !upd aggregate | TRUE | !upd aggregate |
| 05-05-2022 | 13:33:17.203 | autouser9218 | !upd aggregate | TRUE | !upd aggregate |
| 05-05-2022 | 13:33:29.060 | autouser9218 | /me - Total: 4 | FALSE | |
| 05-05-2022 | 13:33:29.062 | usernickname3291 | !upd aggregate | TRUE | !upd aggregate |
| 05-05-2022 | 14:24:41.771 | autouser9218 | /me - Total: 4 | FALSE | |
| 05-05-2022 | 14:24:52.956 | autouser9218 | /me - Users: 3 | FALSE | |
| 05-05-2022 | 14:49:31.627 | autouser9218 | Currently we're operating at 18% capacity. | FALSE | |
| 05-05-2022 | 14:49:31.644 | usernickname3291 | !grant | TRUE | !grant |
The date, time, user, and message columns I think are self explanatory. The other two are based on criteria. Used Command is TRUE if the first character of the message is an exclamation point ! Command is intended to be populated with the rest of the string that started with the exclamation point and the first word that follows (if one exists) after a space.
CodePudding user response:
Since the input is not delimited, consider using the "match" with regular expression to split it into fields (date, time, user, message), and then print the output. Something like:
#! /bin/env awk
# helper Function to extract the match field.
function x (field) {
return substr($0, a[field, "start"], a[field, "length"])
}
BEGIN {
OFS = " | "
print "Date ", "Time ", "User ", "Message"
print "--------", "---------", "-------", "-------"
}
match($0, "^\\[(..-..-....) @ (..:..:..\\....) GMT\\] (\\w ): (.*)$", a) {
print x(1), x(2), x(3), x(4)
}
See the man page for 'awk' on how to format the output to your liking
CodePudding user response:
I would harness GNU AWK
for this task following, let file.txt
content be
[05-05-2022 @ 00:22:41.533 GMT] autouser9218: @usernickname3291, Current commands in use are !upd, !aggregate, !grant
[05-05-2022 @ 00:22:41.545 GMT] usernickname3291: !commands
[05-05-2022 @ 12:49:44.067 GMT] usernickname3291: /me !upd aggregate
[05-05-2022 @ 12:49:49.534 GMT] autouser9218: /me - Total: 4
[05-05-2022 @ 12:49:49.537 GMT] usernickname3291: !upd aggregate
[05-05-2022 @ 13:33:17.203 GMT] autouser9218: !upd aggregate
[05-05-2022 @ 13:33:29.060 GMT] autouser9218: /me - Total: 4
[05-05-2022 @ 13:33:29.062 GMT] usernickname3291: !upd aggregate
[05-05-2022 @ 14:24:41.771 GMT] autouser9218: /me - Total: 4
[05-05-2022 @ 14:24:52.956 GMT] autouser9218: /me - Users: 3
[05-05-2022 @ 14:49:31.627 GMT] autouser9218: Currently we're operating at 18% capacity.
[05-05-2022 @ 14:49:31.644 GMT] usernickname3291: !grant
then
awk 'BEGIN{FS="\t"}{sub("\\[","");sub(" @ ",FS);sub(" GMT\\] ",FS);sub(": ",FS);printf "%s | %s | %-20s | %-80s | %-5s | %-15s |\n",$1,$2,$3,$4,index($4,"!")==1?"TRUE":"FALSE",index($4,"!")==1?$4:""}' file.txt
output
05-05-2022 | 00:22:41.533 | autouser9218 | @usernickname3291, Current commands in use are !upd, !aggregate, !grant | FALSE | |
05-05-2022 | 00:22:41.545 | usernickname3291 | !commands | TRUE | !commands |
05-05-2022 | 12:49:44.067 | usernickname3291 | /me !upd aggregate | FALSE | |
05-05-2022 | 12:49:49.534 | autouser9218 | /me - Total: 4 | FALSE | |
05-05-2022 | 12:49:49.537 | usernickname3291 | !upd aggregate | TRUE | !upd aggregate |
05-05-2022 | 13:33:17.203 | autouser9218 | !upd aggregate | TRUE | !upd aggregate |
05-05-2022 | 13:33:29.060 | autouser9218 | /me - Total: 4 | FALSE | |
05-05-2022 | 13:33:29.062 | usernickname3291 | !upd aggregate | TRUE | !upd aggregate |
05-05-2022 | 14:24:41.771 | autouser9218 | /me - Total: 4 | FALSE | |
05-05-2022 | 14:24:52.956 | autouser9218 | /me - Users: 3 | FALSE | |
05-05-2022 | 14:49:31.627 | autouser9218 | Currently we're operating at 18% capacity. | FALSE | |
05-05-2022 | 14:49:31.644 | usernickname3291 | !grant | TRUE | !grant |
Explanation: Firstly I inform GNU AWK
that field separator (FS
) is tab character. Then I do wrangle each line following way using sub
function to act at first occurence of:
[
is removed@
replaced using field separatorGMT]
replaced using field separator:
replaced using field separator
after such treatment $1
is date, $2
is time, $3
is username, $4
is message. I use printf
to output these in desired format: $1
and $2
have fixed widths so I simply use %s
, $3
is variable length and need justing so I use %-20s
to get filling with spaces after text to width of 20 character (left justing), $4
is similarly just but we need more width (I elected 80 as it is number of columns in Hollerith card). To get values for 2 last columns I used so called ternary operator condition?
valueiftrue:
valueiffalse, condition is same in both and might be read as: is first occurence of !
at 1
st position in 4th column (message)? If it is so use respectively TRUE
(string) and whole 4th column otherwise FALSE
(string) and (empty string). Note that width of column might need adjusting, so longest value would fit.
Disclaimer: this solution assumes that there is not tab character anywhere in file.txt
, if it is replace FS
using any single character which is not present in file.txt
(tested in gawk 4.2.1)