I have raw log snippets being dumped to the console as the output of a custom command:
bash$ custom-command
current-capacity: 3%, buffer: 1024, not-used/total: 10/10, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022
current-capacity: 0%, buffer: 1024, not-used/total: 25/25, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022
current-capacity: 0%, buffer: 1024, not-used/total: 15/15, IsEnabled: 1. Up since Thu Jun 23 11:54:14 2022
I need a CSV format like the one below to capture the status in real time based on certain criteria. I can then redirect the output to a CSV file at regular intervals before loading it into a SQL database.
current-capacity, buffer, not-used/total, IsEnabled, Up since
3%, 1024, 10/10, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 25/25, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 15/15, 1, Thu Jun 23 11:54:14 2022
I've tried awk but am still facing issues, since the input is comma-separated for the most part, except that IsEnabled: 0 ends with "." followed by the uptime. Is there a way? I'm quite new to awk.
CodePudding user response:
Welcome to StackOverflow. Thank you for including both sample data and desired output. I recommend studying the markdown formatting syntax, since your code was entered as quoted HTML. It's better to use code tags, which produce fixed-width text that is easier to read.
As for your problem, you can use the match() function in gawk (its three-argument form is a GNU awk extension) to capture all fields with a regular expression, because every line of your input is formatted the same way.
Something like this will do what is needed:
BEGIN {
    # set output separator to comma space
    OFS = ", "
    # define the regular expression to capture the needed fields
    # See https://regex101.com/r/K4wYoB/1
    #
    # ([^,]*) captures everything up to, but not including, the next comma
    # (.) captures a single character
    # (.*) at the end captures the remainder of the line
    #
    # the full key names are not spelled out, since their tails are enough.
    # awk regexes have no lazy quantifiers, so plain .* is used throughout.
    #
    myregexp = "y: ([^,]*).*r: ([^,]*).*al: ([^,]*).*led: (.).*ce (.*)"
    # print header for output
    print "current-capacity, buffer, not-used/total, IsEnabled, Up since"
}
# process only lines that match; the sample input has no header line to skip
match($0, myregexp, a) {
    # print the captured fields from the "a" array
    print a[1], a[2], a[3], a[4], a[5]
}
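For readers new to gawk, here is a minimal sketch of how the three-argument match() fills an array with the capture groups. It assumes GNU awk is installed as gawk; the helper name show_groups and the shortened regex are made up for illustration and are not part of the answer above.

```shell
# Sample line copied from the question's input.
line='current-capacity: 3%, buffer: 1024, not-used/total: 10/10, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022'

# show_groups (hypothetical helper): print the first two capture groups,
# one per line, using gawk's three-argument match().
show_groups() {
    gawk '
        match($0, /capacity: ([^,]*), buffer: ([^,]*)/, a) {
            print "a[1]=" a[1]
            print "a[2]=" a[2]
        }'
}

# Only run the demo when gawk is available; other awks lack this extension.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' "$line" | show_groups
else
    echo "gawk not installed; the 3-argument match() needs GNU awk" >&2
fi
```

Each parenthesized group in the regex lands in the next array slot, which is why the script above can print a[1] through a[5] directly.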
CodePudding user response:
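It's just a matter of writing a regex that matches the input and transforming it. Use a delimiter other than "/" because the pattern itself contains one, and escape the literal dot:
sed -E 's#current-capacity: (.*)%, buffer: (.*), not-used/total: (.*), IsEnabled: (.*)\. Up since (.*)#\1%,\2,\3,\4,\5#'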
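A sed one-liner on its own prints no header row, so here is a hedged sketch that wraps the same idea in a small function. The name to_csv is hypothetical, and the here-document stands in for the output of custom-command. It uses "#" as the s-command delimiter so the "/" in not-used/total needs no escaping.

```shell
# to_csv (hypothetical helper): print the CSV header once, then convert
# every input line read from stdin with a single sed substitution.
to_csv() {
    printf '%s\n' 'current-capacity, buffer, not-used/total, IsEnabled, Up since'
    sed -E 's#^current-capacity: ([^,]*), buffer: ([^,]*), not-used/total: ([^,]*), IsEnabled: ([^.]*)\. Up since (.*)$#\1, \2, \3, \4, \5#'
}

# Usage: the here-document replaces "custom-command | to_csv" for the demo.
to_csv <<'EOF'
current-capacity: 3%, buffer: 1024, not-used/total: 10/10, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022
current-capacity: 0%, buffer: 1024, not-used/total: 15/15, IsEnabled: 1. Up since Thu Jun 23 11:54:14 2022
EOF
```

In real use the call would presumably look like custom-command | to_csv > status.csv, where status.csv is an assumed file name.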
CodePudding user response:
awk '
    BEGIN { re = "^([^:]*): ?([^,]*), ?([^:]*): ?([^,]*), ?([^:]*): ?([^,]*), ?([^:]*): ?(.*)$" }
    NR==1 {
        print gensub(re, "\\1,\\3,\\5,\\7", 1)
    }
    {
        print gensub(re, "\\2,\\4,\\6,\\8", 1)
    }
' file
current-capacity,buffer,not-used/total,IsEnabled
3%,1024,10/10,0. Up since Thu Jun 23 11:54:14 2022
0%,1024,25/25,0. Up since Thu Jun 23 11:54:14 2022
0%,1024,15/15,1. Up since Thu Jun 23 11:54:14 2022
CodePudding user response:
Using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS="[:,] "; OFS=", " }
match($0,/\. [^ ]+ [^ ]+/) {
    $0 = substr($0,1,RSTART-1) "," substr($0,RSTART+1,RLENGTH-1) ":" substr($0,RSTART+RLENGTH)
}
NR == 1 {
    for ( i=1; i<NF; i+=2 ) {
        printf "%s%s", $i, (i<(NF-1) ? OFS : ORS)
    }
}
{
    for ( i=2; i<=NF; i+=2 ) {
        printf "%s%s", $i, (i<NF ? OFS : ORS)
    }
}
$ awk -f tst.awk file
current-capacity, buffer, not-used/total, IsEnabled, Up since
3%, 1024, 10/10, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 25/25, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 15/15, 1, Thu Jun 23 11:54:14 2022
In the above, the first step is using match() { ... } to make the ". Up since Thu" field at the end of each input line use the same "," and ":" separators as the rest of the input (", Up since: Thu"), so the rest of the code parsing the now-consistent input is easy.
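To see that normalization step in isolation, the sketch below runs only the match()/substr() rewrite and prints the intermediate line. The helper name normalize_line is made up for illustration; the awk code uses only POSIX features (match, RSTART, RLENGTH, substr).

```shell
# Sample line copied from the question's input.
line='current-capacity: 3%, buffer: 1024, not-used/total: 10/10, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022'

# normalize_line (hypothetical helper): rewrite ". Up since" as ", Up since:"
# so every key/value pair uses the same ", " and ": " separators.
normalize_line() {
    awk '
        match($0, /\. [^ ]+ [^ ]+/) {
            $0 = substr($0, 1, RSTART-1) "," substr($0, RSTART+1, RLENGTH-1) ":" substr($0, RSTART+RLENGTH)
        }
        { print }'
}

printf '%s\n' "$line" | normalize_line
# prints: current-capacity: 3%, buffer: 1024, not-used/total: 10/10, IsEnabled: 0, Up since: Thu Jun 23 11:54:14 2022
```

Once the line looks like this, splitting on FS="[:,] " yields alternating keys and values, which is what the two loops rely on. The colons inside the time 11:54:14 are not followed by a space, so they do not split.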