Is there a way to build a CSV file out of command-line key-value pair output with different separators?

I have raw log snippets being dumped to the console as the output of a custom command:

bash$ custom-command
current-capacity: 3%, buffer: 1024, not-used/total: 10/10, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022
current-capacity: 0%, buffer: 1024, not-used/total: 25/25, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022
current-capacity: 0%, buffer: 1024, not-used/total: 15/15, IsEnabled: 1. Up since Thu Jun 23 11:54:14 2022

I need a CSV format like the one below to capture the status in real time based on certain criteria. I can then redirect the output to a CSV file at regular intervals before loading it into a SQL database.

current-capacity, buffer, not-used/total, IsEnabled, Up since
3%, 1024, 10/10, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 25/25, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 15/15, 1, Thu Jun 23 11:54:14 2022

I've tried awk but I'm still facing issues, since the output is comma-separated for the most part, except that IsEnabled: 0 ends with "." followed by the uptime. Is there a way to do this? I'm quite new to awk.

CodePudding user response：

Welcome to StackOverflow. Thank you for including both sample data and desired output. I recommend studying the markdown formatting syntax, since your code was entered as quoted HTML. It's better to use code tags: they produce fixed-width text, which is easier to read.

As for your problem, you can use the match() function in gawk (with its third, array argument) to capture all fields with a regular expression, because every line of your input is formatted the same way.

Something like this should do what you need:

BEGIN{
   # set output separator to comma space
   OFS=", "
   
   # define the regular expression to capture needed
   # See https://regex101.com/r/K4wYoB/1
   #
   #   ([^,]*)  captures everything up to, but not including, the next comma
   #   (.)      captures a single character
   #   (.*)     at the end, captures the remainder of the line
   #
   #   only the tail of each key is matched, since full words are not needed.
   #   awk regexps are POSIX ERE, so there is no lazy ".*?" quantifier.
   #
   myregexp="y: ([^,]*).*r: ([^,]*).*al: ([^,]*).*led: (.).*ce (.*)"
   
   # print header for output
   print "current-capacity, buffer, not-used/total, IsEnabled, Up since"
}

# process every line that matches; non-matching lines (such as a
# header or prompt line) are skipped. The 3-argument match() needs gawk.
match($0, myregexp, a){

   # print the captured fields from the "a" array
   print a[1], a[2], a[3], a[4], a[5]
}

CodePudding user response：

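It's just a matter of writing a regex that matches the output and transforming it. Use a delimiter other than "/" for the s command, because the pattern itself contains "/":

sed -E 's|current-capacity: (.*)%, buffer: (.*), not-used/total: (.*), IsEnabled: (.*)\. Up since (.*)|\1%,\2,\3,\4,\5|'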
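
As a hedged sketch of how a substitution like this could be wired up end to end, the function below prints a header row and then converts whatever status lines arrive on stdin. The name to_csv_sed is hypothetical, and the tighter character classes ([^,]* and [^.]*) are my own choice rather than part of the answer above; the sample line is copied from the question.

```shell
# to_csv_sed is a hypothetical helper name, not part of the original answer.
# It reads raw status lines on stdin and writes CSV on stdout.
to_csv_sed() {
    # Header row, written once.
    printf '%s\n' 'current-capacity,buffer,not-used/total,IsEnabled,Up since'
    # "|" is the s-command delimiter because the pattern contains "/".
    # -n plus the p flag prints only the lines that matched the pattern.
    sed -n -E 's|^current-capacity: ([^,]*), buffer: ([^,]*), not-used/total: ([^,]*), IsEnabled: ([^.]*)\. Up since (.*)$|\1,\2,\3,\4,\5|p'
}

# Sample line copied from the question.
printf '%s\n' 'current-capacity: 3%, buffer: 1024, not-used/total: 10/10, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022' | to_csv_sed
# prints:
# current-capacity,buffer,not-used/total,IsEnabled,Up since
# 3%,1024,10/10,0,Thu Jun 23 11:54:14 2022
```

In real use the input would presumably come from custom-command | to_csv_sed, with the output redirected to a file.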

CodePudding user response：

awk '
    BEGIN{re = "^([^:]*): ?([^,]*), ?([^:]*): ?([^,]*), ?([^:]*): ?([^,]*), ?([^:]*): ?(.*)$"}
    NR==1{
        print gensub(re, "\\1,\\3,\\5,\\7", 1)
    }
    {
        print gensub(re, "\\2,\\4,\\6,\\8", 1) 
    }
' file

current-capacity,buffer,not-used/total,IsEnabled
3%,1024,10/10,0. Up since Thu Jun 23 11:54:14 2022
0%,1024,25/25,0. Up since Thu Jun 23 11:54:14 2022
0%,1024,15/15,1. Up since Thu Jun 23 11:54:14 2022
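
The output above still carries ". Up since ..." inside the last column. One possible way to finish the split, sketched here under the assumption that the first line is the header row shown above, is a small post-processing step in plain awk. The function name split_uptime is hypothetical.

```shell
# split_uptime is a hypothetical helper name. It assumes line 1 is the
# header row and every later line ends with ". Up since <date>".
split_uptime() {
    awk '
        NR == 1 { print $0 ",Up since"; next }   # extend the header row
        { sub(/\. Up since /, ","); print }      # split the merged last column
    '
}

# Two lines copied from the output above.
printf '%s\n' \
    'current-capacity,buffer,not-used/total,IsEnabled' \
    '3%,1024,10/10,0. Up since Thu Jun 23 11:54:14 2022' | split_uptime
# prints:
# current-capacity,buffer,not-used/total,IsEnabled,Up since
# 3%,1024,10/10,0,Thu Jun 23 11:54:14 2022
```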

CodePudding user response：

Using any awk in any shell on every Unix box:

$ cat tst.awk
BEGIN { FS="[:,] "; OFS=", " }
match($0,/\. [^ ]+ [^ ]+/) {
    $0 = substr($0,1,RSTART-1) "," substr($0,RSTART+1,RLENGTH-1) ":" substr($0,RSTART+RLENGTH)
}
NR == 1 {
    for ( i=1; i<NF; i+=2 ) {
        printf "%s%s", $i, (i<(NF-1) ? OFS : ORS)
    }
}
{
    for ( i=2; i<=NF; i+=2 ) {
        printf "%s%s", $i, (i<NF ? OFS : ORS)
    }
}

$ awk -f tst.awk file
current-capacity, buffer, not-used/total, IsEnabled, Up since
3%, 1024, 10/10, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 25/25, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 15/15, 1, Thu Jun 23 11:54:14 2022

In the above, the first step uses match() { ... } to rewrite the ". Up since Thu" part at the end of each input line so that it uses the same "," and ":" separators as the rest of the input, giving ", Up since: Thu". Once the input is consistent, the rest of the parsing code is easy.
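
To see that normalization step on its own, here is a minimal sketch that applies only the match()/substr() rewrite and prints the resulting line, without the field loops. The function name normalize_line is hypothetical; the sample line is copied from the question.

```shell
# normalize_line is a hypothetical helper name. It rewrites ". Up since <date>"
# as ", Up since: <date>" and prints every line, changed or not.
normalize_line() {
    awk '
        # RSTART points at the "." and RLENGTH covers ". Up since".
        match($0, /\. [^ ]+ [^ ]+/) {
            $0 = substr($0, 1, RSTART-1) "," \
                 substr($0, RSTART+1, RLENGTH-1) ":" \
                 substr($0, RSTART+RLENGTH)
        }
        { print }
    '
}

printf '%s\n' 'current-capacity: 3%, buffer: 1024, not-used/total: 10/10, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022' | normalize_line
# prints:
# current-capacity: 3%, buffer: 1024, not-used/total: 10/10, IsEnabled: 0, Up since: Thu Jun 23 11:54:14 2022
```

After this rewrite every key is followed by ": " and every pair is separated by ", ", so a field separator of "[:,] " splits the line into alternating keys and values. The colons inside the time (11:54:14) are not followed by a space, so they do not split.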