Home > database >  Format multiline output of grep command into columns adding/substituting filename as an output field
Format multiline output of grep command into columns adding/substituting filename as an output field

Time:01-26

I am attempting to format the output of a multi-line egrep query into CSV compatible format.

I need to grab a handful of values from a large list of files (some of which may not contain the value I am looking for)

grep command I am using is:

grep -e Name -e Type -e Schedule -e Pool -e Storage \*|awk -F' = '  '{print $1,$2}'|sort

This returns output like:

IRVLinuxDefault.cfg:  Name "IRVLinuxDefault"
IRVLinuxDefault.cfg:  Pool "IRV_DD890_Full60"
IRVLinuxDefault.cfg:  Schedule "IRV_Backups"
IRVLinuxDefault.cfg:  Storage "IRV_SD_DD890"
IRVLinuxDefault.cfg:  Type "Backup"
LVS_60Day_NDMP_Defs.cfg:  Name "LVS_60Day_NDMP_Defs"
LVS_60Day_NDMP_Defs.cfg:  Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg:  Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_Defs.cfg:  Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg:  Type "Backup"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Name "LVS_60Day_NDMP_NOFileSet_Defs"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Type "Backup"
LVS_Datalake2_Defs.cfg:  Name "LVS_Datalake2_Defs"
LVS_Datalake2_Defs.cfg:  Pool "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg:  Schedule "WeeklyCycle"
LVS_Datalake2_Defs.cfg:  Storage "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg:  Type "Backup"

I am attempting to output these value fields in the format: FILE,NAME,NAME,POOL,SCHEDULE,STORAGE,TYPE With a column header for each column. If one of the files does not contain one of the grepped for values, I'd like to output an empty record in that space.

The output I want looks like a csv (example below), stripping out any "'s or :'s (NOTE the 3bottom line of the output desired has the Pool field missing, so there are 2 commas to preserve the empty cell):

FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE  
IRVLinuxDefault.cfg,IRVLinuxDefault,IRV_DD890_Full60,IRV_Backups,IRV_SD_DD890,Backup  
LVS_60Day_NDMP_Defs.cfg,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup  
LVS_60Day_NDMP_NOFileSet_Defs.cfg,,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup

I've tried multiple things with awk, sed, GNU datamash (transpose), but am not having luck.

Any suggestions?


grep -e Name -e Type -e Schedule -e Pool -e Storage *|awk -F' = '  '{print $1,$2}'|sort|datamash transpose
WindowsDefault_open.cfg:  Name "WindowsDefault_Open"        WindowsDefault_open.cfg:  Pool "BO3_SD01_DD_OPEN1_60Day"        WindowsDefault_open.cfg:  Schedule "WeeklyCycle"    
WindowsDefault_open.cfg:  Storage "BO3_SD01_DD990_OPEN1"        WindowsDefault_open.cfg:  Type "Backup"     
Windows_MBS_SD01_Default.cfg:  Name "Windows_MBS_SD01_Default"  Windows_MBS_SD01_Default.cfg:  Pool "BO3_SD03_DD_V164_OPEN1_60day"  Windows_MBS_SD01_Default.cfg:  Schedule "MonthlyCycle"  Windows_MBS_SD01_Default.cfg:  Storage "BO3_SD03_DD990_OPEN1_V164"  Windows_MBS_SD01_Default.cfg:  Type "Backup"

grep -e Name -e Type -e Schedule -e Pool -e Storage *|awk -F' = '  '{print $1,$2}'|sort|awk '{print $1,$2,$3,$4,$5,$6};'|datamash transpose
1: Name "BO3_Isi_gda_spark_60day_NDMP_Defs"     1!: Name "BO3_vg8-2_2_ucqa-ws_60day_NDMP_Defs"          1: Pool "File"     1!: Pool "File"          1: Schedule "BO3_Prod_Schedule"         1!: Schedule "BO3_Prod_Schedule"        1: Storage "BO3_SD01DD990_NDMP1"           1!: Storage "BO3_SD01_DD990_NDMP1"      1: Type "Backup"        1!: Type "Backup"       AM4WS3LinuxDefault.cfg: Name "AM4WS3LinuxDefault"           AM4WS3LinuxDefault.cfg: Pool "AM4_SD01_WasabiS3-cloud"          AM4WS3LinuxDefault.cfg: Schedule "MonthlyCycle"     AM4WS3LinuxDefault.cfg: Storage "AM4_SD01_WasabiS3-cloud"       AM4WS3LinuxDefault.cfg: Type "Backup"       AM4WS3WindowsDefault.cfg: Name "AM4WS3WindowsDefault"           AM4WS3WindowsDefault.cfg: Pool "AM4SD01_WasabiS3-cloud"    AM4WS3WindowsDefault.cfg: Schedule "MonthlyCycle"       AM4WS3WindowsDefault.cfg: Storage "AM4_SD01_WasabiS3-cloud"         AM4WS3WindowsDefault.cfg: Type "Backup"         backups_BO3LinuxDefault.cfg: Name "backups_BO3LinuxDefault"         backups_BO3LinuxDefault.cfg: Pool "BO3_SD01_2MO"        backups_BO3LinuxDefault.cfg: Schedule "MonthlyCycle"

CodePudding user response:

Once awk becomes part of the solution there's typically no need for grep.

Reverse engineering OP's grep|awk|sort output into some sample files:

$ head *.cfg
==> IRVLinuxDefault.cfg <==
  Name = "IRVLinuxDefault"
  Pool = "IRV_DD890_Full60"
  Schedule = "IRV_Backups"
  Storage = "IRV_SD_DD890"
  Type = "Backup"

==> LVS_60Day_NDMP_Defs.cfg <==
  Name = "LVS_60Day_NDMP_Defs"
  Pool = "LVS_DD_AV_NDMP"
  Schedule = "LVS_NDMP_Monthly"
  Storage = "LVS_SD_DD990_AV_NDMP"
  Type = "Backup"

==> LVS_60Day_NDMP_NOFileSet_Defs.cfg <==                   # NOTE: missing an entry for "Pool"
  Name = "LVS_60Day_NDMP_NOFileSet_Defs"
  Schedule = "LVS_NDMP_Monthly"
  Storage = "LVS_SD_DD990_AV_NDMP"
  Type = "Backup"

==> LVS_Datalake2_Defs.cfg <==
  Name = "LVS_Datalake2_Defs"
  Pool = "LVS_WAS_SD101_13Mo-cloud"
  Schedule = "WeeklyCycle"
  Storage = "LVS_WAS_SD101_13Mo-cloud"
  Type = "Backup"

One awk idea:

awk '

function print_line(  ) {
    if (fname)
        print fname,record["name"],record["pool"],record["schedule"],record["storage"],record["type"]

    delete record                                                   # clear previous line contents
}

BEGIN         { OFS=","
                print "FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE"        # print header

                n=split("name,pool,schedule,storage,type",a,",")    # build array of field names
                for (i=1;i<=n;i  )                                  # convert field names to ...
                    fields[a[i]]                                    # associative array indices
              }

FNR==1        { print_line()                                        # print previous file contents
                fname=FILENAME
              }

              { split($0,a,"\"")                                    # split line on double quotes
                key=tolower($1)                                     # need lowercase field name to match fields[] array indices
              }

key in fields { record[key]=a[2] }                                  # if 1st field is an index in fields[] array then save the 2nd double-quote delimited field

END           { print_line()     }                                  # flush last file contents to stdout
' *cfg > all.csv

This generates:

$ cat all.csv
FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE
IRVLinuxDefault.cfg,IRVLinuxDefault,IRV_DD890_Full60,IRV_Backups,IRV_SD_DD890,Backup
LVS_60Day_NDMP_Defs.cfg,LVS_60Day_NDMP_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_60Day_NDMP_NOFileSet_Defs.cfg,LVS_60Day_NDMP_NOFileSet_Defs,,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_Datalake2_Defs.cfg,LVS_Datalake2_Defs,LVS_WAS_SD101_13Mo-cloud,WeeklyCycle,LVS_WAS_SD101_13Mo-cloud,Backup

CodePudding user response:

For cases where the expected value is not provided, this script will allow you to specify the string to be substituted.

It also adapts to circumstances and allows you to specify the delimiter character (for input) to extract the desired variable value.

NOTE: you can't use the single-/double-quote as a delimiter for the split function due to conflicts with awk syntax, hence my use of a sed between your provided input and the script transforming that into the desired output.

#!/bin/bash

### Original command
#grep -e Name -e Type -e Schedule -e Pool -e Storage \*|awk -F' = '  '{print $1,$2}'|sort

sample="grepOutput.txt"

cat >"${sample}" <<"EnDoFiNpUt"
IRVLinuxDefault.cfg:  Name "IRVLinuxDefault"
IRVLinuxDefault.cfg:  Pool "IRV_DD890_Full60"
IRVLinuxDefault.cfg:  Schedule "IRV_Backups"
IRVLinuxDefault.cfg:  Storage "IRV_SD_DD890"
IRVLinuxDefault.cfg:  Type "Backup"
LVS_60Day_NDMP_Defs.cfg:  Name "LVS_60Day_NDMP_Defs"
LVS_60Day_NDMP_Defs.cfg:  Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg:  Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_Defs.cfg:  Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg:  Type "Backup"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Name "LVS_60Day_NDMP_NOFileSet_Defs"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Type "Backup"
LVS_Datalake2_Defs.cfg:  Name "LVS_Datalake2_Defs"
LVS_Datalake2_Defs.cfg:  Pool "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg:  Schedule "WeeklyCycle"
LVS_Datalake2_Defs.cfg:  Storage "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg:  Type "Backup"
EnDoFiNpUt

### cat emulates original grep command output
cat "${sample}" | sed 's \" \| g' |
awk -v delim='|' -v defval="" 'BEGIN{
    printf("FILENAME,NAME,POOL,SCHEDULE,STORAGE,TYPE") ;
    lastFN="" ;
}
{
    pos=index($0,":") ;
    if( pos > 0 ){
        FN=substr($0, 1, pos-1) ;
        split($0, vals, delim );

        if( FN != lastFN ){
            printf("\n%s", FN) ;
            lastFN=FN ;
        } ;
        if( vals[2] == "" ){
            printf(",%s", defval ) ;
        }else{
            printf(",%s", vals[2] ) ;
        } ;
    } ;
}
END{
    print "" ;
}'
  • Related