Using awk and/or grep for two columns from one file1 and grep column 2 value from file2 while insert-CodePudding

Good day.

I have two files, vmList and flavorList, the vmList containing the following:

$ cat /tmp/vmList
cf0012vm001| OS-SRV-USG:terminated_at                                 | -
cf0012vm001| accessIPv4                                               |
cf0012vm001| accessIPv6                                               |
cf0012vm001| cf0012v_internal_network network                  | 192.168.210.10
cf0012vm001| created                                                  | 2021-09-17T17:21:39Z
cf0012vm001| flavor                                                   | nd.c8r16d50e60 (89ba4c986a28447aa27de65bca986db1)
cf0012vm001| hostId                                                   | fcf39100bcc6ae57a8212f97d3251ac43913719f2aebcaa72006956e
cf0012vm001| key_name                                                 | -

cf0012vm002| OS-SRV-USG:terminated_at                                 | -
cf0012vm002| accessIPv4                                               |
cf0012vm002| accessIPv6                                               |
cf0012vm002| cf0012v_internal_network network                  | 192.168.210.11
cf0012vm002| created                                                  | 2021-09-17T17:21:37Z
cf0012vm002| flavor                                                   | nd.c8r16d50e60 (89ba4c986a28447aa27de65bca986db1)
cf0012vm002| hostId                                                   | e1590af8ddd57f1e2e74617d6c3631195e410bdd188a0b59813ffbef
cf0012vm002| id                                                       | 0e292900-6b50-4055-9842-d95e54fa1490

and the flavorList containing the following information:

$ cat /tmp/flavorList
 -------------------------------------- ------------------ ----------- ------ ----------- ------- ------- ------------- ----------- 
| ID                                   | Name             | Memory_MB | Disk | Ephemeral | Swap  | VCPUs | RXTX_Factor | Is_Public |
 -------------------------------------- ------------------ ----------- ------ ----------- ------- ------- ------------- ----------- 
| 711f0ff2f01d403689819b6cbab36e42     | nd.c4r8d21s8e21  | 8192      | 21   | 21        | 8192  | 4     |             | N/A       |
| 78a70b62efae4fbcb35994aeb0f87678     | nd.c8r16d31s8e31 | 16384     | 31   | 31        | 8192  | 8     |             | N/A       |
| 78f4fe71cc3340a59c62fc0b32d81e3f     | nd.c4r16d100     | 16384     | 100  | 0         |       | 4     |             | N/A       |
| 7a7e6ae4bfe34ac4ab3983b8f764a8ce     | nd.c2r8d40       | 8192      | 40   | 0         |       | 2     |             | N/A       |
| 832169fed2244bb6b1739ab3db0f232e     | nd.c1r4d100      | 4096      | 100  | 0         |       | 1     |             | N/A       |
| 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |
| 8e968623e5c44674b33e1cc1f892e32d     | nd.c9r40d50      | 40960     | 50   | 0         |       | 9     |             | N/A       |
| 8e96a7044566406f9ef7eba48c2a8c55     | nd.c5r4d81       | 4096      | 81   | 0         |       | 5     |             | N/A       |
| 8fd07e2004f84658a76af1cd8b9cea43     | nd.c2r8d50       | 8192      | 50   | 0         |       | 2     |             | N/A       |
 -------------------------------------- ------------------ ----------- ------ ----------- ------- ------- ------------- -----------

My goal is to find the 'flavor' in the vmList, then grep the flavor value (nd.c8r16d50e60) from the flavorList, which in itself works:

$ for f in `grep flavor /tmp/vmList|awk '{print $4}'`;do grep ${f} /tmp/flavorList;done
| 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |
| 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |

However, I would like to add the first parameter from the vmList (cf0012vm001 and cf0012vm002) to precede the output, either in a line above the output or in front of the line:

cf0012vm001 | 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |

or even:

cf0012vm001
| 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |

cf0012vm002
| 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |

Please advise.

Bjoern

CodePudding user response：

Would you please try the following:

awk -F '[[:blank:]]*\\|[[:blank:]]*' '
NR==FNR && $2=="flavor" {sub(/[[:blank:]]. /, "", $3); a[$1]=$3; next}
{
    for (i in a) {
       if (a[i] == $3) print i " " $0
    }
}
' vmList flavorList

Output:

cf0012vm001 | 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |

The field separator [[:blank:]]*\\|[[:blank:]]* splits the record on the pipe character with preceding / following blank characters if any.
The condition NR==FNR && $2=="flavor" matches the flavor line in vmList.
The statement sub(/[[:blank:]]. /, "", $3) extracts the nd.xxx field by removing the substring after the blank character.
a[$1]=$3 stores the nd.xxx field keyed by the 1st cfxxx field.
The final for (i in a) loop prints the matched lines in flavorList with prepending the cfxxx field.

CodePudding user response：

Assumptions:

a flavor does not contain spaces
a specific ordering of the output has not be stated
vmList: column/field #1 could be associated with different flavors [NOTE: not supported by sample data set; OP would need to refute/confirm]

One GNU awk idea that uses an array of arrays:

awk -F'|' '                                  # input field delimiter = "|" for both files
FNR==NR {                                    # for 1st file ...
          name=gensub(/ /,"","g",$2)         # remove all spaces from field #2 and save in awk variable "name"
          if (name == "flavor") {            # if field #2 == "flavor" ...
             split($3,arr,"(")               # split field #3 using "(" as delimiter, storing results in array arr[]
             gsub(" ","",arr[1])             # remove all spaces from first array entry
             flavors[arr[1]]                 # keep track of unique flavors
             col1[arr[1]][$1]                # keep track of associated values from column/field #1
          }
          next
        }
FNR>3   {                                    # for 2nd file, after ignoring first 3 lines ...
          if (NF == 1) next                  # skip line if it only has 1 "|" delimited field
          name=gensub(/ /,"","g",$3)         # remove all spaces from field #3 and save in awk variable "name"
          if (name in flavors)               # if name is in our list of flavors ...
             for (i in col1[name])           # loop through list of columns (from 1st file)
                 print i,$0                  # print column (from 1st file) plus current line
        }
' vmList flavorList

This generates:

cf0012vm001 | 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |

NOTE: while this output appears to be sorted by the first column this is merely a coincidence; if a specific order needs to be guaranteed this can likely be done by adding an appropriate PROCINFO["sorted_in"] entry; OP just needs to state the desired ordering