Home > Software engineering >  Using awk and/or grep for two columns from one file1 and grep column 2 value from file2 while insert
Using awk and/or grep for two columns from one file1 and grep column 2 value from file2 while insert

Time:09-28

Good day.

I have two files, vmList and flavorList, the vmList containing the following:

$ cat /tmp/vmList
cf0012vm001| OS-SRV-USG:terminated_at                                 | -
cf0012vm001| accessIPv4                                               |
cf0012vm001| accessIPv6                                               |
cf0012vm001| cf0012v_internal_network network                  | 192.168.210.10
cf0012vm001| created                                                  | 2021-09-17T17:21:39Z
cf0012vm001| flavor                                                   | nd.c8r16d50e60 (89ba4c986a28447aa27de65bca986db1)
cf0012vm001| hostId                                                   | fcf39100bcc6ae57a8212f97d3251ac43913719f2aebcaa72006956e
cf0012vm001| key_name                                                 | -

cf0012vm002| OS-SRV-USG:terminated_at                                 | -
cf0012vm002| accessIPv4                                               |
cf0012vm002| accessIPv6                                               |
cf0012vm002| cf0012v_internal_network network                  | 192.168.210.11
cf0012vm002| created                                                  | 2021-09-17T17:21:37Z
cf0012vm002| flavor                                                   | nd.c8r16d50e60 (89ba4c986a28447aa27de65bca986db1)
cf0012vm002| hostId                                                   | e1590af8ddd57f1e2e74617d6c3631195e410bdd188a0b59813ffbef
cf0012vm002| id                                                       | 0e292900-6b50-4055-9842-d95e54fa1490

and the flavorList containing the following information:

$ cat /tmp/flavorList
 -------------------------------------- ------------------ ----------- ------ ----------- ------- ------- ------------- ----------- 
| ID                                   | Name             | Memory_MB | Disk | Ephemeral | Swap  | VCPUs | RXTX_Factor | Is_Public |
 -------------------------------------- ------------------ ----------- ------ ----------- ------- ------- ------------- ----------- 
| 711f0ff2f01d403689819b6cbab36e42     | nd.c4r8d21s8e21  | 8192      | 21   | 21        | 8192  | 4     |             | N/A       |
| 78a70b62efae4fbcb35994aeb0f87678     | nd.c8r16d31s8e31 | 16384     | 31   | 31        | 8192  | 8     |             | N/A       |
| 78f4fe71cc3340a59c62fc0b32d81e3f     | nd.c4r16d100     | 16384     | 100  | 0         |       | 4     |             | N/A       |
| 7a7e6ae4bfe34ac4ab3983b8f764a8ce     | nd.c2r8d40       | 8192      | 40   | 0         |       | 2     |             | N/A       |
| 832169fed2244bb6b1739ab3db0f232e     | nd.c1r4d100      | 4096      | 100  | 0         |       | 1     |             | N/A       |
| 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |
| 8e968623e5c44674b33e1cc1f892e32d     | nd.c9r40d50      | 40960     | 50   | 0         |       | 9     |             | N/A       |
| 8e96a7044566406f9ef7eba48c2a8c55     | nd.c5r4d81       | 4096      | 81   | 0         |       | 5     |             | N/A       |
| 8fd07e2004f84658a76af1cd8b9cea43     | nd.c2r8d50       | 8192      | 50   | 0         |       | 2     |             | N/A       |
 -------------------------------------- ------------------ ----------- ------ ----------- ------- ------- ------------- ----------- 

My goal is to find the 'flavor' in the vmList, then grep the flavor value (nd.c8r16d50e60) from the flavorList, which in itself works:

$ for f in `grep flavor /tmp/vmList|awk '{print $4}'`;do grep ${f} /tmp/flavorList;done
| 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |
| 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |

However, I would like to add the first parameter from the vmList (cf0012vm001 and cf0012vm002) to precede the output, either in a line above the output or in front of the line:

cf0012vm001 | 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |

or even:

cf0012vm001
| 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |

cf0012vm002
| 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |

Please advise.

Bjoern

CodePudding user response:

Would you please try the following:

awk -F '[[:blank:]]*\\|[[:blank:]]*' '
NR==FNR && $2=="flavor" {sub(/[[:blank:]]. /, "", $3); a[$1]=$3; next}
{
    for (i in a) {
       if (a[i] == $3) print i " " $0
    }
}
' vmList flavorList

Output:

cf0012vm001 | 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |
  • The field separator [[:blank:]]*\\|[[:blank:]]* splits the record on the pipe character with preceding / following blank characters if any.
  • The condition NR==FNR && $2=="flavor" matches the flavor line in vmList.
  • The statement sub(/[[:blank:]]. /, "", $3) extracts the nd.xxx field by removing the substring after the blank character.
  • a[$1]=$3 stores the nd.xxx field keyed by the 1st cfxxx field.
  • The final for (i in a) loop prints the matched lines in flavorList with prepending the cfxxx field.

CodePudding user response:

Assumptions:

  • a flavor does not contain spaces
  • a specific ordering of the output has not be stated
  • vmList: column/field #1 could be associated with different flavors [NOTE: not supported by sample data set; OP would need to refute/confirm]

One GNU awk idea that uses an array of arrays:

awk -F'|' '                                  # input field delimiter = "|" for both files
FNR==NR {                                    # for 1st file ...
          name=gensub(/ /,"","g",$2)         # remove all spaces from field #2 and save in awk variable "name"
          if (name == "flavor") {            # if field #2 == "flavor" ...
             split($3,arr,"(")               # split field #3 using "(" as delimiter, storing results in array arr[]
             gsub(" ","",arr[1])             # remove all spaces from first array entry
             flavors[arr[1]]                 # keep track of unique flavors
             col1[arr[1]][$1]                # keep track of associated values from column/field #1
          }
          next
        }
FNR>3   {                                    # for 2nd file, after ignoring first 3 lines ...
          if (NF == 1) next                  # skip line if it only has 1 "|" delimited field
          name=gensub(/ /,"","g",$3)         # remove all spaces from field #3 and save in awk variable "name"
          if (name in flavors)               # if name is in our list of flavors ...
             for (i in col1[name])           # loop through list of columns (from 1st file)
                 print i,$0                  # print column (from 1st file) plus current line
        }
' vmList flavorList

This generates:

cf0012vm001 | 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1     | nd.c8r16d50e60   | 16384     | 50   | 60        |       | 8     |             | N/A       |

NOTE: while this output appears to be sorted by the first column this is merely a coincidence; if a specific order needs to be guaranteed this can likely be done by adding an appropriate PROCINFO["sorted_in"] entry; OP just needs to state the desired ordering

  • Related