How to apply an if condition to a field inside another field?

I have a table that I want to filter with awk. Here is an example of the input:

Berlin  BG  AD=14;CD=0.05 
Cairo   CE  AD=9;CD=0.01 
Toronto TC  AD=23;CD=0.17
Sydney  SA  AD=2;CD=0.11 
Tokyo   TJ  AD=19;CD=0.22

I want to filter the lines based on the AD value and print the whole line if its AD value is greater than or equal to 10.

The result should look like this:

Berlin  BG  AD=14;CD=0.05
Toronto TC  AD=23;CD=0.17
Tokyo   TJ  AD=19;CD=0.22

I tried this script:

awk '{if (awk '{print $3}' temp.txt | awk -F";" '{print $1}' | awk -F"=" '{print $2}' >= 10) print $0}' temp.txt

But it gave me a syntax error about an unexpected newline or end of string.

Answer:

With your shown samples, in GNU awk you could try the following code. It uses awk's match function with the regex [[:space:]]+(AD=)([0-9]+), which creates 2 capturing groups and stores the matched values in an array named arr (the three-argument form of match is a GNU awk extension). It then checks whether the 2nd captured value, the number after AD=, is greater than or equal to 10 and, if so, prints the line.

awk '
match($0,/[[:space:]]+(AD=)([0-9]+)/,arr) && arr[2]+0 >= 10
'  Input_file

Answer:

Using awk, you can use a pattern that matches a whitespace character, then AD= and one or more digits. The number starts after the first 4 characters of the match (the whitespace plus AD=), so you can extract it with substr and check that it is greater than or equal to 10:

awk '
match($0,/[[:space:]]AD=[0-9]+/) {
  if (substr($0, RSTART+4, RLENGTH-4)+0 >= 10) print
}
' file

Input

Berlin BG AD=14;CD=0.05
Cairo CE AD=9;CD=0.01
Toronto TC AD=23;CD=0.17
Sydney SA AD=2;CD=0.11
Tokyo TJ AD=19;CD=0.22

Output

Berlin BG AD=14;CD=0.05
Toronto TC AD=23;CD=0.17
Tokyo TJ AD=19;CD=0.22
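A quick way to see what match() leaves in RSTART and RLENGTH for one of the sample lines. A single-space line is assumed here; with the double-spaced original the offsets differ, but RSTART+4 still lands on the first digit, because the match begins at the whitespace character directly before AD=.

```shell
# Print where match() found "<whitespace>AD=<digits>" and the number substr() extracts.
# RSTART is the 1-based position of the whitespace character; skipping 4 characters
# (the whitespace plus "AD=") lands on the first digit.
printf '%s\n' 'Berlin BG AD=14;CD=0.05' |
awk 'match($0, /[[:space:]]AD=[0-9]+/) {
  print RSTART, RLENGTH, substr($0, RSTART + 4, RLENGTH - 4)
}'
# prints: 10 6 14
```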

If the input is really one long line with the records separated by pipes (|), you could make the pipe, surrounded by optional whitespace, the field separator, and join the matching records with | for the output:

awk '
BEGIN{ FS="[[:space:]]*[|][[:space:]]*" }
{
  s=""
  for (i=1; i<=NF; i++) {
    if (match($i,/[[:space:]]AD=[0-9]+/) && substr($i, RSTART+4, RLENGTH-4)+0 >= 10) {
      s=s (s == "" ? $i : " | " $i)
    }
  }
}
END { print s }
' file

Input

| Berlin BG AD=14;CD=0.05 | Cairo CE AD=9;CD=0.01 | Toronto TC AD=23;CD=0.17 | Sydney SA AD=2;CD=0.11 | Tokyo TJ AD=19;CD=0.22

Output

Berlin BG AD=14;CD=0.05 | Toronto TC AD=23;CD=0.17 | Tokyo TJ AD=19;CD=0.22

Answer:

I'm confused about the format of the input:

OP mentions a 'table' but ...
provided sample shows one long line of city groups separated by spaces and pipes but ...
OP's code attempt has no reference to pipes (so, are there no pipes in the data file?)

For this answer I'm going to assume the following format:

$ cat temp.txt
Berlin BG AD=14;CD=0.05
Cairo CE AD=9;CD=0.01
Toronto TC AD=23;CD=0.17
Sydney SA AD=2;CD=0.11
Tokyo TJ AD=19;CD=0.22

Setting aside the syntax issues with the OP's current code, there's no need for three separate awk calls (i.e., we should be able to generate the desired result with a single awk script).

One awk idea:

##########
# assuming the "AD=" entry is always the first item in the 3rd field (as displayed in the sample input):

awk '
{ n=split($3,a,"[;=]")                      # split 3rd field on dual delimiters ";" and "="; store results in array a[]
  if (a[1] == "AD" && a[2] >= 10)           # if 1st array entry == "AD" and 2nd array entry >= 10 then ...
     print                                  # print current line
}
' temp.txt

##########
# assuming the "AD=" entry could occur anywhere in the 3rd field:

awk '
{ n=split($3,a,"[;=]")                      # split 3rd field on dual delimiters ";" and "="; store results in array a[]
  for (i=1;i<n;i+=2)                        # loop through odd-numbered indices
      if (a[i] == "AD" && a[i+1] >= 10)     # if current array entry == "AD" and next array entry >= 10 then ...
         print                              # print current line
}
' temp.txt

These both generate:

Berlin BG AD=14;CD=0.05
Toronto TC AD=23;CD=0.17
Tokyo TJ AD=19;CD=0.22
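To see why a[1] and a[2] hold the attribute name and its value, here is what split($3, a, "[;=]") produces for the first sample line (a sketch run on a single hard-coded line):

```shell
# split() with the regex "[;=]" treats both ";" and "=" as separators,
# so "AD=14;CD=0.05" becomes four elements: name, value, name, value.
printf '%s\n' 'Berlin BG AD=14;CD=0.05' |
awk '{
  n = split($3, a, "[;=]")
  for (i = 1; i <= n; i++) print i, a[i]
}'
# prints:
# 1 AD
# 2 14
# 3 CD
# 4 0.05
```

Because the elements alternate name/value, stepping i by 2 from 1 always visits a name, and a[i+1] is its value.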

Modifying to allow dynamic assignment of the 'attribute/threshold' pair:

awk -v attrib="AD" -v thresh="10" '
{ n=split($3,a,"[;=]")
  for (i=1;i<n;i+=2)
      if (a[i] == attrib && a[i+1] >= thresh)
         print
}
' temp.txt

For -v attrib="AD" -v thresh="10" this generates:

Berlin BG AD=14;CD=0.05
Toronto TC AD=23;CD=0.17
Tokyo TJ AD=19;CD=0.22

For -v attrib="CD" -v thresh=".13" this generates:

Toronto TC AD=23;CD=0.17
Tokyo TJ AD=19;CD=0.22
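If the filter is needed repeatedly, the same awk program could be wrapped in a small shell function. The name filter_attr is hypothetical, and the added next (so a line is printed at most once even if the attribute appears twice in the 3rd field) is a small change from the script above:

```shell
# Hypothetical helper: filter_attr ATTRIBUTE THRESHOLD [FILE...]
# Prints lines whose 3rd field contains ATTRIBUTE=value with value >= THRESHOLD.
# With no FILE arguments, awk reads standard input.
filter_attr() {
  attrib=$1 thresh=$2
  shift 2
  awk -v attrib="$attrib" -v thresh="$thresh" '
  { n = split($3, a, "[;=]")
    for (i = 1; i < n; i += 2)
      if (a[i] == attrib && a[i+1] >= thresh) { print; next }   # next: print each line once
  }' "$@"
}

# Example run on the sample data, fed via stdin
printf '%s\n' 'Berlin BG AD=14;CD=0.05' 'Cairo CE AD=9;CD=0.01' \
  'Toronto TC AD=23;CD=0.17' 'Sydney SA AD=2;CD=0.11' 'Tokyo TJ AD=19;CD=0.22' |
filter_attr CD .13
```

Values passed with -v that look numeric, such as .13, are compared numerically against the split() elements, which is why the CD threshold works without an explicit +0.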

Answer:

Using any awk:

$ awk -F'[;=]' '$2 >= 10' file
Berlin  BG  AD=14;CD=0.05
Toronto TC  AD=23;CD=0.17
Tokyo   TJ  AD=19;CD=0.22
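Why this one-liner works: with -F'[;=]' both ; and = end a field, so on the sample lines the AD value becomes $2. The check below prints NF, $1 and $2 for one line; note that this relies on AD= being the first = or ; delimited item on the line.

```shell
# Every ";" or "=" is a field separator, so the line splits into
# "Berlin  BG  AD" | "14" | "CD" | "0.05".
printf '%s\n' 'Berlin  BG  AD=14;CD=0.05' |
awk -F'[;=]' '{ print NF; print $1; print $2 }'
# prints:
# 4
# Berlin  BG  AD
# 14
```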

Answer:

Don't make it more complicated than it needs to be:

mawk -F= '10 <= +$2' file

Berlin  BG  AD=14;CD=0.05 
Toronto TC  AD=23;CD=0.17
Tokyo   TJ  AD=19;CD=0.22
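The unary plus in front of $2 matters. With -F= the second field of a line such as Cairo CE AD=9;CD=0.01 is 9;CD, which does not look like a number, so a bare comparison treats it as a string; +$2 converts it to the number 9 first. A minimal check:

```shell
# $2 is "9;CD" here. Compared as strings, "10" sorts before "9;CD" (since "1" < "9"),
# so the bare comparison wrongly succeeds; +$2 forces numeric conversion to 9.
printf '%s\n' 'Cairo CE AD=9;CD=0.01' |
awk -F= '{
  as_string = (10 <= $2)    # string comparison -> 1 (true)
  as_number = (10 <= +$2)   # numeric comparison 10 <= 9 -> 0 (false)
  print as_string, as_number
}'
# prints: 1 0
```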