I have a table that I want to filter with awk. Here is a sample of the input:
Berlin BG AD=14;CD=0.05
Cairo CE AD=9;CD=0.01
Toronto TC AD=23;CD=0.17
Sydney SA AD=2;CD=0.11
Tokyo TJ AD=19;CD=0.22
I want to filter the lines based on the AD value and print the whole line if its AD value is greater than or equal to 10.
The result should look like this:
Berlin BG AD=14;CD=0.05
Toronto TC AD=23;CD=0.17
Tokyo TJ AD=19;CD=0.22
I tried this script:
awk '{if (awk '{print $3}' temp.txt | awk -F";" '{print $1}' | awk -F"=" '{print $2}' >= 10) print $0}' temp.txt
But it gave me a syntax error about an unexpected newline or end of string.
CodePudding user response:
With your shown samples, in GNU awk you could try the following awk code. It uses the match function with the regex [[:space:]]+(AD=)([0-9]+), which creates 2 capturing groups and stores the matched values in an array named arr. It then checks whether the 2nd value is greater than or equal to 10 and, if so, prints that line. Note that the 3-argument form of match is a GNU awk extension.
awk '
match($0,/[[:space:]]+(AD=)([0-9]+)/,arr) && arr[2]>=10
' Input_file
CodePudding user response:
Using awk you can use a pattern to match a whitespace character, then AD= and 1 or more digits. The number starts after the first 4 characters of the match, so you can extract it with substr and check whether it is greater than or equal to 10:
awk '
match($0,/[[:space:]]AD=[0-9]+/) {
  if (substr($0, RSTART+4, RLENGTH-4)+0 >= 10) print
}
' file
Input
Berlin BG AD=14;CD=0.05
Cairo CE AD=9;CD=0.01
Toronto TC AD=23;CD=0.17
Sydney SA AD=2;CD=0.11
Tokyo TJ AD=19;CD=0.22
Output
Berlin BG AD=14;CD=0.05
Toronto TC AD=23;CD=0.17
Tokyo TJ AD=19;CD=0.22
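To make the RSTART/RLENGTH arithmetic concrete, here is a minimal sketch that only extracts the number from a single line. The variable names line and ad_value are chosen for illustration; the sample line is taken from the input above.

```shell
# One sample line from the input above.
line='Toronto TC AD=23;CD=0.17'

ad_value=$(printf '%s\n' "$line" | awk '
match($0, /[[:space:]]AD=[0-9]+/) {
    # RSTART points at the whitespace character before "AD=",
    # so skipping 4 characters (" AD=") lands on the first digit.
    # RLENGTH covers the whole match, so RLENGTH-4 is the digit count.
    print substr($0, RSTART + 4, RLENGTH - 4)
}')

echo "$ad_value"
```

For this line the match starts at the space before AD, which is why the offset is 4 rather than 3.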
If the input has pipes | you could make the pipe, surrounded by optional whitespace chars, the field separator, and print | as the field separator for the output:
awk '
BEGIN{ FS="[[:space:]]*[|][[:space:]]*" }
{
  s=""
  for (i=1; i<=NF; i++) {
    if (match($i,/[[:space:]]AD=[0-9]+/) && substr($i, RSTART+4, RLENGTH-4)+0 >= 10) {
      s=s (s == "" ? $i : " | " $i)
    }
  }
}
END { print s }
' file
Input
| Berlin BG AD=14;CD=0.05 | Cairo CE AD=9;CD=0.01 | Toronto TC AD=23;CD=0.17 | Sydney SA AD=2;CD=0.11 | Tokyo TJ AD=19;CD=0.22
Output
Berlin BG AD=14;CD=0.05 | Toronto TC AD=23;CD=0.17 | Tokyo TJ AD=19;CD=0.22
CodePudding user response:
I'm confused about the format of the input:
- OP mentions a 'table' but ...
- provided sample shows one long line of city groups separated by spaces and pipes but ...
- OP's code attempt has no reference to pipes (so, are there no pipes in the data file?)
For this answer I'm going to assume the following format:
$ cat temp.txt
Berlin BG AD=14;CD=0.05
Cairo CE AD=9;CD=0.01
Toronto TC AD=23;CD=0.17
Sydney SA AD=2;CD=0.11
Tokyo TJ AD=19;CD=0.22
Setting aside the syntax issues with OP's current code, there's no need for 3x separate awk scripts (ie, we should be able to generate the desired result with a single awk script).
One awk idea:
##########
# assuming the "AD=" entry is always the first item in the 3rd field (as displayed in the sample input):
awk '
{ n=split($3,a,"[;=]")                 # split 3rd field on dual delimiters ";" and "="; store results in array a[]
  if (a[1] == "AD" && a[2] >= 10)      # if 1st array entry == "AD" and 2nd array entry >= 10 then ...
     print                             # print current line
}
' temp.txt
##########
# assuming the "AD=" entry could occur anywhere in the 3rd field:
awk '
{ n=split($3,a,"[;=]")                 # split 3rd field on dual delimiters ";" and "="; store results in array a[]
  for (i=1;i<n;i+=2)                   # loop through odd-numbered indices
      if (a[i] == "AD" && a[i+1] >= 10) # if current array entry == "AD" and next array entry >= 10 then ...
         print                         # print current line
}
' temp.txt
These both generate:
Berlin BG AD=14;CD=0.05
Toronto TC AD=23;CD=0.17
Tokyo TJ AD=19;CD=0.22
Modifying to allow dynamic assignment of the 'attribute/threshold' pair:
awk -v attrib="AD" -v thresh="10" '
{ n=split($3,a,"[;=]")
  for (i=1;i<n;i+=2)
      if (a[i] == attrib && a[i+1] >= thresh)
         print
}
' temp.txt
For -v attrib="AD" -v thresh="10" this generates:
Berlin BG AD=14;CD=0.05
Toronto TC AD=23;CD=0.17
Tokyo TJ AD=19;CD=0.22
For -v attrib="CD" -v thresh=".13" this generates:
Toronto TC AD=23;CD=0.17
Tokyo TJ AD=19;CD=0.22
CodePudding user response:
Using any awk:
$ awk -F'[;=]' '$2 >= 10' file
Berlin BG AD=14;CD=0.05
Toronto TC AD=23;CD=0.17
Tokyo TJ AD=19;CD=0.22
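To see why $2 is the AD value here, it may help to print how one sample line is split when both ; and = act as field separators. This is just an illustrative sketch; the variable name fields is arbitrary.

```shell
# Show every field awk sees when the separator is ";" or "=".
fields=$(printf '%s\n' 'Berlin BG AD=14;CD=0.05' |
    awk -F'[;=]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')

echo "$fields"
```

The first field swallows everything up to the first =, so the approach relies on AD= being the first attribute on the line, as it is in the sample.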
CodePudding user response:
Don't make it more complicated than it needs to be:
mawk -F= '10 <= +$2' file
Berlin BG AD=14;CD=0.05
Toronto TC AD=23;CD=0.17
Tokyo TJ AD=19;CD=0.22
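One thing to watch with = as the only separator: the second field is text such as 9;CD, which does not look like a number, so awk compares it as a string unless it is forced into a number first. A minimal sketch of the difference, using a made-up one-field line and hypothetical variable names as_string and as_number:

```shell
# With -F= the second field of this line is "9;CD".
# Without coercion awk falls back to a string comparison ("9;CD" vs "10").
as_string=$(printf 'AD=9;CD=0.01\n' |
    awk -F= '{ r = ($2 >= 10) ? "kept" : "dropped"; print r }')

# Adding 0 converts the leading numeric prefix ("9") to a number first.
as_number=$(printf 'AD=9;CD=0.01\n' |
    awk -F= '{ r = ($2 + 0 >= 10) ? "kept" : "dropped"; print r }')

echo "string comparison:  $as_string"
echo "numeric comparison: $as_number"
```

The string comparison keeps the line because the character 9 sorts after 1, while the numeric comparison drops it as intended.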