Is there an easy way to search the following data for a specific field, based on the field @@id:?
This is the sample data file called sample
@@id: 123 @@name: John Doe @@age: 18 @@Gender: Male
@@id: 345 @@name: Sarah Benson @@age: 20 @@Gender: Female
For example, to look up ID 123 and get that person's gender, this is basically the prototype that I want:
# search.sh
#!/bin/bash
# usage: search.sh <id> <field>
# eg: search 123 age
search="$1"
field="$2"
grep "^@@id: ${search}" sample | # FILTER <FIELD>
So when I search for ID 123 like below:
search.sh 123 gender
The output would be
Male
So far, with the code above, I am only able to grep the one line that matches the ID, and I'm not sure what the best, fastest, or least complicated method is to get the value that follows the specified field (eg. age).
CodePudding user response:
1st solution: With your shown samples, please try the following bash script. It assumes you want an exact (case-sensitive) string match.
cat script.bash
#!/bin/bash
search="$1"
field="$2"
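awk -v search="$search" -v field="$field" '
# select the line whose id equals the search value (123 must not select 1234)
match($0,"@@id:[[:space:]]*"search"([[:space:]]|$)"){
  # locate "@@<field>: <value>"; the value runs up to the next "@"
  if(match($0,"@@"field":[[:space:]]*[^@]+")){
    value=substr($0,RSTART,RLENGTH)
    sub(/^[^:]*:[[:space:]]*/,"",value)   # drop the "@@<field>: " label
    sub(/[[:space:]]+$/,"",value)         # drop trailing white space
    print value
  }
}
' Input_file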
2nd solution: In case you want to match the values irrespective of case (lower/upper) in each line, then try the following code.
cat script.bash
#!/bin/bash
search="$1"
field="$2"
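awk -v search="$search" -v field="$field" '
# lower-case both the line and the arguments so the comparison ignores case
match(tolower($0),"@@id:[[:space:]]*"tolower(search)"([[:space:]]|$)"){
  if(match(tolower($0),"@@"tolower(field)":[[:space:]]*[^@]+")){
    value=substr($0,RSTART,RLENGTH)       # cut from the original line to keep its case
    sub(/^[^:]*:[[:space:]]*/,"",value)   # drop the "@@<field>: " label
    sub(/[[:space:]]+$/,"",value)         # drop trailing white space
    print value
  }
}
' Input_file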
Explanation: The bash script expects 2 parameters when it is run and passes them as variables to the awk program. awk then uses the match function to find the line with the given id and prints the value of the requested field (eg: name or Gender).
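To make the role of match more concrete, here is a minimal sketch of what it reports for one of the sample lines. The helper name show_match is hypothetical and not part of the answer above, and the pattern uses [ \t] instead of [[:space:]] only as an assumption to keep it portable across awk variants.

```shell
#!/bin/bash
# Hypothetical helper (not from the answer): report where match() finds
# "@@<field>: <value>" inside a single line.
show_match() {  # usage: show_match <field> <line>
  printf '%s\n' "$2" | awk -v field="$1" '{
    # match() returns the start position (0 if not found) and sets RSTART and RLENGTH
    if (match($0, "@@" field ":[ \t]*[^@]+"))
      printf "RSTART=%d RLENGTH=%d text=[%s]\n", RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    else
      print "no match"
  }'
}

line='@@id: 123 @@name: John Doe @@age: 18 @@Gender: Male'
show_match name "$line"     # matched text still carries the label and a trailing space
show_match gender "$line"   # case-sensitive pattern, so "Gender" is not found
```

The first call should report the label, the value, and the space before the next @@, which is why the scripts above strip the label with sub before printing. The second call should print "no match", which is the case the 2nd solution handles with tolower.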
CodePudding user response:
Since you want to extract a part of each line found, different from the part you are matching against, sed or awk would be a better tool than grep. You could pipe the output of grep into one of the others, but that's wasteful because both sed and awk can do the line selection directly. I would do something like this:
#!/bin/bash
search="$1"
field="$2"
sed -n "/^@@id: ${search}"'\>/ { s/.*@@'"${field}"': *//i; s/ *@@.*//; p }' sample
Explanation:
- sed is instructed to read file sample, which it will do line by line.
- The -n option tells sed to suppress its usual behavior of automatically outputting its pattern space at the end of each cycle, which is an easy way to filter out lines that don't match the search criterion.
- The sed expression starts with an address, which in this case is a pattern matching lines by id, according to the script's first argument. It is much like your grep pattern, but I append \>, which matches a word boundary. That way, searches for id 123 will not also match id 1234.
- The rest of the sed expression edits out everything in the line except the value of the requested field, with the field name being matched case-insensitively, and prints the result. The editing is accomplished by the two s/// commands, and the p command is of course for "print". These are all enclosed in curly braces ({}) and separated by semicolons (;) to form a single compound command associated with the given address.
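The effect of the word boundary can be checked with a small sketch like the one below. It assumes GNU sed (both \> and the i flag of s/// are GNU extensions), the function name lookup is hypothetical, and the second data line with id 1234 is made up purely for the test.

```shell
#!/bin/bash
# Hypothetical wrapper around the sed command from this answer; assumes GNU sed.
lookup() {  # usage: lookup <file> <id> <field>
  sed -n "/^@@id: $2"'\>/ { s/.*@@'"$3"': *//i; s/ *@@.*//; p }' "$1"
}

# Made-up test data: two ids that share the prefix "123".
data=$(mktemp)
printf '%s\n' \
  '@@id: 123 @@name: John Doe @@age: 18 @@Gender: Male' \
  '@@id: 1234 @@name: Jane Roe @@age: 30 @@Gender: Female' > "$data"

lookup "$data" 123 gender    # selects only the id 123 line
lookup "$data" 1234 name     # selects only the id 1234 line
```

With this data the two calls should print Male and Jane Roe respectively. Without the \> in the address, the first call would also select the 1234 line and print Female as well.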
CodePudding user response:
Assumptions:
- 'label' fields have format @@<string>:
- need to handle case-insensitive searches
- 'label' fields could be located anywhere in the line (ie, there is no set ordering of 'label' fields)
- the 1st input search parameter is always a value associated with the @@id: label
- the 2nd input search parameter is to be matched as a whole word (ie, no partial label matching; nam will not match against @@name:)
- if there are multiple 'label' fields that match the 2nd input search parameter we print the value associated with the 1st match found in the line
One awk idea:
awk -v search="${search}" -v field="${field}" '
BEGIN { field = tolower(field) }
{ n=split($0,arr,"@@|:")                  # split current line on dual delimiters "@@" and ":", place fields into array arr[]
  found_search = 0
  found_field = 0
  for (i=2;i<=n;i=i+2) {                  # loop through list of label fields
      label=tolower(arr[i])
      value = arr[i+1]
      sub(/^[[:space:]]+/,"",value)       # strip leading white space
      sub(/[[:space:]]+$/,"",value)       # strip trailing white space
      if ( label == "id" && value == search )
         found_search = 1
      if ( label == field && ! found_field )
         found_field = value
  }
  if ( found_search && found_field )
     print found_field
}
' sample
Sample input:
$ cat sample
@@id: 123 @@name: John Doe @@age: 18 @@Gender: Male
@@id: 345 @@name: Sarah Benson @@age: 20 @@Gender: Female
@@name: Archibald P. Granite, III, Ph.D, M.D. @@age: 20 @@Gender: not specified @@id: 567
Test runs:
search=123 field=gender => Male
search=123 field=ID => 123
search=123 field=Age => 18
search=345 field=name => Sarah Benson
search=567 field=name => Archibald P. Granite, III, Ph.D, M.D.
search=567 field=GENDER => not specified
search=999 field=age => <no output>
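One way such test runs could be driven is sketched below. This is not the harness used for the results above: the function name lookup, the temporary file, and the loop are assumptions for illustration, and the awk body is a condensed copy of the split-based idea with [ \t] in place of [[:space:]] for portability. Like the original, it assumes values do not themselves contain ":".

```shell
#!/bin/bash
# Hypothetical wrapper around the split-based awk idea.
lookup() {  # usage: lookup <file> <id> <field>
  awk -v search="$2" -v field="$3" '
  BEGIN { field = tolower(field) }
  { n = split($0, arr, "@@|:")            # arr[2], arr[4], ... are labels; arr[3], arr[5], ... are values
    found_search = 0; found_field = 0
    for (i = 2; i <= n; i = i + 2) {
      label = tolower(arr[i]); value = arr[i+1]
      sub(/^[ \t]+/, "", value); sub(/[ \t]+$/, "", value)
      if (label == "id" && value == search) found_search = 1
      if (label == field && !found_field)   found_field = value
    }
    if (found_search && found_field) print found_field
  }' "$1"
}

# Same three lines as the sample input, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
@@id: 123 @@name: John Doe @@age: 18 @@Gender: Male
@@id: 345 @@name: Sarah Benson @@age: 20 @@Gender: Female
@@name: Archibald P. Granite, III, Ph.D, M.D. @@age: 20 @@Gender: not specified @@id: 567
EOF

# Run a few search/field pairs and print them in the style of the test runs.
while read -r id fld; do
  out=$(lookup "$sample" "$id" "$fld")
  printf 'search=%s field=%s => %s\n' "$id" "$fld" "${out:-<no output>}"
done <<'EOF'
123 gender
567 name
999 age
EOF
```

Because the line is split into label/value pairs before anything is compared, the position of @@id: in the line does not matter, which is why the third sample line (id at the end) is still found.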