How to display file columns containing a specific word using awk

Time:12-05

I would like to print all columns that contain a given word, for example "watermelon". I was thinking about combining these two commands, because each one works on its own. The first does something for every column in the file, and the second checks whether a column contains a specific word:

awk '{for(i=1;i<=NF-1;i++) printf $i" "; print $i}' a.csv
awk -F"," '{if ($3 == " watermelon") print $3}' a.csv

But when I try to put them together, my code doesn't work:

#!/bin/bash
awk '{for(i=1;i<=NF-1;i++)
         awk -F"," '{if ($i == " watermelon")
              print $i}' a.csv
        }' a.csv

For example, this is my file a.csv:

lp, type, name, number, letter
1, fruit, watermelon, 6, a
2, fruit, apple, 7, b
3, vegetable, onion, 8, c
4, vegetable, broccoli, 6, b
5, fruit, orange, 5, c

And this is the result I would like to get when searching for the word watermelon:

name
watermelon
apple
onion
broccoli
orange
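The nested attempt can't work as written. Inside an awk program, `awk -F"," '...'` is not a statement awk understands, and the inner single quotes also end the shell quoting of the outer program early. The loop from the first one-liner and the test from the second can, however, live in one awk program. A minimal sketch, assuming the sample data above and the ", " separator; `sample_csv` and `find_word` are hypothetical helper names:

```shell
# Sketch: ONE awk program that loops over every field and tests each one.
# Assumptions: the question's sample data and the ", " field separator;
# sample_csv and find_word are hypothetical helper names.
sample_csv() {
    cat <<'EOF'
lp, type, name, number, letter
1, fruit, watermelon, 6, a
2, fruit, apple, 7, b
3, vegetable, onion, 8, c
4, vegetable, broccoli, 6, b
5, fruit, orange, 5, c
EOF
}

# find_word WORD: print "record:field:value" for each cell equal to WORD.
find_word() {
    awk -F', ' -v word="$1" '{
        for (i = 1; i <= NF; i++)   # the loop from the first one-liner
            if ($i == word)         # with the test from the second one
                printf "%d:%d:%s\n", NR, i, $i
    }'
}

sample_csv | find_word watermelon   # prints 2:3:watermelon
```

This only reports where the word occurs (record 2, field 3). Printing the whole column also requires knowing the matching field numbers before the first line is printed, which is why the answers read the file twice.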

Answer:

$ cat tst.awk
BEGIN { FS=OFS=", " }
NR==FNR {
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        if ( $inFldNr == tgt ) {
            hits[inFldNr]
        }
    }
    next
}
FNR==1 {
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        if ( inFldNr in hits ) {
            out2in[++numOutFlds] = inFldNr
        }
    }
}
{
    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        inFldNr = out2in[outFldNr]
        printf "%s%s", $inFldNr, (outFldNr<numOutFlds ? OFS : ORS)
    }
}

$ awk -v tgt='watermelon' -f tst.awk file file
name
watermelon
apple
onion
broccoli
orange

The main difference between the above and @JamesBrown's approach is that, on the second pass over the file, my script loops only over the fields to be output, while James' loops over all input fields. His will therefore be slower in what is presumably the normal case, where not all input fields have to be output.
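Both answers read the same file twice (`file file`) and tell the passes apart with `NR==FNR`. NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while the first copy is being read. A minimal sketch that just labels each record, using a throwaway two-line file as a stand-in for the CSV (`label_passes` is a hypothetical name):

```shell
# Sketch: how NR==FNR tells the first pass over a file from the second.
# Assumption: a throwaway two-line file stands in for the real CSV.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'first line\nsecond line\n' > "$f"

# label_passes: print NR, FNR and which pass each record belongs to.
label_passes() {
    awk '{ print NR, FNR, (NR == FNR ? "pass 1" : "pass 2") }' "$f" "$f"
}

label_passes
# 1 1 pass 1
# 2 2 pass 1
# 3 1 pass 2
# 4 2 pass 2
```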

Regarding printf $i in your code, by the way: never do that. Always use printf "%s", $i for any input data instead, because the former will fail when your input contains printf formatting characters such as %s.
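To see that failure, here is a minimal sketch that feeds awk a made-up line containing a literal `%s`; `unsafe` and `safe` are hypothetical function names. Exactly how `printf $0` misbehaves depends on the awk implementation, so only the safe form has a predictable result:

```shell
# Sketch: printf $0 uses the DATA as a format string; printf "%s", $0 does not.
# Assumption: a made-up input line that happens to contain "%s".
line='name: %s'

# Unsafe: the "%s" in the data has no matching argument. Depending on the awk,
# this aborts with an error (hence the ||) or prints text other than the input.
unsafe() { printf '%s\n' "$line" | awk '{ printf $0; print "" }'; }

# Safe: the data is only ever an argument, never part of the format.
safe() { printf '%s\n' "$line" | awk '{ printf "%s\n", $0 }'; }

unsafe || echo "unsafe printf failed"
safe   # prints: name: %s
```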

Answer:

Here's one that processes the data twice:

$ awk -F', ' '                          # remember to set OFS if you need one
NR==FNR {                               # on the first run
    for(i=1;i<=NF;i++)                  # find
        if($i=="watermelon")            # watermelon fields
            a[i]                        # and mark them
    next
}
FNR==1 {                                # in case there were no such fields
    for(i in a)                         # test whether any field
        found=1                         # was marked
    if(!found)                          # and if none was,
        exit                            # exit without printing anything
}
{                                       # on the second run
    for(i=1;i<=NF;i++)
        if(i in a)b=b (b==""?"":OFS) $i # buffer those fields for output
    print b                             # and output
    b=""                                # clean that buffer for next record
}' file file

Output:

name
watermelon
apple
onion
broccoli
orange
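The script above hard-codes "watermelon" and leaves OFS at its default single space. Below is a minimal sketch of the same two-pass idea wrapped in a shell function, assuming the word is passed in with `-v` and OFS is set to ", " (as the script's first comment suggests); the name `cols` and the temporary file are hypothetical. Its header check only sets a flag, since a `next` at that point would also skip printing the header line:

```shell
# Sketch: the two-pass approach wrapped in a shell function.
# Assumptions: hypothetical function name "cols", the word passed with -v,
# OFS set to ", " to match the input, and the question's data in a temp file.
csv=$(mktemp)
trap 'rm -f "$csv"' EXIT
cat > "$csv" <<'EOF'
lp, type, name, number, letter
1, fruit, watermelon, 6, a
2, fruit, apple, 7, b
3, vegetable, onion, 8, c
4, vegetable, broccoli, 6, b
5, fruit, orange, 5, c
EOF

# cols WORD FILE: print every column that contains a field equal to WORD.
cols() {
    awk -F', ' -v OFS=', ' -v word="$1" '
    NR==FNR {                       # pass 1: mark fields equal to word
        for (i = 1; i <= NF; i++)
            if ($i == word)
                a[i]
        next
    }
    FNR==1 {                        # start of pass 2: was any field marked?
        for (i in a)
            found = 1
        if (!found)
            exit                    # no: stop without printing anything
    }
    {                               # pass 2: print only the marked fields
        b = ""
        for (i = 1; i <= NF; i++)
            if (i in a)
                b = b (b == "" ? "" : OFS) $i
        print b
    }' "$2" "$2"
}

cols watermelon "$csv"   # name, watermelon, apple, onion, broccoli, orange (one per line)
cols banana "$csv"       # no matching field: prints nothing
```

Because OFS is set to ", ", a word that matched in more than one column would print those fields comma-separated, like the input.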