I have a .csv where each row corresponds to a person (first column) and attributes with values that are available for that person. I want to extract the names and values a particular attribute for persons where the attribute is available. The doc is structured as follows:

name,attribute1,value1,attribute2,value2,attribute3,value3
joe,height,5.2,weight,178,hair,
james,,,,,,
jesse,weight,165,height,5.3,hair,brown
jerome,hair,black,breakfast,donuts,height,6.8

I want a file that looks like this:

name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8

Using this earlier post, I've tried a few different awk methods but am still having trouble getting both the first column and then whatever column has the desired value for the attribute (say height). For example the following returns everything.

awk -F "height," '{print $1 "," FS$2}' file.csv

I could grep only the rows with height in them, but I'd prefer to do everything in a single line if I can.

CodePudding user response：

You may use this awk:

cat attrib.awk

BEGIN {
   FS=OFS=","
   print "name,attribute,value"
}
NR > 1 && match($0, k "[^,] ") {
   print $1, substr($0, RSTART 1, RLENGTH-1)
}

# then run it as
awk -v k=',height,' -f attrib.awk file

name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8

# or this one
awk -v k=',weight,' -f attrib.awk file

name,attribute,value
joe,weight,178
jesse,weight,165

CodePudding user response：

I'd suggest a sed one-liner:

sed -n 's/^\([^,]*\).*\(,height,[^,]*\).*/\1\2/p' file.csv

CodePudding user response：

One awk idea:

awk -v attr="height" '
BEGIN  { FS=OFS="," }
FNR==1 { print "name", "attribute", "value"; next }
       { for (i=2;i<=NF;i =2)                         # loop through even-numbered fields
             if ($i == attr) {                        # if field value is an exact match to the "attr" variable then ...
                print $1,$i,$(i 1)                    # print current name, current field and next field to stdout
                next                                  # no need to check rest of current line; skip to next input line
             }
       }
' file.csv

NOTE: this assumes the input value (height in this example) will match exactly (including same capitalization) with a field in the file

This generates:

name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8

CodePudding user response：

With a perl one-liner:

$ perl -lne '
    print "name,attribute,value" if $.==1;
    print "$1,$2" if /^(\w ).*(height,\d \.\d )/
' file

output

name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8