Treat 2nd column with space as one column

Time:02-15

I have a command whose output looks like the sample below.

SAN_FR_9644T "Threat for Security" /vol/SAN44_39/SAN_FR_9644T-PPIE/Threat for Security
SAN_FR_3131T ZZ$        /vol/SAN44_39/SAN_FR_3131T-PPEL
SAN_FR_2281T "Control Line" /vol/SAN44_39/SAN_FR_2281T-PPF/33YShared/Control Line
SAN_FR_0021T "TT  FPI Station and Source" /vol/SAN145_22/SAN_FR_0021T-PPCR/TT  FPI Station and Source
SAN_FR_3131T DEFF_DEV /vol/SAN22_57/SAN_FR_3131T-PPAG/DEFF_DEV
SAN_FR_2241D BIX_E$    /vol/SAN99_45/SAN_FR_2241D-PPA/E
SAN_FR_2241D NULL_F$    /vol/SAN99_45/SAN_FR_2241D-PPA/F
SAN_FR_2241D TRIP   /vol/SAN99_45/SAN_FR_2241D-PPA/I
SAN_FR_2241D FINANCE   /vol/SAN99_45/SAN_FR_2241D-PPA/G

As you can see, some values in the 2nd column contain spaces, but those values are enclosed in double quotes (").

I tried running this command, but it only works for rows without spaces:

command | awk '{print $2}'

output:

"Threat
ZZ$
"Control
"TT
DEFF_DEV
BIX_E$
NULL_F$
TRIP
FINANCE
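To see why `$2` falls apart, it can help to print every field awk produces for one of the quoted rows. This is a minimal sketch, assuming a POSIX shell and any awk; the variable name `line` is only for illustration:

```shell
#!/bin/sh
# One quoted row from the sample output above.
line='SAN_FR_9644T "Threat for Security" /vol/SAN44_39/SAN_FR_9644T-PPIE/Threat for Security'

# With the default FS, awk splits on runs of blanks and ignores the quotes,
# so the quoted value is spread over $2, $3 and $4.
printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
```

For this row awk reports 7 fields, and `$2` is just `"Threat`, which is exactly the truncated value shown in the output.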

What I want is the complete value of the 2nd column for every row, even when it contains spaces:

Threat for Security
ZZ$
Control Line
TT  FPI Station and Source
DEFF_DEV
BIX_E$
NULL_F$
TRIP
FINANCE

I would appreciate any help.

Answer:

1st solution: With your shown samples, please try the following; it works with any awk. The match function with the regex "[^"]*" matches from the first occurrence of " up to the next occurrence of ". substr then prints the matched substring without the two quotes, and next skips all further statements for that line. If the line contains no quoted string, the usual approach of printing the 2nd field works, so $2 is printed.

awk 'match($0,/"[^"]*"/){print substr($0,RSTART+1,RLENGTH-2);next} {print $2}' Input_file
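A minimal sketch of the 1st solution wrapped in a shell function, assuming a POSIX shell; the name `second_col` is hypothetical. It reads standard input, so it can sit at the end of a pipeline in place of `awk '{print $2}'`. Like the one-liner, it assumes (as in the samples) that the only double quotes on a line are the ones around the 2nd column:

```shell
#!/bin/sh
# Hypothetical helper: print the 2nd column, keeping quoted values whole.
second_col() {
  # If the line contains a "...", print what is between the quotes
  # (RSTART + 1 skips the opening quote, RLENGTH - 2 drops both quotes).
  # Otherwise fall back to the 2nd whitespace-separated field.
  awk 'match($0, /"[^"]*"/) { print substr($0, RSTART + 1, RLENGTH - 2); next }
       { print $2 }'
}

# Usage with a few rows from the question; replace printf with your command.
printf '%s\n' \
  'SAN_FR_9644T "Threat for Security" /vol/SAN44_39/SAN_FR_9644T-PPIE/Threat for Security' \
  'SAN_FR_3131T ZZ$        /vol/SAN44_39/SAN_FR_3131T-PPEL' \
  'SAN_FR_0021T "TT  FPI Station and Source" /vol/SAN145_22/SAN_FR_0021T-PPCR/TT  FPI Station and Source' \
  | second_col
```

Because substr works on $0 rather than on fields, the double space inside "TT  FPI Station and Source" is preserved.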


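2nd solution: With GNU awk, please try the following awk code. FPAT describes what a field looks like (a run of non-space characters, or a double-quoted string) instead of what separates fields, and gsub then removes the surrounding quotes from $2.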

awk -v FPAT='[^ ]+|"[^"]+"' '{gsub(/^"|"$/,"",$2);print $2}' Input_file
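A sketch of the same FPAT idea as a function, assuming GNU awk is installed as `gawk` (FPAT is a gawk extension; other awks treat it as an ordinary variable and keep splitting on blanks). The function name `second_col_fpat` is hypothetical. When both alternatives match at the same position, gawk keeps the longest match, which is why the whole quoted string wins over the shorter run `"Control`:

```shell
#!/bin/sh
# Hypothetical helper; requires GNU awk because FPAT is a gawk extension.
second_col_fpat() {
  # FPAT describes the fields themselves: a run of non-spaces, or a "quoted string".
  # gsub then strips the surrounding quotes from $2 only.
  gawk -v FPAT='[^ ]+|"[^"]+"' '{ gsub(/^"|"$/, "", $2); print $2 }'
}

if command -v gawk >/dev/null 2>&1; then
  printf '%s\n' \
    'SAN_FR_2281T "Control Line" /vol/SAN44_39/SAN_FR_2281T-PPF/33YShared/Control Line' \
    'SAN_FR_2241D BIX_E$    /vol/SAN99_45/SAN_FR_2241D-PPA/E' \
    | second_col_fpat
else
  echo "gawk not found; FPAT needs GNU awk" >&2
fi
```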

Answer:

With sed you can handle the lines without a quote first, and then the lines with quotes.
That gives a rather ugly command:

sed -r '/^[^"]*$/s/[^ ]* ([^ ]*).*/\1/; s/[^"]*"([^"]*).*/\1/' inputfile

The two parts can be combined, but in this version you need an extra substitution at the end to remove the leading quote:

sed -r 's/[^ ]* (["][^"]*|[^ ]*).*/\1/; s/"//' inputfile

This can be done more cleanly, without the extra s/"// to remove the quote:

sed -r 's/[^ ]* ("([^"]*)|([^ ]*)).*/\2\3/' inputfile

Explanation of the last sed:

  • 's/[^ ]*
    Removes the first word (a run of non-space characters) and the space after it.
  • ((...)|(...)).*/\2\3/
    Matches one of two alternatives (OR); the captured text ends up in either \2 or \3.
    The group that did not match is empty, so \2\3 prints only the matched string.
    The trailing .* consumes the rest of the line, so it is removed.
  • "([^"]*)
    A field starting with a "; the quote itself stays outside the captured group.
    The captured string continues up to the next quote.
  • ([^ ]*)
    A string without spaces (an unquoted field).
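A minimal sketch running the last sed command as a function, assuming GNU sed; the name `second_col_sed` is hypothetical. It uses -E, which GNU sed accepts as a synonym for -r and which BSD/macOS sed also understands. The approach relies on the regex engine picking the longer quoted alternative for the group, as GNU sed does here:

```shell
#!/bin/sh
# Hypothetical helper around the last sed command from the answer.
second_col_sed() {
  # [^ ] * + space : drop the first word and the space after it
  # "([^"]*)       : a quoted field; \2 gets the text up to the closing quote
  # ([^ ]*)        : an unquoted field; \3 gets it
  # .*             : the rest of the line, which is discarded
  sed -E 's/[^ ]* ("([^"]*)|([^ ]*)).*/\2\3/'
}

printf '%s\n' \
  'SAN_FR_0021T "TT  FPI Station and Source" /vol/SAN145_22/SAN_FR_0021T-PPCR/TT  FPI Station and Source' \
  'SAN_FR_2241D NULL_F$    /vol/SAN99_45/SAN_FR_2241D-PPA/F' \
  | second_col_sed
```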