I have a command whose output looks like the sample below:
SAN_FR_9644T "Threat for Security" /vol/SAN44_39/SAN_FR_9644T-PPIE/Threat for Security
SAN_FR_3131T ZZ$ /vol/SAN44_39/SAN_FR_3131T-PPEL
SAN_FR_2281T "Control Line" /vol/SAN44_39/SAN_FR_2281T-PPF/33YShared/Control Line
SAN_FR_0021T "TT FPI Station and Source" /vol/SAN145_22/SAN_FR_0021T-PPCR/TT FPI Station and Source
SAN_FR_3131T DEFF_DEV /vol/SAN22_57/SAN_FR_3131T-PPAG/DEFF_DEV
SAN_FR_2241D BIX_E$ /vol/SAN99_45/SAN_FR_2241D-PPA/E
SAN_FR_2241D NULL_F$ /vol/SAN99_45/SAN_FR_2241D-PPA/F
SAN_FR_2241D TRIP /vol/SAN99_45/SAN_FR_2241D-PPA/I
SAN_FR_2241D FINANCE /vol/SAN99_45/SAN_FR_2241D-PPA/G
As you can see, some values in the 2nd column contain spaces, but those values are enclosed in double quotes (").
I tried running this command, but it only works for rows without spaces:
command | awk '{print $2}'
Output:
"Threat
ZZ$
"Control
"TT
DEFF_DEV
BIX_E$
NULL_F$
TRIP
FINANCE
What I want is the complete 2nd column, even when a value contains spaces:
Threat for Security
ZZ$
Control Line
TT FPI Station and Source
DEFF_DEV
BIX_E$
NULL_F$
TRIP
FINANCE
I would appreciate any help.
Answer:
1st solution: With your shown samples, and with any awk, please try the following. Simple explanation: the match function uses the regex "[^"]*" to match from the 1st occurrence of " to the next occurrence of ", the matched substring is printed without its quotes, and next skips all further statements. If a line has no quoted string, the usual way of printing the 2nd field works, so $2 is printed instead.
awk 'match($0,/"[^"]*"/){print substr($0,RSTART+1,RLENGTH-2);next} {print $2}' Input_file
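To see why the substr() call uses RSTART+1 and RLENGTH-2, here is a small sketch that prints the values match() sets for the first sample line. match() stores the 1-based position where the match starts in RSTART and its length in RLENGTH; both include the two quote characters, so skipping one character at the start and two in the length drops them. The helper name show_match is hypothetical and only used for illustration.

```shell
#!/bin/sh
# show_match is a hypothetical helper used only for this illustration.
# For each line containing a quoted string, it prints the values match()
# sets, then the substring without the quotes (works with any POSIX awk).
show_match() {
    awk 'match($0, /"[^"]*"/) {
        # RSTART: 1-based position of the opening quote
        # RLENGTH: length of the match, including both quotes
        print "RSTART=" RSTART, "RLENGTH=" RLENGTH
        # Skip the opening quote (+1) and drop both quotes from the length (-2)
        print substr($0, RSTART + 1, RLENGTH - 2)
    }'
}

printf '%s\n' 'SAN_FR_9644T "Threat for Security" /vol/SAN44_39/SAN_FR_9644T-PPIE/Threat for Security' | show_match
```

For that line it prints RSTART=14 RLENGTH=21 and then Threat for Security: the opening quote is the 14th character (after the 12-character first field and one space), and the match is the 19-character value plus its 2 quotes. Lines without a quote produce no output here, which is the case the {print $2} block handles in the answer.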
2nd solution: With GNU awk, please try the following awk code.
awk -v FPAT='[^ ]*|"[^"]+"' '{gsub(/^"|"$/,"",$2);print $2}' Input_file
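FPAT is specific to GNU awk. If only a POSIX awk is available, a similar result can be sketched with a different technique: split on whitespace as usual, and when $2 starts with a quote, keep appending the following fields until the value is a complete quoted string, then strip the quotes. This sketch assumes the quoted values contain only single spaces, because awk's default field splitting collapses runs of blanks. The helper name col2 is hypothetical.

```shell
#!/bin/sh
# col2 is a hypothetical helper: prints the 2nd column, rejoining a quoted
# value that awk's default whitespace splitting broke into several fields.
# Assumes single spaces inside quoted values (runs of blanks are collapsed).
col2() {
    awk '{
        val = $2
        if (val ~ /^"/) {
            # Append following fields until val is a complete "..." string
            i = 2
            while (val !~ /^".*"$/ && i < NF) {
                i++
                val = val " " $i
            }
            # Strip the surrounding quotes
            gsub(/^"|"$/, "", val)
        }
        print val
    }'
}

col2 <<'EOF'
SAN_FR_9644T "Threat for Security" /vol/SAN44_39/SAN_FR_9644T-PPIE/Threat for Security
SAN_FR_3131T ZZ$ /vol/SAN44_39/SAN_FR_3131T-PPEL
SAN_FR_2281T "Control Line" /vol/SAN44_39/SAN_FR_2281T-PPF/33YShared/Control Line
EOF
```

The quoted heredoc delimiter ('EOF') keeps the shell from expanding ZZ$. The trade-off compared with FPAT is that the original spacing inside a quoted value is not preserved exactly.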
Answer:
With sed, you can handle the lines without a quote first, and then the lines with quotes.
That gives this somewhat ugly command:
sed -r '/^[^"]*$/s/[^ ]* ([^ ]*).*/\1/; s/[^"]*"([^"]*).*/\1/' inputfile
The two parts can be combined, but in this version you still need to remove the remaining quote at the end:
sed -r 's/[^ ]* (["][^"]*|[^ ]*).*/\1/; s/"//' inputfile
This can be done more cleanly, without the extra s/"// step for removing the quote:
sed -r 's/[^ ]* ("([^"]*)|([^ ]*)).*/\2\3/' inputfile
Explanation of the last sed:
's/[^ ]* followed by a space: Remove the first word (everything that is not a space) together with the space after it.
((...)|(...)).*/\2\3/ : Look for 2 different matches (OR); the result ends up in \2 OR \3. Using \2\3 shows whichever string matched, and .* removes the rest of the line.
"([^"]*) : A field starting with a ", with the quote kept outside the remembered string. The string continues until the next quote.
([^ ]*) : A string without spaces.
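To make the "\2 OR \3" idea concrete, a small variation of the last command can wrap each group in brackets, so you can see which alternative matched on each kind of line (the other group is empty). This sketch uses -E instead of -r: both enable extended regular expressions in GNU sed, and -E is also accepted by BSD/macOS sed. The helper name show_groups is hypothetical.

```shell
#!/bin/sh
# show_groups is a hypothetical helper: it runs the same regex as the last
# sed command but prints \2 and \3 in brackets, so the empty one is visible.
show_groups() {
    sed -E 's/[^ ]* ("([^"]*)|([^ ]*)).*/[\2][\3]/'
}

show_groups <<'EOF'
SAN_FR_2281T "Control Line" /vol/SAN44_39/SAN_FR_2281T-PPF/33YShared/Control Line
SAN_FR_3131T DEFF_DEV /vol/SAN22_57/SAN_FR_3131T-PPAG/DEFF_DEV
EOF
```

The quoted line should give [Control Line][] and the unquoted line [][DEFF_DEV], which is why concatenating \2\3 in the original command always yields exactly one value.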