How to extract a substring in double quotes by using awk or other methods?

I want to extract you from the sample string:

See [ "you" later

However, my result is wrong:

 awk '{ sub(/.*\"/, ""); sub(/\".*/, ""); print }' <<< "See [ \"you\" later"

result:

 later

In awk or other methods, how can I extract the substring in the double quotes?

CodePudding user response：

Here is an awk solution without any regex:

s='See [ "you" later'
awk -F '"' 'NF>2 {print $2}' <<< "$s"

you

Or a sed solution with regex:

sed -E 's/[^"]*"([^"]*)".*/\1/' <<< "$s"
you

Another awk with match:

awk 'match($0, /"[^"]*"/) {print substr($0, RSTART+1, RLENGTH-2)}' <<< "$s"

you
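
If a line can hold more than one quoted substring, the same match() idea can be repeated in a loop. This is only a sketch: all_quoted is a hypothetical helper name, and it sticks to POSIX awk features (match, RSTART, RLENGTH, substr).

```shell
# Hypothetical helper: print every double-quoted substring of $1, one per line.
all_quoted() {
    printf '%s\n' "$1" |
    awk '{
        line = $0
        # Find the next quoted run, print it without its quotes,
        # then continue searching in the text after it.
        while (match(line, /"[^"]*"/)) {
            print substr(line, RSTART + 1, RLENGTH - 2)
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

all_quoted 'See "you" and "me" later'
```

An unterminated quote at the end of the line is ignored, because the regex needs both an opening and a closing quote.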

CodePudding user response：

1st solution: You can use awk's gsub function here. A single regex with two alternatives does both substitutions with an empty string: the first alternative removes everything up to and including the 1st occurrence of ", and the second removes everything from the next " to the end of the line. The 1 at the end prints what is left of the line.

awk '{gsub(/^[^"]*"|".*/,"")} 1' Input_file
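
For comparison, the attempt in the question fails because .*" is greedy: it deletes everything through the last quote on the line, so only " later" is left. A rough sketch of the same two-step idea with sub(), stopping at the first quote instead (first_quoted is a hypothetical helper name, not something from the question):

```shell
# Hypothetical helper: print the first double-quoted substring of $1.
first_quoted() {
    printf '%s\n' "$1" |
    awk '{
        sub(/^[^"]*"/, "")   # drop everything up to and including the first quote
        sub(/".*/, "")       # drop the next quote and everything after it
        print
    }'
}

first_quoted 'See [ "you" later'
```

Note that a line without any quotes is printed unchanged, so add a guard such as /"[^"]*"/ before the action if that matters for your input.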

2nd solution: Using GNU grep. The -o option prints only the matched part and -P enables PCRE regex. The regex lazily matches from the start of the line up to the very first occurrence of ", then \K discards what has been matched so far, and [^"]* matches everything before the next occurrence of ", so only the text between the 2 " is printed, as required.

grep -oP '^.*?"\K[^"]*' Input_file

CodePudding user response：

You can also use cut here:

cut -d\" -f 2 <<< 'See [ "you" later '

It splits the string with a double quote and gets the second item.

Output:

you
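
One thing to watch with cut: a line that contains no delimiter at all is printed whole by default. The -s option suppresses such lines. A small sketch, where second_field is just a hypothetical wrapper name:

```shell
# Hypothetical helper: read lines on stdin and print field 2 (split on ")
# only for lines that contain at least one double quote.
second_field() {
    cut -s -d '"' -f 2
}

printf '%s\n' 'See [ "you" later' 'no quotes here' | second_field
```

This still prints field 2 for a line with a single, unclosed quote, so it assumes the quotes in the input are balanced.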


CodePudding user response：

Using bash

IFS='"' read -ra arr <<< "See [ \"you\" later"
echo "${arr[1]}"

gives output

you

Explanation: setting IFS tells bash to split at ", read -ra stores the split text in the array arr, and the 2nd element is printed (which is [1], as [0] denotes the 1st element).

CodePudding user response：

Extract all quoted substrings, and remove the quotes:

echo 'See [ "you" later, "" "a" "b" "c' |
grep -o '"[^"]*"' | tr -d \"

Gives:

you

a
b

"" is matched as an empty string, which is the blank second line of output (use grep -o '"[^"]\+"' to skip empty strings)
"c is not fully quoted, so it doesn't match

For a small string, you may want to use pure shell. This extracts the first quoted substring in $str:

str='Example "a" and "b".'
str=${str#*\"} # Cut up to first quote
case $str in
    *\"*) str=${str%%\"*};; # Cut from second quote onwards
    *) str= # $str contains less than two quotes
esac
echo "$str"

Gives:

a

CodePudding user response：

Just a few ways using GNU awk for:

multi-char RS and RT:

$ echo 'See [ "you" later' |
    awk -v RS='"[^"]*"' 'RT{ print substr(RT,2,length(RT)-2) }'
you

the 3rd arg to match():

$ echo 'See [ "you" later' |
    awk 'match($0,/"([^"]*)"/,a){ print a[1] }'
you

gensub() (assuming the quoted string is always present):

$ echo 'See [ "you" later' |
    awk '{print gensub(/.*"([^"]*)".*/,"\\1",1)}'
you

FPAT:

$ echo 'See [ "you" later' |
    awk -v FPAT='[^"]*' 'NF>2{print $2}'
you

$ echo 'See [ "you" later' |
    awk -v FPAT='"[^"]*"' 'NF{print substr($1,2,length($1)-2)}'
you

patsplit():

$ echo 'See [ "you" later' |
    awk 'patsplit($0,f,/"[^"]*"/,s){print substr(f[1],2,length(f[1])-2)}'
you

the 4th arg to split():

$ echo 'See [ "you" later' |
    awk 'split($0,f,/"[^"]*"/,s)>1{print substr(s[1],2,length(s[1])-2)}'
you

CodePudding user response：

Using sed

$ sed -n 's/[^"]*"\([[:alpha:]]\+\)"[^"]*/\1 /gp' input_file
you

CodePudding user response：

$ grep -oP '(?<=").*(?=")' <<< "See [ \"you\" later"
you