Linux: extract only a string that starts with a given string and ends at the first comma

I have a log file that contains information like the following:

"variable1=XXX, emotionType=sad, sentimentType=negative..."

I want grep to print only the matched string, which starts with emotionType and ends at the first comma after it. E.g.

emotionType=sad
emotionType=joy
...

What I have tried is

grep -e "/^emotionType.*,/" file.log -o

but I got nothing. Can anyone tell me what I should do?

CodePudding user response：

You need to use

grep -o "emotionType[^,]*" file.log

Note:

Remove ^, or replace it with \< (the start-of-word boundary construct), if your matches are not located at the beginning of each line.
Remove the / chars on both ends of the regex, since grep does not use regex delimiters (unlike sed).
[^,] is a negated bracket expression that matches any char other than a comma.
* is a POSIX BRE quantifier that matches zero or more occurrences of the preceding item.
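
To make the effect of the ^ anchor concrete, here is a minimal sketch that runs an anchored and an unanchored pattern against one sample line. The sample line and the variable names anchored and unanchored are made up for illustration.

```shell
# Hypothetical sample line, modeled on the log format in the question.
line='variable1=XXX, emotionType=sad, sentimentType=negative'

# Anchored with ^: no match, because the line does not start with emotionType.
# grep exits non-zero when nothing matches, hence the "|| true".
anchored=$(printf '%s\n' "$line" | grep -o '^emotionType[^,]*' || true)

# Unanchored: matches anywhere in the line and stops before the first comma.
unanchored=$(printf '%s\n' "$line" | grep -o 'emotionType[^,]*')

printf 'anchored:   [%s]\n' "$anchored"
printf 'unanchored: [%s]\n' "$unanchored"
```

The anchored result is empty, which is the same "got nothing" outcome described in the question once the / delimiters are out of the way.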

See an online demo:

#!/bin/bash
s="variable1=XXX, emotionType=sad, sentimentType=negative, emotionType=happy"
grep -o "emotionType=[^,]*" <<< "$s"

Output:

emotionType=sad
emotionType=happy
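
Since the question is about a log file rather than a string, the same command can be pointed at a file. The sketch below assumes a temporary file standing in for file.log; its two lines are invented sample data.

```shell
# Hypothetical stand-in for file.log, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
variable1=XXX, emotionType=sad, sentimentType=negative
variable1=YYY, emotionType=joy, sentimentType=positive
EOF

# -o prints every match on its own line, so each log entry yields one result here.
found=$(grep -o 'emotionType=[^,]*' "$log")
printf '%s\n' "$found"

# Clean up the temporary file.
rm -f "$log"
```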

CodePudding user response：

1st solution: With awk, you could try the following program. It uses awk's match function with a regex that matches emotionType= up to the next comma, and loops over the line so that every match is printed.

var="variable1=XXX, emotionType=sad, sentimentType=negative, emotionType=happy"

Where var is a shell variable.

echo "$var" | 
awk '{while(match($0,/emotionType=[^,]*/)){print substr($0,RSTART,RLENGTH);$0=substr($0,RSTART+RLENGTH)}}'
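
For readability, the match loop can also be laid out over several lines with comments. This is only a sketch of the same idea: the variable name matches is an assumption, and the remaining text is taken from position RSTART + RLENGTH, i.e. just past the current match.

```shell
var='variable1=XXX, emotionType=sad, sentimentType=negative, emotionType=happy'

matches=$(printf '%s\n' "$var" | awk '
{
  # Keep searching the remaining text until no match is left.
  while (match($0, /emotionType=[^,]*/)) {
    # RSTART is the 1-based start of the match, RLENGTH is its length.
    print substr($0, RSTART, RLENGTH)
    # Drop everything up to and including the match, then search again.
    $0 = substr($0, RSTART + RLENGTH)
  }
}')
printf '%s\n' "$matches"
```

Writing RSTART RLENGTH without the plus sign would concatenate the two numbers into one large offset, so the loop would stop after the first match.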

2nd solution: Or, in GNU awk, set the RS variable to the regex and print RT, as in the following program.

echo "$var" | awk -v RS='emotionType=[^,]*' 'RT{sub(/\n+$/,"",RT);print RT}'
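
The RS/RT approach can be spelled out with comments as follows. This is a hedged sketch: it assumes GNU awk is installed under the name gawk (RT is a gawk extension), and rt_matches is a made-up variable name.

```shell
var='variable1=XXX, emotionType=sad, sentimentType=negative, emotionType=happy'

# RT is a GNU awk extension, so call gawk explicitly and skip if it is missing.
if command -v gawk >/dev/null 2>&1; then
  rt_matches=$(printf '%s\n' "$var" | gawk -v RS='emotionType=[^,]*' '
    # RT holds the text that matched RS; it is empty when no separator was found.
    RT {
      # [^,]* can also swallow the newline at the end of the input, so trim it.
      sub(/\n+$/, "", RT)
      print RT
    }')
  printf '%s\n' "$rt_matches"
else
  echo "gawk not found; this sketch needs GNU awk for RT" >&2
fi
```

Here the regex acts as the record separator, so the text between matches becomes the records and the matches themselves are what RT exposes.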