grep returns no matches for the following expression

Using the regex (?<=_)(.*)(?=\.) with the test string 23353_test.txt, grep with the -p option returns nothing, and it does not show any errors either. I expect the output to be test. The same regex works correctly when tried on regex101.com.

CodePudding user response：

The following GNU grep command extracts the right substring:

grep -oP '(?<=_).*(?=\.)' file

Note that .* matches greedily, so if you want to make sure you match the substring between the closest _ and . you need to use a negated character class:

grep -oP '(?<=_)[^._]*(?=\.)' file

where [^._]* matches zero or more chars other than . and _.
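
To illustrate the difference between the greedy pattern and the negated character class, here is a small sketch. The file name archive_2024_report.v2.txt and the function names greedy_match and closest_match are made up for this example, and it assumes a GNU grep built with PCRE (-P) support:

```shell
# Hypothetical file name with several "_" and "." (not from the question).
name='archive_2024_report.v2.txt'

# Greedy: starts right after the first "_" and runs up to the last "." on the line.
greedy_match() { printf '%s\n' "$1" | grep -oP '(?<=_).*(?=\.)'; }

# Negated class: a run right after a "_" that contains no "_" or "."
# and is immediately followed by a ".".
closest_match() { printf '%s\n' "$1" | grep -oP '(?<=_)[^._]*(?=\.)'; }

greedy_match "$name"    # prints 2024_report.v2
closest_match "$name"   # prints report
```

For the 23353_test.txt sample both patterns print test, since it contains only one _ and one dot.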

If you cannot rely on your grep, you can use sed here:

sed -n 's/.*_\(.*\)\..*/\1/p' file

See the online demo:

#!/bin/bash
s='23353_test.txt'
grep -oP '(?<=_)(.*)(?=\.)' <<< "$s"
# => test
sed -n 's/.*_\(.*\)\..*/\1/p' <<< "$s"
# => test

CodePudding user response：

1st solution: awk fits this requirement well; please try the following, written against your shown sample. It sets the field separator to either _ or . and prints the 2nd field only when the line has exactly 3 fields.

s='23353_test.txt'
echo "$s" | awk -F'[_.]' 'NF==3{print $2}'
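
The NF==3 condition makes the awk command skip names that do not split into exactly three fields. A minimal sketch of that behavior follows; the helper name middle_field and the two extra file names are assumptions added only for illustration:

```shell
# Hypothetical helper: print the 2nd field only when splitting on "_" or "."
# yields exactly three fields (that is, two separators in total).
middle_field() { printf '%s\n' "$1" | awk -F'[_.]' 'NF==3{print $2}'; }

middle_field '23353_test.txt'      # prints test
middle_field '23353_my_test.txt'   # prints nothing: four fields
middle_field '23353_test.tar.gz'   # prints nothing: four fields
```

If names with extra separators should still produce output, the NF==3 guard would have to be relaxed, at the cost of deciding which field is the wanted one.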

2nd solution: Using sed with a capturing group. The -E option enables ERE, and the regex ^[^_]*_([^.]*)\..* matches from the start of the line up to the 1st occurrence of _, captures everything between that _ and the next . in the 1st (and only) capturing group, then matches the literal . and the rest of the line. The substitution replaces the whole line with the value of the 1st capturing group.

s='23353_test.txt'
echo "$s" | sed -E 's/^[^_]*_([^.]*)\..*/\1/'
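
One thing to be aware of with the sed command above: when a line does not match the regex, sed prints it unchanged. A possible variant, sketched here with a hypothetical helper named between and a made-up input README, adds -n and the p flag so that only lines where the substitution succeeded are printed:

```shell
# Hypothetical helper: print the captured part only when the pattern matches.
# -n suppresses automatic printing; the trailing p prints after a successful substitution.
between() { printf '%s\n' "$1" | sed -nE 's/^[^_]*_([^.]*)\..*/\1/p'; }

between '23353_test.txt'   # prints test
between 'README'           # prints nothing, because the regex does not match
```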

3rd solution: Using GNU awk's match function. The regex matches from the start of the line up to the 1st occurrence of _, then captures everything up to the next . in a capturing group. The array arr stores the captured values, so arr[1] holds the value of the 1st capturing group, which is printed.

echo "$s" | awk 'match($0,/^[^_]*_([^.]*)\..*$/,arr){print arr[1]}'

4th solution: Using GNU grep with its -o and -P options. The -o option prints only the matched part, and the -P flag enables PCRE regex.

echo "$s" | grep -oP '^.*?_\K([^.]*)(?=\.\S+$)'