Extracting substring with awk if the string includes regular expressions-CodePudding

I have many strings like this

i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

I need to extract the M1 code and the FCT, but I am unable to do so, likely because of my regular expressions. FCT I can get with echo ${i:30:3}, but for M1 nothing seems to work. My last try was grep -oP '.*\K(?<=.\/)\w+(?=\/Cus)' $i

The length of the string can vary (but the FCT part always starts with /F), and /M1/ is always in the same position.

Hope somebody can help. Thanks!

CodePudding user response：

You could try the following awk programs.

To get FCT-like strings: since the position of the string is NOT fixed and only the leading /F is fixed, match from /F up to the next occurrence of /. This catches whatever follows /F before the next /.

echo "$i" | awk 'match($0,/\/F[^/]*/){print substr($0,RSTART+1,RLENGTH-1)}'
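
To see what match() reports before substr() drops the leading slash, here is a minimal sketch that prints RSTART and RLENGTH for the sample path from the question. The variable name fct is only illustrative.

```shell
i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

# Show the 1-based start and the length of the "/F..." match, plus the raw match.
echo "$i" | awk 'match($0, /\/F[^/]*/) {
    print "RSTART=" RSTART, "RLENGTH=" RLENGTH, "match=" substr($0, RSTART, RLENGTH)
}'
# prints: RSTART=30 RLENGTH=4 match=/FCT

# Skip the slash (RSTART + 1) and shorten the length by one to keep only "FCT".
fct=$(echo "$i" | awk 'match($0, /\/F[^/]*/) { print substr($0, RSTART + 1, RLENGTH - 1) }')
echo "$fct"
```

If no /F component is present, match() returns 0 and nothing is printed, so fct ends up empty.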

To get M1, try the following awk program. Since the position of M1 is always fixed (as per the OP's question), I am using two sub() calls here: the first removes the leading ./ and the second removes everything from the first remaining / to the end of the line. Printing the line then gives the M1 part.

echo "$i" | awk '{sub(/^\.\//,"");sub(/\/.*/,"")} 1'
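
To make the two sub() calls easier to follow, this sketch prints the line after each substitution, again using the sample path from the question. The labels and the variable name m1 are only illustrative.

```shell
i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

echo "$i" | awk '{
    sub(/^\.\//, "")   # strip the leading "./"
    print "after 1st sub: " $0
    sub(/\/.*/, "")    # strip from the first remaining "/" to the end of the line
    print "after 2nd sub: " $0
}'
# prints:
# after 1st sub: M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt
# after 2nd sub: M1

# The one-liner from the answer, captured in a shell variable.
m1=$(echo "$i" | awk '{sub(/^\.\//,""); sub(/\/.*/,"")} 1')
echo "$m1"
```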

CodePudding user response：

Bash allows you to split a string into an array.

# starting value
str=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

# split string on / delimiter into the split array
IFS=/ read -ra split <<<"$str"

# get M1 and FCT elements at their respective indexes
M1=${split[1]}
FCT=${split[5]}

# dump M1 and FCT variables for demo purpose
declare -p M1 FCT

CodePudding user response：

Another option with awk is to use split() to break the path into its components. The command below fills the array a[] and prints its 2nd and 6th elements ("M1" and "FCT").

awk '{split($1,a,"/"); print a[2]", "a[6]}'

Example Use/Output

$ i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt; echo "$i" | 
awk '{split($1,a,"/"); print a[2]", "a[6]}'
M1, FCT

CodePudding user response：

If the positions of the strings are always after the same number of forward slashes, you can print the 2nd and the 6th field, setting the field separator to /

echo "$i" | awk -F"/" '{print $2, $6}'

Output

M1 FCT
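
Since the question mentions many such strings, a single awk process can handle one path per input line. This is a minimal sketch; the second path is made up for illustration, and the variable names paths and out are hypothetical.

```shell
# Two sample paths; the second one is invented for this example.
paths='./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt
./M2/CustomersList/HTP/Denver/FXY/output_GetCaseList_abs.txt'

# With "/" as the field separator, field 1 is ".", field 2 the M code, field 6 the F code.
out=$(printf '%s\n' "$paths" | awk -F/ '{ print $2, $6 }')
echo "$out"
# prints:
# M1 FCT
# M2 FXY
```

This only holds while both codes sit after the same number of slashes in every path.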

You might also use GNU awk (the three-argument form of match() is a gawk extension) and a pattern with 2 capture groups: the first captures the path component that is followed by /Cus, and the second captures the component starting with F.

The negated character class [^\/]* matches 0 or more characters except a /

echo "$i" | awk 'match($0, /[^\/]*\/([^\/]*)\/Cus.*\/(F[^\/]*)/, a) {print a[1], a[2]}'

CodePudding user response：

You have your awk answers, but I felt like contributing a bash idea just for fun.

[[ "$i" =~ ^\./([[:alnum:]]+)(/[[:alnum:]]+){3}/([[:alnum:]]+)/.* ]] \
    && echo "${BASH_REMATCH[1]} ${BASH_REMATCH[3]}"

The BASH_REMATCH array holds the text matched by each capture group in the test. Index 0 is the complete match.

A slightly shorter version yielding the same output:

[[ "$i" =~ ^\./([[:alnum:]]+)(/[[:alnum:]]+){4}/.* ]] \
    && echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]:1}"
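
To see why the shorter version works, it can help to dump BASH_REMATCH. The sketch below assumes bash and the sample path from the question. The names re, m1 and last are only illustrative. A repeated group such as (/[[:alnum:]]+){4} keeps only its last repetition, which is why ${BASH_REMATCH[2]:1} yields FCT.

```shell
i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

# Keeping the regex in a variable avoids quoting surprises inside [[ ... ]].
re='^\./([[:alnum:]]+)(/[[:alnum:]]+){4}/.*'

if [[ "$i" =~ $re ]]; then
    # Index 0 is the whole match; 1 and 2 are the capture groups.
    for idx in "${!BASH_REMATCH[@]}"; do
        printf '%s: %s\n' "$idx" "${BASH_REMATCH[idx]}"
    done
    m1=${BASH_REMATCH[1]}
    last=${BASH_REMATCH[2]}   # last repetition of the repeated group
fi
# prints:
# 0: ./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt
# 1: M1
# 2: /FCT
```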