Extracting substring with awk if the string includes regular expressions

Time:02-10

I have many strings like this

i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

I need to extract the M1 code and the FCT code, but I have been unable to do so, probably because of my regular expressions. I can get FCT with echo ${i:30:3}, but nothing seems to work for M1. My last try was grep -oP '.*\K(?<=.\/)\w+(?=\/Cus)' $i ;

The length of the string can vary (but the second code always starts with /F), and /M1/ is always in the same position.

Hope somebody can help. Thanks!

CodePudding user response:

You could try the following awk programs.

To get FCT-like strings: since the position of that string is not fixed and only the leading /F is, I match from /F up to the next occurrence of /. That captures whatever follows /F and precedes the next /.

echo "$i" | awk 'match($0,/\/F[^/]*/){print substr($0,RSTART+1,RLENGTH-1)}'

To get M1, try the following awk program. Since the position of M1 is always fixed (as the OP states in the question), I use two sub() calls: the first removes the leading ./ and the second removes everything from the first remaining / to the end of the line. Printing what is left gives the M1 part.

echo "$i" | awk '{sub(/^\.\//,"");sub(/\/.*/,"")} 1'
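
If the values are needed later in a script rather than just printed, the two programs above can be wrapped in command substitutions. This is only a sketch: the variable names m1 and fct are my own, and it assumes the first path component beginning with F is the one you want.

```shell
i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

# First path component after the leading ./
m1=$(printf '%s\n' "$i" | awk '{sub(/^\.\//,""); sub(/\/.*/,"")} 1')

# First path component that begins with F (the leading / is skipped by RSTART+1)
fct=$(printf '%s\n' "$i" | awk 'match($0, /\/F[^\/]*/) {print substr($0, RSTART+1, RLENGTH-1)}')

printf '%s %s\n' "$m1" "$fct"
```

Each substitution starts its own awk process, so this is fine for a handful of strings but slow for thousands.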

CodePudding user response:

Bash allows you to split a string into an array.

# starting value
str=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

# split string on / delimiter into the split array
IFS=/ read -ra split <<<"$str"

# get M1 and FCT elements at their respective indexes
M1=${split[1]}
FCT=${split[5]}

# dump M1 and FCT variables for demo purpose
declare -p M1 FCT
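
Since the question mentions many strings, the split can be wrapped in a small function. A minimal sketch, assuming bash and the same fixed positions as above; get_codes is a hypothetical name, and it deliberately sets the global variables M1 and FCT.

```shell
# get_codes is a hypothetical helper: it splits its argument on /
# and stores components 1 and 5 in the globals M1 and FCT.
get_codes() {
  local -a parts
  IFS=/ read -ra parts <<<"$1"
  M1=${parts[1]}
  FCT=${parts[5]}
}

get_codes ./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt
echo "$M1 $FCT"
```

No external process is started, so calling this in a loop over many paths stays cheap.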

CodePudding user response:

Another option with awk is to use split() to break the path into its components. The command below fills the array a[] and prints its 2nd and 6th elements ("M1" and "FCT").

awk '{split($1,a,"/"); print a[2]", "a[6]}'

Example Use/Output

$ i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt; echo "$i" | 
awk '{split($1,a,"/"); print a[2]", "a[6]}'
M1, FCT

CodePudding user response:

If the positions of the strings are always after the same number of forward slashes, you can print the 2nd and the 6th field, setting the field separator to /

echo "$i" | awk -F"/" '{print $2, $6}'

Output

M1 FCT
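
With many paths, a single awk process can handle them all, one per input line. A sketch under assumptions: the second path is made up for illustration, and the NF >= 6 guard (my addition) skips lines that are too short to have a 6th field.

```shell
out=$(
  printf '%s\n' \
    ./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt \
    ./M2/CustomersList/HTP/Denver/FXR/output_GetCaseList_abs.txt |
  awk -F/ 'NF >= 6 {print $2, $6}'   # one output line per input path
)
printf '%s\n' "$out"
```

In practice the printf would be replaced by whatever produces the list of paths, for example a file read by awk directly.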

You might also use GNU awk, whose match() accepts a third array argument, with a pattern containing 2 capture groups: the first captures the path component that is followed by /Cus, and the second captures a component that starts with F.

The negated character class [^\/]* matches 0 or more characters except a /

echo "$i" | awk 'match($0, /[^\/]*\/([^\/]*)\/Cus.*\/(F[^\/]*)/, a) {print a[1], a[2]}'

CodePudding user response:

You have your awk answers, but I felt like contributing a bash idea just for fun.

[[ "$i" =~ ^\./([[:alnum:]]+)(/[[:alnum:]]+){3}/([[:alnum:]]+)/.* ]] \
    && echo "${BASH_REMATCH[1]} ${BASH_REMATCH[3]}"

The BASH_REMATCH array holds the text matched by each capture group in the test. Index 0 is the entire matched string.


A slightly shorter version yielding the same output:

[[ "$i" =~ ^\./([[:alnum:]]+)(/[[:alnum:]]+){4}/.* ]] \
    && echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]:1}"
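
For use in a script, the same idea may be easier to read with the pattern kept in a variable and the no-match case handled explicitly. A sketch only: the names re, first and second are mine, and [[:alnum:]] assumes the path components contain no underscores, hyphens or dots.

```shell
i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

# Group 1: first component; group 2: repeated 3 times (keeps the last one);
# group 3: the fifth component.
re='^\./([[:alnum:]]+)(/[[:alnum:]]+){3}/([[:alnum:]]+)/'

if [[ $i =~ $re ]]; then
  first=${BASH_REMATCH[1]}
  second=${BASH_REMATCH[3]}
  echo "$first $second"
else
  echo "no match: $i" >&2
fi
```

Leaving $re unquoted on the right of =~ matters: quoting it would make bash treat the pattern as a literal string.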