I have strings like these:
/my/directory/file1_AAA_123_k.txt
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
So basically, the number of underscores is not fixed. I would like to extract the string between the first underscore and the dot. So the output should be something like this:
AAA_123_k
CCC
KK_45
I found this solution that works:
string='/my/directory/file1_AAA_123_k.txt'
tmp="${string%.*}"
echo "$tmp" | sed 's/^[^_:]*[_:]//'
But I am wondering if there is a more 'elegant' solution (e.g. a one-liner).
CodePudding user response:
With bash version >= 3.0 and a regex:
[[ "$string" =~ _(.+)\. ]] && echo "${BASH_REMATCH[1]}"
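A minimal sketch of the same bash regex idea, assuming you want the text up to the first dot of the file name and want underscores in directory names to be ignored. The function name extract_between is hypothetical, and the regex is stored in a variable to avoid quoting surprises inside [[ ]].

```shell
# Requires bash (uses [[ =~ ]] and BASH_REMATCH).
# extract_between: hypothetical helper name, for illustration only.
extract_between() {
  local name="${1##*/}"      # drop the directory part
  local re='_([^.]+)\.'      # first "_", then non-dot chars, then a dot
  if [[ "$name" =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1                 # no underscore-to-dot segment found
  fi
}

extract_between '/my/directory/file1_AAA_123_k.txt'
extract_between '/my/directory/file2_CCC.txt'
extract_between '/my/directory/file2_KK_45.txt'
```

Because [^.]+ cannot cross a dot, a name such as file2_KK_45.txt.2 would yield KK_45 rather than KK_45.txt.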
CodePudding user response:
You can use a single sed command like one of these (POSIX BRE and POSIX ERE syntax, respectively):
sed -n 's~^.*/[^_/]*_\([^/]*\)\.[^./]*$~\1~p' <<< "$string"
sed -nE 's~^.*/[^_/]*_([^/]*)\.[^./]*$~\1~p' <<< "$string"
See the online demo. Details:
^ - start of string
.* - any text
/ - a / char
[^_/]* - zero or more chars other than / and _
_ - a _ char
\([^/]*\) (POSIX BRE) / ([^/]*) (POSIX ERE, enabled with the E option) - Group 1: any zero or more chars other than /
\. - a dot
[^./]* - zero or more chars other than . and /
$ - end of string.
With -n, default line output is suppressed, and p prints only the result of a successful substitution.
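As a rough illustration of how this pattern behaves, here is a sketch that feeds paths to the ERE variant on stdin. The wrapper name strip_last_ext is hypothetical. Note that this pattern removes only the last extension, so a name ending in .txt.2 keeps its .txt part.

```shell
# strip_last_ext: hypothetical wrapper, reads one path per line on stdin.
strip_last_ext() {
  # Group 1 is everything between the first "_" of the file name
  # and the last dot; lines that do not match print nothing (-n ... p).
  sed -nE 's~^.*/[^_/]*_([^/]*)\.[^./]*$~\1~p'
}

printf '%s\n' \
  '/my/directory/file1_AAA_123_k.txt' \
  '/my/directory/file2_CCC.txt' \
  '/my/directory/file2_KK_45.txt' | strip_last_ext
```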
CodePudding user response:
If you need to process the file names one at a time (e.g., within a while read loop), you can perform two parameter expansions, e.g.:
$ string='/my/directory/file1_AAA_123_k.txt.2'
$ tmp="${string#*_}"
$ tmp="${tmp%%.*}"
$ echo "${tmp}"
AAA_123_k
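The expansions can also be wrapped in a small function and driven by a while read loop. This is only a sketch: the name strip_name is hypothetical, and the extra ${1##*/} step is an assumption on my part, added so that an underscore in a directory name does not change the result.

```shell
# strip_name: hypothetical helper name, used only for illustration.
strip_name() {
  tmp="${1##*/}"     # remove everything up to the last slash
  tmp="${tmp#*_}"    # remove up to and including the first underscore
  tmp="${tmp%%.*}"   # remove from the first dot to the end
  printf '%s\n' "$tmp"
}

# Process paths one at a time, as in a while read loop.
while IFS= read -r path; do
  strip_name "$path"
done <<'EOF'
/my/directory/file1_AAA_123_k.txt.2
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
EOF
```

If a name contains no underscore, ${tmp#*_} leaves it unchanged, so only the extension is stripped in that case.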
One idea to parse a list of file names at the same time:
$ cat file.list
/my/directory/file1_AAA_123_k.txt.2
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
$ sed -En 's/[^_]*_([^.]+).*/\1/p' file.list
AAA_123_k
CCC
KK_45
CodePudding user response:
Using sed
$ sed 's/[^_]*_//;s/\..*//' input_file
AAA_123_k
CCC
KK_45
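One caveat with matching from the start of the line is that an underscore in a directory name would be taken as the first underscore. A possible variation, assuming that is a concern, strips the directory part first. The wrapper name names_from_paths is hypothetical.

```shell
# names_from_paths: hypothetical wrapper, reads one path per line on stdin.
names_from_paths() {
  # 1) drop everything up to the last slash
  # 2) drop everything up to and including the first underscore
  # 3) drop everything from the first dot onward
  sed -e 's|.*/||' -e 's/[^_]*_//' -e 's/\..*//'
}

printf '%s\n' \
  '/my/directory/file1_AAA_123_k.txt' \
  '/my/directory/file2_CCC.txt' \
  '/my/directory/file2_KK_45.txt' | names_from_paths
```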
CodePudding user response:
With your shown samples and GNU grep, you could try the following code.
grep -oP '.*?_\K([^.]*)' Input_file
Explanation: GNU grep's -o option prints only the matched text, and -P enables PCRE regex. The regex .*?_\K([^.]*) gets the value between the first _ and the first occurrence of a dot.
Explanation of regex:
.*?_ ##Match lazily (.*?) from the start of the line up to the first occurrence of _.
\K ##Discard everything matched so far, so that only the needed value is printed.
([^.]*) ##Match everything up to the first occurrence of a dot.
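The -P option is specific to GNU grep. Where it is not available, a similar result can be sketched with cut instead. This is a swapped-in technique, not part of the answer above, and the function name cut_between is hypothetical.

```shell
# cut_between: hypothetical helper; uses cut rather than grep -P.
cut_between() {
  # Take the file name, keep fields after the first "_",
  # then keep only the field before the first ".".
  printf '%s\n' "${1##*/}" | cut -d_ -f2- | cut -d. -f1
}

cut_between '/my/directory/file1_AAA_123_k.txt'
cut_between '/my/directory/file2_CCC.txt'
cut_between '/my/directory/file2_KK_45.txt'
```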
CodePudding user response:
This is easy, except that it includes the initial underscore:
ls | grep -o "_[^.]*"