I am trying to write a regex that captures the alphanumeric values shown below, but it also captures the purely numeric values. What is the correct way to get the desired output?
code
grep -Eo '(\[[[:alnum:]])\w+' file > output
$ cat file
2022-04-29 08:45:11,754 [14] [Y23467] [546] This is a single line
2022-04-29 08:45:11,764 [15] [fpes] [547] This is a single line
2022-04-29 08:46:12,454 [143] [mwalkc] [548] This is a single line
2022-04-29 08:49:12,554 [143] [skhat2] [549] This is a single line
2022-04-29 09:40:13,852 [5] [narl12] [550] This is a single line
2022-04-29 09:45:14,754 [1426] [Y23467] [550] This is a single line
current output -
[14
[Y23467
[546
[15
[fpes
[547
[143
[mwalkc
[548
[143
[skhat2
[549
[5
[narl12
[550
[1426
[Y23467
[550
expected output -
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
CodePudding user response:
1st solution: With your shown samples, please try the following awk code. Simple explanation: use the gsub function to remove [ and ] from the 4th field, then print the 4th field.
awk '{gsub(/\[|\]/,"",$4);print $4}' Input_file
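If the identifier is not guaranteed to be the 4th field, a pattern-based variant may be more robust. The following is only a sketch: it recreates the sample from the question in a temporary file, and the variable names sample and awk_ids are made up for illustration. It assumes an identifier is a bracketed token of ASCII letters and digits containing at least one letter.

```shell
# Recreate the sample log from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2022-04-29 08:45:11,754 [14] [Y23467] [546] This is a single line
2022-04-29 08:45:11,764 [15] [fpes] [547] This is a single line
2022-04-29 08:46:12,454 [143] [mwalkc] [548] This is a single line
2022-04-29 08:49:12,554 [143] [skhat2] [549] This is a single line
2022-04-29 09:40:13,852 [5] [narl12] [550] This is a single line
2022-04-29 09:45:14,754 [1426] [Y23467] [550] This is a single line
EOF

# Scan every whitespace-separated field and print the first bracketed
# token that contains at least one letter, with the brackets removed.
awk_ids=$(awk '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /^\[[0-9]*[A-Za-z][0-9A-Za-z]*\]$/) {
      print substr($i, 2, length($i) - 2)
      break
    }
}' "$sample")

printf '%s\n' "$awk_ids"
rm -f "$sample"
```

With the sample above this prints the six expected values, one per line.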
2nd solution: With GNU grep, please try the following solution.
grep -oP '^[0-9]{4}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2},[0-9]{1,3} \[[0-9]+\] \[\K[^]]*' Input_file
Explanation: A detailed explanation of the regex used with GNU grep above.
^[0-9]{4}(-[0-9]{2}){2}    ##From the start of the line, match 4 digits followed by two groups of a dash and 2 digits.
 [0-9]{2}(:[0-9]{2}){2}    ##Match a space, then 2 digits followed by two groups of a colon and 2 digits.
,[0-9]{1,3}                ##Match a comma followed by 1 to 3 digits.
 \[[0-9]+\] \[\K           ##Match a space, then [, 1 or more digits and ], then a space and [,
                           ##then use \K to forget everything matched so far.
[^]]*                      ##Match everything up to the first ], which is the actual value.
CodePudding user response:
Using [[:alnum:]] or \w means the pattern matches any alphanumeric or word characters, so a value made up only of digits matches as well.
If the value can contain digits but must contain at least one letter a-z, and -P for a Perl-compatible regex is supported:
grep -oP '\[\K\d*[A-Za-z][\dA-Za-z]*(?=])' file
Explanation
\[ Match [
\K Forget what is matched so far
\d*[A-Za-z] Match optional digits and at least a single char a-zA-Z
[\dA-Za-z]* Match optional chars a-zA-Z and digits
(?=]) Assert ] to the right
Output
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
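Where -P is not available, roughly the same result can be had in two steps. This is a sketch under assumptions rather than a drop-in replacement: it uses grep -E with -o (supported by GNU and BSD grep, though not required by POSIX), a shortened three-line copy of the sample, and the illustrative variable names sample and ere_ids.

```shell
# Shortened copy of the sample from the question.
sample=$(mktemp)
printf '%s\n' \
  '2022-04-29 08:45:11,754 [14] [Y23467] [546] This is a single line' \
  '2022-04-29 08:45:11,764 [15] [fpes] [547] This is a single line' \
  '2022-04-29 09:40:13,852 [5] [narl12] [550] This is a single line' > "$sample"

# Step 1: keep only bracketed tokens that contain at least one letter.
# Step 2: delete the surrounding brackets with tr.
ere_ids=$(grep -Eo '\[[0-9]*[A-Za-z][0-9A-Za-z]*]' "$sample" | tr -d '[]')

printf '%s\n' "$ere_ids"
rm -f "$sample"
```

Because ERE has no \K or lookahead, the brackets are part of the match and have to be stripped afterwards.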
If there can be only 1 occurrence, you might also use sed with a capture group \(...\) and use the group in the replacement using \1
sed 's/.*\[\([[:digit:]]*[[:alpha:]][[:alnum:]]*\)].*/\1/' file
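One thing to be aware of: sed prints lines that do not match unchanged. If the file may contain such lines, a possible variant (a sketch only; the second input line and the variable name printed are invented for illustration) is to suppress default output with -n and print only on a successful substitution with the p flag.

```shell
# The second line has no bracketed identifier, so it must not be printed.
printed=$(printf '%s\n' \
  '2022-04-29 08:45:11,754 [14] [Y23467] [546] This is a single line' \
  'a line with no bracketed identifier' |
  sed -n 's/.*\[\([[:digit:]]*[[:alpha:]][[:alnum:]]*\)].*/\1/p')

printf '%s\n' "$printed"
```

Here only Y23467 is printed; without -n and p the second line would pass through as-is.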
CodePudding user response:
There are several parts to your problem. First I'll try to help you with your regex (but it will probably unlock more problems); next I'll show you an alternative.
The Regex
The thing to understand about [[:alnum:]] is that it matches any single alphanumeric character. So it will match every character of "123", and every character of "abc", as all of those characters are alphanumeric. It judges each character individually and cannot, on its own, match "only sections that have both numbers and letters" like what you want.
However, by chaining two greps together, we could filter out the matches which only contain digits.
grep -Eo '(\[[[:alnum:]])\w+' file | grep -Ev '^\[[[:digit:]]+$' > output
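To refine this further, there look to be a couple of bugs in your regex. First, \[ is part of the match, which is why the [ appears in your results. Changing (\[ to \[( moves the [ outside the captured part in parentheses ( ... ), but note that grep -o prints the whole match and ignores capture groups, so the [ still has to be dropped some other way, for example with \K under -P or with a later step that strips it.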
Next, your combination of [[:alnum:]] with \w+ probably doesn't do what you expect. It looks for a single alphanumeric character, followed by one or more "word" characters (which is all the alphanumerics, plus the underscore). You probably want ([[:alnum:]]+) instead of ([[:alnum:]])\w+
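To make the per-character behaviour concrete, here is a small sketch on a one-line input (the input string and the variable names any_alnum and with_letter are made up for illustration). The first command accepts any run of alphanumerics; the second requires at least one letter in the run.

```shell
# Both tokens are printed: [[:alnum:]]+ accepts a run of digits just as
# readily as a run that mixes letters and digits.
any_alnum=$(printf '[14] [Y23467]\n' | grep -Eo '[[:alnum:]]+')

# Requiring one letter somewhere in the run excludes the purely numeric token.
with_letter=$(printf '[14] [Y23467]\n' | grep -Eo '[[:digit:]]*[[:alpha:]][[:alnum:]]*')

printf 'any alnum run:\n%s\n' "$any_alnum"
printf 'run with a letter:\n%s\n' "$with_letter"
```

The first command prints 14 and Y23467, the second only Y23467.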
Alternative
Why not use cut instead? cut -d' ' -f4 will take the 4th field (with "space" as the delimiter between fields)
$ cut -d' ' -f 4 file
[Y23467]
[fpes]
[mwalkc]
[skhat2]
[narl12]
[Y23467]
If you also want to remove the square brackets, try
$ cut -d' ' -f 4 file | grep -Eo '\w+'
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
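As a possible alternative to the second grep, tr can delete the brackets directly. A minimal sketch on a single sample line, with the variable name stripped chosen only for illustration:

```shell
# Take the 4th space-separated field, then delete every [ and ] character.
stripped=$(printf '%s\n' \
  '2022-04-29 08:45:11,754 [14] [Y23467] [546] This is a single line' |
  cut -d' ' -f4 | tr -d '[]')

printf '%s\n' "$stripped"
```

This relies on the identifier always being the 4th space-separated field, exactly as the cut command above does.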
CodePudding user response:
Using sed
$ sed 's/\([^[]*\[\)\{2\}\([^]]*\).*/\2/' input_file
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
CodePudding user response:
Using FPAT with GNU awk:
awk -v FPAT='[[[:alnum:]]*]' '{gsub(/^\[|\]$/, "",$(NF-1));print $(NF-1)}' file
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
Setting FPAT as '[[[:alnum:]]*]', we match a [ char followed by zero or more alphanumeric chars followed by a ] char. With the gsub() function we remove the initial [ and final ] chars. We print the field previous to the last field, i.e. the $(NF-1) field, without the [ and ] characters.