I search for log files containing errors using egrep, and it outputs a list of file names. What I want to do is manipulate those strings and present them in a different way. The file names look like this:
/abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0
/abcd/efgh/ijkl/logs/fac_oxf_abp3506.log.20220708111219.26476752.0
The output should look like:
ABP99507,UNET
ABP3506,OXF
I tried awk and sed and couldn't figure out a way to do this. I want to make it dynamic and do it with regular expressions.
What I have tried so far is:
egrep -li "^error" /abcd/efgh/ijkl/logs/*202207* | awk '/unet|cirrus|oxf|csp|cmcd|cmcr|nice/ {print}'
egrep -li "^error" /abcd/efgh/ijkl/logs/*202207* | sed -n "s/.*\(cirrus|unet|cmcr|csp|cmcd|oxf|nice\)\(abp[0-9]*[A-ZA-Za-za-z]*\).*/\1,\2/p"
Sed doesn't work as the "|" operator doesn't work because I am not using GNU Awk. Even escaping it doesn't work. Also I can't seem to make use of capture groups.
CodePudding user response:
1st solution: The simplest option is to use awk's field separator option. With your shown samples, please try the following awk code.
awk -F'/|\\.|_' '{print toupper($8","$7)}' Input_file
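To see why $7 and $8 are the right fields, here is a small sketch that prints every field awk produces with that separator for one of the sample paths from the question. The helper name show_fields is made up for illustration.

```shell
# Print each field that awk produces when splitting on "/", "." or "_".
# The leading "/" yields an empty first field, so "abcd" is field 2.
show_fields() {
  awk -F'/|\\.|_' '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }'
}

printf '%s\n' '/abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0' | show_fields
```

Note that the field numbers depend on the directory depth, so this approach only holds while the logs stay at the same path depth as in the samples.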
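2nd solution: If you want to use a regular expression in awk, try the following. Written and tested in GNU awk (the third argument to match, which stores the capture groups in an array, is a GNU awk extension).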
awk 'match($0,/logs\/[^_]*_([^_]*)_([^.]*)\.log/,arr){print toupper(arr[2]","arr[1])}' Input_file
3rd solution: With GNU sed, enabling ERE with the -E option, please try the following code.
sed -E 's/.*logs\/[^_]*_([^_]*)_([^.]*)\.log\..*/\U\2,\U\1/' Input_file
4th solution: Adding a non-GNU awk solution using the match function.
awk '
match($0,/logs\/[^_]*_([^_]*)_([^.]*)\.log/){
  # "logs/" is 5 characters long, so skip it at the start of the match
  val=substr($0,RSTART+5,RLENGTH-5)
  sub(/\.log/,"",val)
  split(val,arr,"_")
  print toupper(arr[3]","arr[2])
}
' Input_file
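As a rough illustration of what match sets in that solution, the sketch below prints RSTART, RLENGTH and the extracted substring for one sample path. The function name show_match is hypothetical and only used here.

```shell
# For each matching line, show where the match starts, how long it is,
# and the part that remains after skipping the 5 characters of "logs/".
show_match() {
  awk 'match($0, /logs\/[^_]*_([^_]*)_([^.]*)\.log/) {
    print RSTART, RLENGTH, substr($0, RSTART + 5, RLENGTH - 5)
  }'
}

printf '%s\n' '/abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0' | show_match
# prints: 17 26 fac_unet_abp99507.log
```

The remaining string is what sub and split then work on, which is why the pieces end up in arr[2] and arr[3].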
CodePudding user response:
"Also I can't seem to make use of capture groups."
You did not escape |, so each one means a literal |. You need to escape it to mean alternation, as is the case with ( and ) (literal vs. group delimiter). After doing that and repairing a few minor issues I got it working. Let the content of file.txt be
/abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0
/abcd/efgh/ijkl/logs/fac_oxf_abp3506.log.20220708111219.26476752.0
then
sed -e 's/.*\(cirrus\|unet\|cmcr\|csp\|cmcd\|oxf\|nice\)_\(abp[0-9]*[A-ZA-Za-za-z]*\).*/\2,\1/' -e 's/[a-z]/\U&/g' file.txt
gives output
ABP99507,UNET
ABP3506,OXF
Explanation: I introduced the following changes: escaped |, added _ between the groups, changed the order of the replacement (the 2nd group comes first), and dropped /p as it caused doubled output. After doing this I added a second action: uppercasing, using the standard GNU sed way of doing so. As there are now 2 actions, I use -e to register them.
(tested in GNU sed 4.2.2)
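Since the question mentions not having GNU tools, here is a hedged sketch of a variant that avoids both \| and \U. It assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed do) and hands the uppercasing to tr. The function name to_csv is made up for this example, and the character class is simplified to [A-Za-z].

```shell
# With -E, "|" and "( )" work unescaped. -n plus the p flag prints only
# lines where the substitution happened; tr then uppercases the result.
to_csv() {
  sed -E -n 's/.*(cirrus|unet|cmcr|csp|cmcd|oxf|nice)_(abp[0-9]*[A-Za-z]*).*/\2,\1/p' |
    tr '[:lower:]' '[:upper:]'
}

printf '%s\n' \
  '/abcd/efgh/ijkl/logs/fac_unet_abp99507.log.20220708111219.26476752.0' \
  '/abcd/efgh/ijkl/logs/fac_oxf_abp3506.log.20220708111219.26476752.0' | to_csv
```

The function reads file names on standard input, so the egrep -li command from the question could be piped straight into it in place of the printf.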