I have been fighting with this problem for several days. I have output from a program that I'm trying to parse.
The output is a log stream in which some records carry two dates.
An example:
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] request.INFO: Matched route "home". {"route_parameters": "more data"
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] request.INFO: Matched route "home". {"route_parameters":{"_controller":"bla/bla/controller"},"request_uri":"http://local.myapp.com/"} []
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] security.INFO: Populated the TokenStorage with an anonymous Token. [] []
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] security.INFO: Populated the TokenStorage with an anonymous Token. [] []
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] data.DEBUG: SELECT s0_.id AS id0, s0_.name AS name1, s0_.value AS value2, FROM table s0_ WHERE s0_.active = ? [true] []
[2023-01-27 17:21:42] data.INFO: Some logs only include 1 date with a different format
A regular regex like: /\[\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}\] /g
would match the first date (brackets included), but I found it really complicated to translate that into an expression that sed can understand.
I tried multiple solutions I found across SO and other sites.
The input should be something like this:
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] a
[2023-01-27 17:21:42] b
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] c
And the expected output:
[2023-01-27 17:21:42] a
[2023-01-27 17:21:42] b
[2023-01-27 17:21:42] c
I tried multiple expressions like:
echo "[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] something" | sed -e "s/\[[0-9]{2}-[A-Z][a-z]{3}-[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\] //"
or this:
sed -e 's/\[[^][]*\] $/\1/'
This deletes the first bracketed date, but I want to keep the date when the record only includes one.
I think I'm close, but I'm not sure what I'm missing.
Answer
It was really hard to decide which answer to accept. Both were really helpful, but I went with the first one I received because it was fast and elegant. I wish I could also accept the second one, since both are valid: it took my approach and made a simple change, so I really was close.
Thank you all. :)
CodePudding user response:
You may use this sed:
echo "[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] something" |
sed -E 's/^\[[^]]+] (\[[^]]+])/\1/'
[2023-01-27 17:21:42] something
Breakdown:
^: Start
\[[^]]+] : Match first [...] text and the space after it
(\[[^]]+]): Match second [...] text and capture in group #1
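To check that records with a single date are left alone, the same substitution can be run against the three-line sample from the question. This is only a minimal sketch, assuming a sed that supports -E (GNU and BSD sed both do); the function name strip_first_date is made up for illustration.

```shell
#!/bin/sh
# strip_first_date is a hypothetical helper name, not part of the answer.
# It drops a leading "[...] " only when a second "[...]" follows directly.
strip_first_date() {
  sed -E 's/^\[[^]]+] (\[[^]]+])/\1/'
}

# Sample input from the question: two records with two dates, one with one.
printf '%s\n' \
  '[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] a' \
  '[2023-01-27 17:21:42] b' \
  '[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] c' |
  strip_first_date
```

The pattern keys on brackets only, so a single-date record whose message itself starts with a bracketed token would also lose its date. Matching the exact date format instead avoids that.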
CodePudding user response:
You were not far off:
> echo -e "[27-Jan-2023 17:21:42] [2024-01-27 17:21:42] something\n[2025-01-27 17:21:42] something else" | sed -E "s/\[[0-9]{2}-[A-Z][a-z]{2}-[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\] //"
[2024-01-27 17:21:42] something
[2025-01-27 17:21:42] something else
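Replace {3} with {2} for the month abbreviation: [A-Z][a-z]{2} matches one uppercase letter followed by two lowercase letters, such as "Jan", whereas [A-Z][a-z]{3} expects four letters in total. Also note the -E option instead of -e. Without it, sed uses basic regular expressions and treats {2} as literal characters.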
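If the sed at hand does not support -E, the same interval counts can be written in basic regular expression syntax by escaping the braces. A minimal sketch under that assumption; the function name strip_php_date is hypothetical, and the closing bracket is left unescaped because it is an ordinary character outside a bracket expression.

```shell
#!/bin/sh
# strip_php_date is a hypothetical helper name used for illustration.
# Basic regular expressions: interval counts are written \{n\} instead of {n}.
strip_php_date() {
  sed 's/\[[0-9]\{2\}-[A-Z][a-z]\{2\}-[0-9]\{4\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}] //'
}

# Same two records as in the answer above.
printf '%s\n' \
  '[27-Jan-2023 17:21:42] [2024-01-27 17:21:42] something' \
  '[2025-01-27 17:21:42] something else' |
  strip_php_date
```

Because this pattern spells out the dd-Mon-yyyy format, it leaves any record that starts with a yyyy-mm-dd date untouched.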