Delete date from string date ([dd-mmm-yyyy hh:mm:ss]) using sed

Time:01-28

I've been fighting with this problem for several days. I have output from a program that I'm trying to parse.

The output is a log stream in which some records carry 2 dates.

An example:

[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] request.INFO: Matched route "home". {"route_parameters": "more data"
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] request.INFO: Matched route "home". {"route_parameters":{"_controller":"bla/bla/controller"},"request_uri":"http://local.myapp.com/"} []
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] security.INFO: Populated the TokenStorage with an anonymous Token. [] []
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] security.INFO: Populated the TokenStorage with an anonymous Token. [] []
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] data.DEBUG: SELECT s0_.id AS id0, s0_.name AS name1, s0_.value AS value2, FROM table s0_ WHERE s0_.active = ? [true] []
[2023-01-27 17:21:42] data.INFO: Some logs only include 1 date with a different format

A regular regex like /\[\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}\] /g would match the first date (brackets included), but I found it really complicated to translate into an expression that sed can understand.

I tried multiple solutions I found across SO and other sites.

The input should be something like this:

[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] a
[2023-01-27 17:21:42] b 
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] c

And the expected output:

[2023-01-27 17:21:42] a
[2023-01-27 17:21:42] b 
[2023-01-27 17:21:42] c

I tried multiple expressions like:

echo "[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] something" | sed -e "s/\[[0-9]{2}-[A-Z][a-z]{3}-[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\] //"

or this:

sed -e 's/\[[^][]*\] $/\1/'

This deletes the first bracketed date, but when a record includes only 1 date I want to keep it.

I think I'm close, but I'm not sure what I'm missing.

Answer

It was really hard to decide which answer to accept. Both were really helpful, but I went with the 1st one I received because it was fast and elegant. I wish I could accept the 2nd one too, since both are valid: it took my approach and made a simple change, so I was really close.

Thank you all. :)

CodePudding user response:

You may use this sed:

echo "[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] something" |
sed -E 's/^\[[^]]+] (\[[^]]+])/\1/'

[2023-01-27 17:21:42] something

Breakdown:

  • ^: Start of the line
  • \[[^]]+]: Match first [...] text
  • " ": Match a space
  • (\[[^]]+]): Match second [...] text and capture in group #1
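
As a quick check, here is a minimal sketch that runs the same substitution over the three-line sample from the question. The helper name strip_first_group is made up for illustration. Note that the pattern is not date-specific: it drops any leading [...] group that is immediately followed by a space and a second [...] group, so a single-date record whose message starts with a bracket would presumably lose its date too.

```shell
# Sample input from the question: two records carry both dates,
# one record carries only the second date.
input='[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] a
[2023-01-27 17:21:42] b
[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] c'

# Hypothetical helper: drop the first [...] group only when a second
# [...] group follows it; lines with a single group do not match.
strip_first_group() {
  sed -E 's/^\[[^]]+] (\[[^]]+])/\1/'
}

result=$(printf '%s\n' "$input" | strip_first_group)
printf '%s\n' "$result"
```

The line ending in "b" is left alone because nothing bracketed follows its only date, so the s command finds no match there.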

CodePudding user response:

You are not far off:

> echo -e "[27-Jan-2023 17:21:42] [2024-01-27 17:21:42] something\n[2025-01-27 17:21:42] something else" | sed -E "s/\[[0-9]{2}-[A-Z][a-z]{2}-[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\] //"
[2024-01-27 17:21:42] something
[2025-01-27 17:21:42] something else

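Replace the {3} with {2} for the month abbreviation: [A-Z][a-z]{3} requires four letters, but "Jan" has only three. Note also the -E option instead of -e, so that sed treats {2} as a repetition count rather than as literal braces.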
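
For comparison, a small sketch of the same date-specific substitution written both ways: with -E (extended syntax, bare braces) and without it (basic syntax, where the braces need backslashes). The leading ^ anchor and the variable names are additions for illustration and are not part of the command above.

```shell
line='[27-Jan-2023 17:21:42] [2023-01-27 17:21:42] something'

# Extended regex (-E): {n} is a repetition count when written bare.
ere=$(printf '%s\n' "$line" |
  sed -E 's/^\[[0-9]{2}-[A-Z][a-z]{2}-[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}] //')

# Basic regex (no -E): the same counts must be written as \{n\}.
bre=$(printf '%s\n' "$line" |
  sed 's/^\[[0-9]\{2\}-[A-Z][a-z]\{2\}-[0-9]\{4\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}] //')

# A record with only the second date format does not match and is kept as-is.
single=$(printf '%s\n' '[2023-01-27 17:21:42] something else' |
  sed -E 's/^\[[0-9]{2}-[A-Z][a-z]{2}-[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}] //')

printf '%s\n' "$ere" "$bre" "$single"
```

Because this pattern spells out the dd-Mmm-yyyy shape, it only removes a bracket group that looks like the first date, whatever follows it.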
