Removing the first date and timestamp in each line of a log file using Python

Time:10-08

I have a series of log files in text file format. The document format is this:

[2021-12-11T10:21:30.370Z]  Branch indexing
[2021-12-11T10:21:30.374Z]  Starting the program with default pipeID
[2021-12-11T10:21:30.374Z]  Running with durable level: max_survivbility will make this program crash if left running for 20 minutes
[2021-12-11T10:21:30.374Z]  Starting the program with default pipeID

Each line in the document starts with a timestamp in square brackets, for example [2021-12-11T10:21:30.370Z].

I want to remove the first set of characters that represent date and timestamp and have a result something like this:

Branch indexing
Starting the program with default pipeID
Running with durable level: max_survivbility will make this program crash if left running for 20 minutes
Starting the program with default pipeID

Can anyone please explain how I can do this?

I tried to use this method but it doesn't work since I have '[]' in the date stamp.

import re
text = "[2021-12-11T10:21:30.370Z]  Branch indexing"
re.sub("[.*?]", "", text)

This doesn't work for me.

If I try the same method on text that uses angle brackets instead, such as "<2021-12-11T10:21:30.370Z>  Branch indexing", it works:

import re
text = "<2021-12-11T10:21:30.370Z>  Branch indexing"
re.sub("<.*?>", "", text)

It removes <2021-12-11T10:21:30.370Z>. Why does this not work with [2021-12-11T10:21:30.370Z]?

I need help removing every instance of this format "[2021-12-11T10:21:30.370Z]" in all the log files.

Thank you so much.

CodePudding user response:

I'd rather go with a simple solution for this case, pal. Split the string where the ] ends, then trim the second element of the resulting list, to remove all those extra spaces and then print it, bud. Hope this helps, cheers!

import re
text = "[2021-12-11T10:21:30.370Z]  Branch indexing"
# Split on the first "]" only, so a "]" later in the message is kept
print(re.split(r"\]", text, maxsplit=1)[1].strip())
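If some lines might not start with a timestamp at all, a small guard avoids an IndexError. This is only a sketch: strip_prefix is a hypothetical helper name, it uses str.partition instead of re.split, and the second and third sample lines are invented for illustration.

```python
def strip_prefix(line: str) -> str:
    """Drop a leading "[...]" prefix; return the line unchanged if there is none."""
    if not line.startswith("["):
        return line
    # partition splits on the first "]" only, so brackets later in the message survive
    _, sep, rest = line.partition("]")
    return rest.lstrip() if sep else line


lines = [
    "[2021-12-11T10:21:30.370Z]  Branch indexing",
    "[2021-12-11T10:21:30.374Z]  Retrying step [2] of the pipeline",  # invented sample
    "a line without a timestamp",  # invented sample
]
cleaned = [strip_prefix(line) for line in lines]
print("\n".join(cleaned))
```

The same function can be applied to each line read from a log file before writing the result back out.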

CodePudding user response:

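Your current regex pattern is off because square brackets are regex metacharacters: unescaped, [.*?] is a character class that matches a single ".", "*" or "?", so the brackets need to be escaped as \[ and \]. Also, you should anchor the pattern with ^ and run the regex in multiline mode, so that only the bracketed text at the start of each line is removed. And the timestamp pattern should be more generic.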

text = re.sub(r'^\[.*?\]\s+', '', text, flags=re.M)
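To see why the original attempt fails, a side-by-side comparison may help. The sketch below runs the unescaped and escaped patterns on the sample line, then applies an anchored pattern to a multi-line string. Note that clean_text and PREFIX are hypothetical names, and the pattern uses [ \t]* rather than \s+ on the assumption that a line break after an empty message should be preserved.

```python
import re

text = "[2021-12-11T10:21:30.370Z]  Branch indexing"

# Unescaped brackets form a character class: any one of ".", "*" or "?".
unescaped = re.sub("[.*?]", "", text)
# Escaped brackets are literal, so the whole "[...]" prefix matches.
escaped = re.sub(r"\[.*?\]", "", text)

print(repr(unescaped))  # only the "." inside the timestamp is gone
print(repr(escaped))    # the prefix is gone, the two spaces remain

# Anchored at each line start (re.M); [ \t]* trims spaces and tabs but not newlines.
PREFIX = re.compile(r"^\[[^\]]*\][ \t]*", flags=re.M)


def clean_text(log_text: str) -> str:
    """Remove a leading "[...]" prefix from every line of log_text."""
    return PREFIX.sub("", log_text)


sample = (
    "[2021-12-11T10:21:30.370Z]  Branch indexing\n"
    "[2021-12-11T10:21:30.374Z]  Starting the program with default pipeID\n"
)
cleaned = clean_text(sample)
print(cleaned)
```

For the actual log files, reading each file's contents into a string, passing it through clean_text, and writing the result back should work; the file names and encoding depend on your setup.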