Transform a log file to csv using pandas-CodePudding

I am trying to transform a log file that looks like this

      Name: AGV
   Version: 1.0.00
  Revision: 0000000000
Build date: 2000-00-00 00:00:00

Continuation of previous file

[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3410
[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4206
[1639992888.517] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3433
[1639992888.517] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4229
[1639992888.527] [B62FF420] [INFO    Position.cpp:438] <AGVPOS> 602, 7787.496,

To a csv file.

I removed the first few lines, which I don't need, added the column names manually, and then ran this:

df = pd.read_fwf('data.log')
df.to_csv('data.csv', index=None)

This worked for the first log file, but not for the other files: each of them ends up with some additional columns.

The output I want to get is something like this:

Timestamp.       Code.      Message  
[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3410
[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4206
[1639992888.517] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3433
[1639992888.517] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4229
[1639992888.527] [B62FF420] [INFO    Position.cpp:438] <AGVPOS> 602, 7787.496,

My method is definitely not the most efficient. Is there some other way I can do this?

Thank you.

CodePudding user response：

Based on your comment, this is the approach I would take. You will still have to clean the data afterwards, but it should work:

import pandas as pd

# Split each line on "] [" (a closing bracket, one or more whitespace characters, an opening bracket).
df = pd.read_csv('test_fwf.log', skiprows=7, sep=r'\]\s+\[', engine='python', names=['timestamp', 'code', 'message'])

Explanation

read_csv can read a .log file because it is still a plain-text file. The sep parameter (also available as delimiter) accepts a regular expression when engine='python' is used. The pattern I chose splits each line on the '] [' sequence that appears in every line, so the result should always have 3 columns. The names parameter gives the names of the columns you'd like to obtain.

The skiprows parameter lets you skip the first n rows of your input file.

Note that this regex is meant to tolerate multiple spaces between the brackets. If you are certain the separator is a tab character, update the regex accordingly.
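To see what the regex separator does on its own, here is a minimal sketch that parses two of the lines from the question out of an in-memory string. The header is left out here, so no skiprows is needed, and the pattern is written as a raw string with \s+ on the assumption that at least one whitespace character sits between the brackets.

```python
import io
import pandas as pd

# Two log lines copied from the question (header lines omitted).
sample = (
    "[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3410\n"
    "[1639992888.527] [B62FF420] [INFO    Position.cpp:438] <AGVPOS> 602, 7787.496,\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\]\s+\[",      # "]" + whitespace + "[" between the three fields
    engine="python",     # regex separators need the python engine
    names=["timestamp", "code", "message"],
)

print(df.shape)
print(df.loc[0, "timestamp"])  # still carries the leading "["
```

The first column keeps its opening bracket and the message keeps the "LEVEL file:line]" prefix, which is the cleaning step mentioned above.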
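If the header is not the same length in every file, the number of rows to skip could be computed instead of hard-coded. The sketch below assumes that every real log line starts with "[" and that no header line does; count_header_lines is a hypothetical helper, not part of pandas.

```python
import io
import pandas as pd

def count_header_lines(text: str) -> int:
    """Return the index of the first line starting with '[' (assumed to be the first log line)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("["):
            return i
    return len(lines)  # no log line found

# Header and two log lines as shown in the question.
log_text = "\n".join([
    "      Name: AGV",
    "   Version: 1.0.00",
    "  Revision: 0000000000",
    "Build date: 2000-00-00 00:00:00",
    "",
    "Continuation of previous file",
    "",
    "[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3410",
    "[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4206",
]) + "\n"

n_skip = count_header_lines(log_text)

df = pd.read_csv(
    io.StringIO(log_text),
    skiprows=n_skip,
    sep=r"\]\s+\[",
    engine="python",
    names=["timestamp", "code", "message"],
)
```

For a file on disk, the same helper could be fed the result of reading the file as text before calling read_csv on the path.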
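The cleaning step afterwards could look roughly like the sketch below. It assumes the frame has the three columns named in the answer, that the timestamp still carries its leading "[", and that each message begins with "LEVEL file:line]". The function clean_log_frame and the level, source, and text column names are illustrative choices, not something the question asked for.

```python
import pandas as pd

def clean_log_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Strip bracket residue and split the message into level, source, and text."""
    out = df.copy()
    # Remove the leading "[" left on the first column and convert to a number.
    out["timestamp"] = out["timestamp"].str.lstrip("[").astype(float)
    # "DEBUG   Wings.cpp:222] Current ..." -> level, source, text
    parts = out["message"].str.extract(
        r"^(?P<level>\S+)\s+(?P<source>[^\]]+)\]\s*(?P<text>.*)$"
    )
    return pd.concat([out[["timestamp", "code"]], parts], axis=1)

# A frame shaped like the one read_csv returns for the lines in the question.
raw = pd.DataFrame({
    "timestamp": ["[1639992888.497", "[1639992888.527"],
    "code": ["B62FF420", "B62FF420"],
    "message": [
        "DEBUG   Wings.cpp:222] Current sidewing pressure: 3410",
        "INFO    Position.cpp:438] <AGVPOS> 602, 7787.496,",
    ],
})

cleaned = clean_log_frame(raw)
csv_text = cleaned.to_csv(index=False)  # pass a path instead to write a file
```

Because to_csv quotes any field that contains a comma, the AGVPOS message stays in a single column in the output.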