I am trying to transform a log file that looks like this
Name: AGV
Version: 1.0.00
Revision: 0000000000
Build date: 2000-00-00 00:00:00
Continuation of previous file
[1639992888.497] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 3410
[1639992888.497] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 4206
[1639992888.517] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 3433
[1639992888.517] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 4229
[1639992888.527] [B62FF420] [INFO Position.cpp:438] <AGVPOS> 602, 7787.496,
into a CSV file.
I removed the first few lines, which I don't need, added the column names manually, and then ran this:
df = pd.read_fwf('data.log')
df.to_csv('data.csv', index=None)
This worked for the first log file, but not for the others: each of them produces some additional columns.
The output I want looks something like this:
Timestamp. Code. Message
[1639992888.497] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 3410
[1639992888.497] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 4206
[1639992888.517] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 3433
[1639992888.517] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 4229
[1639992888.527] [B62FF420] [INFO Position.cpp:438] <AGVPOS> 602, 7787.496,
My method is definitely not the most efficient. Is there another way to do this?
Thank you.
CodePudding user response:
Based on your comment, this is the best approach. You will still have to clean the data afterwards, but it should work.
import pandas as pd
df = pd.read_csv('test_fwf.log', skiprows=7, sep=r'\]\s+\[', engine='python', names=['timestamp', 'code', 'message'])
Explanation
read_csv can read a .log file because it is still a plain text file. The sep parameter (delimiter is an alias for it) accepts a regular expression. The pattern I chose to split each line is the '] [' sequence between the bracketed fields, which appears twice in each line of your sample, so the result should have 3 columns. The names parameter sets the names of the columns you'd like to obtain.
The skiprows parameter lets you skip the first n rows of the input file, so it should match the number of header lines at the top of your file.
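Since the question mentions that other log files behave differently, the header length may vary from file to file. A minimal sketch of one way to work out skiprows instead of hard-coding it, assuming every log entry starts with '[' and no header line does (count_header_lines is a hypothetical helper, not part of pandas):

```python
def count_header_lines(lines):
    """Count the lines that come before the first entry starting with '['.

    `lines` is any iterable of strings, e.g. an open file object.
    Assumption: header lines never start with '['.
    """
    count = 0
    for line in lines:
        if line.startswith('['):
            break
        count += 1
    return count


# In-memory stand-in for the top of a log file, copied from the question.
sample_lines = [
    "Name: AGV",
    "Version: 1.0.00",
    "Revision: 0000000000",
    "Build date: 2000-00-00 00:00:00",
    "Continuation of previous file",
    "[1639992888.497] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 3410",
]

header_count = count_header_lines(sample_lines)
```

With a real file you could pass the open file object to the helper and then use the returned value as skiprows in read_csv.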
Note that this regex should also work when several spaces separate the fields. If you are certain the separator is a tab character, update the regex accordingly.
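To make the cleaning step concrete, here is a small end-to-end sketch on an in-memory copy of the sample from the question. It assumes exactly 5 header lines, as in that sample, and that stripping the leading '[' from the timestamp is the cleanup you want. The message column keeps the "DEBUG Wings.cpp:222]" prefix as is.

```python
import io

import pandas as pd

# In-memory stand-in for the log file, using lines from the question.
sample = """Name: AGV
Version: 1.0.00
Revision: 0000000000
Build date: 2000-00-00 00:00:00
Continuation of previous file
[1639992888.497] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 3410
[1639992888.517] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 4229
[1639992888.527] [B62FF420] [INFO Position.cpp:438] <AGVPOS> 602, 7787.496,
"""

df = pd.read_csv(
    io.StringIO(sample),
    skiprows=5,                 # assumption: 5 header lines, as in the sample
    sep=r'\]\s+\[',             # split on "] [" with any amount of whitespace
    engine='python',            # regex separators need the python engine
    names=['timestamp', 'code', 'message'],
)

# The first field still carries the opening bracket; drop it and convert.
df['timestamp'] = df['timestamp'].str.lstrip('[').astype(float)

# Render as CSV text; pass a path such as 'data.csv' to write a file instead.
csv_text = df.to_csv(index=False)
```

Because the message text can contain commas, to_csv quotes those fields in the output, so the CSV still has three columns when read back.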