Transform a log file to csv using pandas


I am trying to transform a log file that looks like this:

      Name: AGV
   Version: 1.0.00
  Revision: 0000000000
Build date: 2000-00-00 00:00:00

Continuation of previous file

[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3410
[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4206
[1639992888.517] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3433
[1639992888.517] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4229
[1639992888.527] [B62FF420] [INFO    Position.cpp:438] <AGVPOS> 602, 7787.496, 

into a CSV file.

I tried removing the first few lines, which I don't need, added the column names manually, and then did this:

import pandas as pd

# Read the manually cleaned log as fixed-width fields and write it out as CSV
df = pd.read_fwf('data.log')
df.to_csv('data.csv', index=None)

This worked for the first log file, but not for the others, as I get some additional columns for each of them.

The output I want to get is something like this:

Timestamp        Code       Message
[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3410
[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4206
[1639992888.517] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3433
[1639992888.517] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4229
[1639992888.527] [B62FF420] [INFO    Position.cpp:438] <AGVPOS> 602, 7787.496, 

My method is definitely not the most efficient. Is there some other way I can do this?

Thank you.

CodePudding user response:

According to your comment, this is the best approach (you will have to do some cleaning of the data afterwards, but it will work):

import pandas as pd

df = pd.read_csv('test_fwf.log', skiprows=7, sep=r'(?:\]\s+\[)',
                 engine='python', names=['timestamp', 'code', 'message'])

Explanation

read_csv can receive a .log file because it is still a plain text file, and the sep (delimiter) parameter can receive a regular expression. The pattern I selected to separate the fields is the '] [' sequence you have in each line, so the result should always have three columns. The names parameter sets the names of the columns you'd like to obtain.
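For example, here is a minimal sketch of the follow-up cleanup mentioned above (the column names and skiprows value mirror the snippet; the lstrip call is illustrative, since the split leaves a leading '[' on the first column):

import pandas as pd

# Split each line on the '] [' separators; three columns result
df = pd.read_csv('test_fwf.log', skiprows=7, sep=r'(?:\]\s+\[)',
                 engine='python', names=['timestamp', 'code', 'message'])

# The split leaves a leading '[' on the first field, so strip it off
df['timestamp'] = df['timestamp'].str.lstrip('[')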

The skiprows parameter allows you to skip the first n rows of your input file (here, the seven header lines at the top of the log).

Notice this regex should work with files that have multiple spaces between the separators; if you are certain the separator is a tab character instead, you must update the regex accordingly.
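Once the columns look right, writing the CSV is the same call you already had in the question:

df.to_csv('data.csv', index=None)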
