Unable to separate data with ',' separator while importing text file-CodePudding

My data is having a repetitive pattern:


2021-11-17 10:59:10.880
SysState: 4, Events: 161403, EMS: 4, VDB: 2, TubeState: 0x02
BDR Mode: 1, BMS Ext: 2, BMS Int: 0, BdrStat: 00
CPU(%):16, CPUmax(%):47, task idx:3, CPUmaxIRQ(%):0
SOC:9973, SOH:100, LV?:0, HV?:0
3330mV 3333mV 3332mV 3332mV 3331mV 0 0
3331mV 3324mV 3325mV 3325mV 3328mV 0 0
3325mV 3321mV 3328mV 3328mV 3327mV 0 0
3329mV 0mV 0mV 0mV 0mV 0 0
BPV:53288, PLV:53241, BPC:0, PLC:0
AMBI:421, CONN:278, FETS:282, BMSC:274, BPA1:259, BPA2:237, BPA3:255
2021-11-17 10:59:13.80
SysState: 4, Events: 161407, EMS: 4, VDB: 3, TubeState: 0x08
BDR Mode: 4, BMS Ext: 3, BMS Int: 1, BdrStat: 00
CPU(%):12, CPUmax(%):47, task idx:3, CPUmaxIRQ(%):0
SOC:9973, SOH:100, LV?:0, HV?:0
3332mV 3331mV 3332mV 3332mV 3331mV 0 0
3331mV 3324mV 3325mV 3326mV 3328mV 0 0
3324mV 3321mV 3328mV 3328mV 3327mV 0 0
3329mV 0mV 0mV 0mV 0mV 0 0
BPV:53288, PLV:53277, BPC:23, PLC:0
AMBI:421, CONN:278, FETS:282, BMSC:276, BPA1:259, BPA2:237, BPA3:255
2021-11-17 10:59:15.280
SysState: 4, Events: 161407, EMS: 4, VDB: 3, TubeState: 0x08
BDR Mode: 4, BMS Ext: 3, BMS Int: 1, BdrStat: 00
CPU(%):11, CPUmax(%):47, task idx:3, CPUmaxIRQ(%):0
SOC:9973, SOH:100, LV?:0, HV?:0
3331mV 3332mV 3331mV 3332mV 3331mV 0 0
3331mV 3324mV 3325mV 3325mV 3328mV 0 0
3324mV 3322mV 3328mV 3328mV 3327mV 0 0
3331mV 0mV 0mV 0mV 0mV 0 0
BPV:53288, PLV:53259, BPC:47, PLC:47
AMBI:421, CONN:278, FETS:282, BMSC:276, BPA1:259, BPA2:237, BPA3:255

What I want to do is separate every value and make it a column starting from '2021-11-17 10:59:10.880' to 'BPA3:255'

Index	Another header	Another header	Another header
0	2021-11-17 10:59:10.880	SysState: 4	Events: 161403
1	2021-11-17 10:59:13.80	SysState: 4	Events: 1161407

so on and so forth..

what is have done so far:


The file was a .txt file and I converted it into csv first and then:

df = pd.read_csv('data.csv', sep=',' )

but it gives me ParserError: Error tokenizing data. Anybody knows how to solve this problem? with the sep= ';' or changing the text file to csv gives me the following output:

Is there a way to resolve this while parsing text file and not converting it into csv?

CodePudding user response：

Assuming you want only the date, the SysState and Events, an easy way is to extract the info using a regex.

I also assumed the file is not huge so I am loading all in memory, if this is not the case then you'll have to parse line by line.

with open('filename.csv') as f:
    lines = f.read()

import re
regex = re.compile('(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d )\nSysState: (\d ),\s Events: (\d ). ?')

df = pd.DataFrame(regex.findall(lines), columns=['datetime', 'SysState', 'Events'])

NB. I extracted only the numbers from the fields, but if you really want to have SysState: 4, etc. it is easy to add it in the capturing group

output:

                  datetime SysState  Events
0  2021-11-17 10:59:10.880        4  161403
1   2021-11-17 10:59:13.80        4  161407
2  2021-11-17 10:59:15.280        4  161407