I am trying to convert data from a text file into a csv to their corresponding column names.
This is an example of the text file
Run 1 Tbb= 20 C Volt=3.093 Tamb= 20.13 C
\1b2JTriTemp 1.0.5
AD Averaged 00f2(mV), 0001, 0001, 0001, 0001, 0001, 0001, 0000,
RAW Values 00f2(xx), FFFF, FFFF, FFFF, FFFF, FFFF, FFFF,
AD Averaged 0132(mV), 0001, 0001, 0004, 3061, 0002, 0001, 0000,
RAW Values 0132(xx), 0000, 0002, 0006, 0D0F, 0003, 0000,
When I run this code:
import pandas as pd
with open("sample data as comma.txt", "r") as f:
data = f.readlines()
with open("sample data as comma.txt", "w") as f:
for line in data:
if "RAW" not in line:
f.write(line)
df = pd.read_csv("sample data as comma.txt", delimiter=',')
df.columns = ['', 'TH ', 'Vacm', 'Vout', 'Bat mon', 'TH-', 'Vbat2', 'Vamb', '']
df.to_csv('Sample raw data CSV.csv')
I get an error "pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 9".
It is important to note that I want to be able to keep the line that defines which run it is. E.G "Run 1" along with its Tbb, Volt and Tamb. This can just be on one line before each data set.
Also note that the Run line should be its own seperate row, and not sorted into the columns.
Here is an example of how it should end up:
Any help/advice would be greatly appreciated, thanks!
CodePudding user response:
Just do a little manipulation here and add those "blank" values.
So this code looks to see what the max number of columns needed, and then appends those extra "blank" values in those rows.
import pandas as pd
with open("sample data as comma.txt", "r") as f:
data = f.readlines()
data = [x.strip().split(',') for x in data if "RAW" not in x]
max_len = max([len(i) for i in data])
for row in data:
if len(row) < max_len:
row = [''] * (max_len - len(row))
df = pd.DataFrame(data, columns = ['', 'TH ', 'Vacm', 'Vout', 'Bat mon', 'TH-', 'Vbat2', 'Vamb', ''] )
df.to_csv('Sample raw data CSV.csv', index=False)
Output: From your sample data
print(df.to_string())
TH Vacm Vout Bat mon TH- Vbat2 Vamb
0 Run 1 Tbb= 20 C Volt=3.093 Tamb= 20.13 C
1
2 \1b2JTriTemp 1.0.5
3
4
5 AD Averaged 00f2(mV) 0001 0001 0001 0001 0001 0001 0000
6 AD Averaged 0132(mV) 0001 0001 0004 3061 0002 0001 0000