newbie here, I have extracted particular data from a .dat file and I wrote it on an excel file but the data is incomplete.
So this is the contents of the .dat file:
03:53:56.172 Total: 683305, OK: 641643, NG: 39245, Retest: 2417 (Model: sdsd, LOT: asas, Recipe: qwqw, Worker ID: VID, Rework: 0)
04:37:09.070 Total: 751831, OK: 703329, NG: 45895, Retest: 2607 (Model: sdsd, LOT: asas, Recipe: qwqw, Worker ID: VID, Rework: 0)
05:07:19.632 Total: 751985, OK: 716798, NG: 35020, Retest: 167 (Model: sdsd, LOT: asas, Recipe: qwqw, Worker ID: VID, Rework: 1)
05:30:00.804 Total: 751946, OK: 720708, NG: 31115, Retest: 123 (Model: sdsd, LOT: asas, Recipe: qwqw, Worker ID: VID, Rework: 2)
this is the script I used to extract what I wanted:
import pandas as pd
file = open('C:/Users/user/OneDrive/Desktop/python/Event/Event_220816.dat')
for line in file:
lst_time = [line.split()]
time = [x[0] for x in lst_time]
lst_input = [line.split()]
input = [x[2] for x in lst_input]
lst_output = [line.split()]
output = [x[4] for x in lst_output]
lst_lot = [line.split()]
lot = [x[12] for x in lst_lot]
lst_model = [line.split()]
model = [x[10] for x in lst_model]
lst_defects = [line.split()]
defects = [x[6] for x in lst_defects]
lst_retest = [line.split()]
retest = [x[8] for x in lst_defects]
#print('Time:', time, 'Lot No:', lot, 'Model:', model, 'Input:', input,
'Output:', output, 'Defects:', defects, 'Retest:', retest)
data_as_dictionary = {'Time': time, 'Lot No': lot, 'Model': model, 'Input': input,
'Output': output, 'Defects': defects, 'Retest': retest}
df = pd.DataFrame(data_as_dictionary)
df.to_excel('data.xlsx', index=None)
and this is the console output looks like:
Time: ['03:53:56.172'] Lot No: ['asas,'] Model: ['sdsd,'] Input: ['683305,'] Output: ['641643,'] Defects: ['39245,'] Retest: ['2417']
Time: ['04:37:09.070'] Lot No: ['asas,'] Model: ['sdsd,'] Input: ['751831,'] Output: ['703329,'] Defects: ['45895,'] Retest: ['2607']
Time: ['05:07:19.632'] Lot No: ['asas,'] Model: ['sdsd,'] Input: ['751985,'] Output: ['716798,'] Defects: ['35020,'] Retest: ['167']
Time: ['05:30:00.804'] Lot No: ['asas,'] Model: ['sdsd,'] Input: ['751946,'] Output: ['720708,'] Defects: ['31115,'] Retest: ['123']
but when it comes to the excel file it only the last line is written:
Time Lot No Model Input Output Defects Retest
05:30:00.804 asas, sdsd, 751946, 720708, 31115, 123
I want all the shown extracted data in the console output will be written in the excel file, can anyone have ideas what is lacking, thanks a lot.
CodePudding user response:
As @T_C Molenaar mentioned you create a new file for each loop a for loo. You may want to append like this:
with pd.ExcelWriter('output.xlsx',
mode='a') as writer:
df.to_excel(writer, sheet_name='Sheet_name_3')
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
CodePudding user response:
This question has been answered already, but your line processing is very strange IMHO.
I would do it as follows:
import pandas as pd
with open("somefile.dat") as infile:
result = []
for line in infile.read().splitlines():
splitline = line.replace(",", "").split() # remove comma here
result.append(
{
"Time": splitline[0],
"Lot No": splitline[12],
"Model": splitline[10],
"Input": splitline[2],
"Output": splitline[4],
"Defects": splitline[6],
"Retest": splitline[8],
}
)
df = pd.DataFrame(result)
df.to_excel("data.xlsx", index=None)