Python script writing to excel file using pandas-CodePudding

newbie here, I have extracted particular data from a .dat file and I wrote it on an excel file but the data is incomplete.

So this is the contents of the .dat file:

03:53:56.172 Total: 683305, OK: 641643, NG: 39245, Retest: 2417 (Model: sdsd, LOT: asas, Recipe: qwqw, Worker ID: VID, Rework: 0)
04:37:09.070 Total: 751831, OK: 703329, NG: 45895, Retest: 2607 (Model: sdsd, LOT: asas, Recipe: qwqw, Worker ID: VID, Rework: 0)
05:07:19.632 Total: 751985, OK: 716798, NG: 35020, Retest: 167 (Model: sdsd, LOT: asas, Recipe: qwqw, Worker ID: VID, Rework: 1)
05:30:00.804 Total: 751946, OK: 720708, NG: 31115, Retest: 123 (Model: sdsd, LOT: asas, Recipe: qwqw, Worker ID: VID, Rework: 2)

this is the script I used to extract what I wanted:

import pandas as pd
file = open('C:/Users/user/OneDrive/Desktop/python/Event/Event_220816.dat')
for line in file:
    lst_time = [line.split()]
    time = [x[0] for x in lst_time]

    lst_input = [line.split()]
    input = [x[2] for x in lst_input]

    lst_output = [line.split()]
    output = [x[4] for x in lst_output]

    lst_lot = [line.split()]
    lot = [x[12] for x in lst_lot]

    lst_model = [line.split()]
    model = [x[10] for x in lst_model]

    lst_defects = [line.split()]
    defects = [x[6] for x in lst_defects]

    lst_retest = [line.split()]
    retest = [x[8] for x in lst_defects]

    #print('Time:', time, 'Lot No:', lot, 'Model:', model, 'Input:', input,
          'Output:', output, 'Defects:', defects, 'Retest:', retest)

    data_as_dictionary = {'Time': time, 'Lot No': lot, 'Model': model, 'Input': input,
                          'Output': output, 'Defects': defects, 'Retest': retest}
    df = pd.DataFrame(data_as_dictionary)
    df.to_excel('data.xlsx', index=None)

and this is the console output looks like:

Time: ['03:53:56.172'] Lot No: ['asas,'] Model: ['sdsd,'] Input: ['683305,'] Output: ['641643,'] Defects: ['39245,'] Retest: ['2417']
Time: ['04:37:09.070'] Lot No: ['asas,'] Model: ['sdsd,'] Input: ['751831,'] Output: ['703329,'] Defects: ['45895,'] Retest: ['2607']
Time: ['05:07:19.632'] Lot No: ['asas,'] Model: ['sdsd,'] Input: ['751985,'] Output: ['716798,'] Defects: ['35020,'] Retest: ['167']
Time: ['05:30:00.804'] Lot No: ['asas,'] Model: ['sdsd,'] Input: ['751946,'] Output: ['720708,'] Defects: ['31115,'] Retest: ['123']

but when it comes to the excel file it only the last line is written:

Time           Lot No   Model   Input   Output Defects Retest
05:30:00.804    asas,   sdsd,   751946, 720708, 31115,  123

I want all the shown extracted data in the console output will be written in the excel file, can anyone have ideas what is lacking, thanks a lot.

CodePudding user response：

As @T_C Molenaar mentioned you create a new file for each loop a for loo. You may want to append like this:

with pd.ExcelWriter('output.xlsx',
                    mode='a') as writer:  
    df.to_excel(writer, sheet_name='Sheet_name_3')

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html

CodePudding user response：

This question has been answered already, but your line processing is very strange IMHO.

I would do it as follows:

import pandas as pd

with open("somefile.dat") as infile:
    result = []
    for line in infile.read().splitlines():
        splitline = line.replace(",", "").split()  # remove comma here
        result.append(
            {
                "Time":    splitline[0],
                "Lot No":  splitline[12],
                "Model":   splitline[10],
                "Input":   splitline[2],
                "Output":  splitline[4],
                "Defects": splitline[6],
                "Retest":  splitline[8],
            }
        )

df = pd.DataFrame(result)
df.to_excel("data.xlsx", index=None)