How to read a .txt in Pandas that isn't properly delimited

Time:10-22

I have a .txt file that is very similar to a .csv, but not quite. As the sample below shows, the first 4 columns could be delimited with a space, but the final message string also contains spaces, so it would be split into a varying number of columns. I need that last string to stay in just one column.

09 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has E-stopped.
08 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has stopped.
08 4 10/11/2021 22:21:18 The PLC reported that sorter SS01 has stopped.
20 5 10/11/2021 22:21:18 The PLC reported that purge mode was disabled for sorter SS02.
20 5 10/11/2021 22:21:18 The PLC reported that purge mode was disabled for sorter SS01.
23 5 10/11/2021 22:21:19 AUX Sortation has been enabled for sorter SS02.
23 5 10/11/2021 22:21:20 AUX Sortation has been enabled for sorter SS01.

How can I read this in so I have just 5 consistent columns? I will probably combine date and time into one column later.

Answer:

You could pre-parse each line and then create the DataFrame, for example:

import pandas as pd

# Split each line on the first four spaces only, so the message text stays in one piece
with open('input.txt') as f_input:
    data = [line.strip().split(' ', 4) for line in f_input]

df = pd.DataFrame(data, columns=['c1', 'c2', 'date', 'time', 'desc'])
print(df)

Giving you:

   c1 c2        date      time                                                            desc
0  09  4  10/11/2021  22:21:17                The PLC reported that sorter SS02 has E-stopped.
1  08  4  10/11/2021  22:21:17                  The PLC reported that sorter SS02 has stopped.
2  08  4  10/11/2021  22:21:18                  The PLC reported that sorter SS01 has stopped.
3  20  5  10/11/2021  22:21:18  The PLC reported that purge mode was disabled for sorter SS02.
4  20  5  10/11/2021  22:21:18  The PLC reported that purge mode was disabled for sorter SS01.
5  23  5  10/11/2021  22:21:19                 AUX Sortation has been enabled for sorter SS02.
6  23  5  10/11/2021  22:21:20                 AUX Sortation has been enabled for sorter SS01.
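
The same limited split can also be done inside pandas with Series.str.split, which accepts a maximum number of splits. The following is only a sketch: the in-memory sample stands in for input.txt, the name df_alt is made up for illustration, and skipping blank lines is an assumption about how the real file should be handled.

```python
import io
import pandas as pd

# Two sample lines copied from the question; in practice this would be open('input.txt').
sample = io.StringIO(
    "09 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has E-stopped.\n"
    "23 5 10/11/2021 22:21:19 AUX Sortation has been enabled for sorter SS02.\n"
)

# Hold each non-blank line as a single string in a Series.
lines = pd.Series([line.strip() for line in sample if line.strip()])

# Split on the first four spaces only; everything after them stays in the last column.
df_alt = lines.str.split(' ', n=4, expand=True)
df_alt.columns = ['c1', 'c2', 'date', 'time', 'desc']
print(df_alt)
```

As with the list-comprehension version, every column comes back as a string, so c1 keeps its leading zero ('09') unless it is converted explicitly.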

A datetime column could be added by combining the date and time columns and converting them into a datetime:

df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])
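
A date such as 10/11/2021 is ambiguous (October 11 or 11 October), so it may be safer to pass an explicit format instead of letting pandas guess. The sketch below assumes month/day/year, which the question does not confirm, and uses a small stand-in DataFrame with the same date and time columns.

```python
import pandas as pd

# Stand-in rows with the same 'date' and 'time' string columns as the parsed file.
df = pd.DataFrame({
    'date': ['10/11/2021', '10/11/2021'],
    'time': ['22:21:17', '22:21:20'],
})

# ASSUMPTION: dates are month/day/year; use '%d/%m/%Y %H:%M:%S' if they are day-first.
df['datetime'] = pd.to_datetime(
    df['date'] + ' ' + df['time'],
    format='%m/%d/%Y %H:%M:%S',
)

# Optionally drop the original string columns once they are combined.
df = df.drop(columns=['date', 'time'])
print(df)
```

An explicit format also makes pandas raise an error on a line that does not match, instead of silently guessing.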