Convert a text file into a DataFrame with multiple custom delimiters in Python

Time:04-05

I'm new to Python. I have a text file that contains data like this:

0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, 1 chair, Done. date (0.635s) Tue, 05 April 03:54:02
0: 480x640 3 persons, 1 cat, 1 laptop, 1 clock, 1: 480x640 4 persons, 2 chairs, Done. date (0.587s) Tue, 05 April 03:54:05
0: 480x640 3 persons, 1 chair, 1: 480x640 4 persons, 2 chairs, Done. date (0.582s) Tue, 05 April 03:54:07

I want to convert it into a pandas DataFrame using multiple delimiters.

I tried this code:

import pandas as pd

student_csv = pd.read_csv('output.txt', names=['a', 'b', 'c', 'date'], sep='[0: 480x640, 1: 480x640 , date]')

student_csv.to_csv('txttocsv.csv', index=None)

I would like the resulting DataFrame to look like this:

        a                    b                          c
2 persons, 1 cat    2 persons, 1 chair, Done    Tue, 05 April 03:54:02

How can I convert the text file into a DataFrame like this?

CodePudding user response:

It's tricky to know exactly what your rules for splitting are. You can use a regex as the delimiter.

Here is a working example that splits the lists and the date into columns, but you'll probably have to tweak it to your exact rules:

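# Split on "<n>: <w>x<h>" prefixes and on the "Done. date (...)" block;
# the first (empty) column before the leading "0: 480x640" is dropped.
df = pd.read_csv('output.txt', sep=r'(?:,\s*|^)(?:\d+: \d+x\d+|Done[^)]+\)\s*)',
                 header=None, engine='python', names=(None, 'a', 'b', 'date')).iloc[:, 1:]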

output:

                                      a                     b                    date
0             2 persons, 1 cat, 1 clock    2 persons, 1 chair  Tue, 05 April 03:54:02
1   3 persons, 1 cat, 1 laptop, 1 clock   4 persons, 2 chairs  Tue, 05 April 03:54:05
2                    3 persons, 1 chair   4 persons, 2 chairs  Tue, 05 April 03:54:07
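
To see what this kind of separator does, it may help to apply the pattern to a single line with Python's re module, which is roughly how a regex separator is applied to each row. This is only a sketch: the pattern is written with + quantifiers, and the names SEP, line and fields are made up for the illustration.

```python
import re

# Separator: an optional leading comma (or start of line), followed by either
# a "<n>: <width>x<height>" prefix or the "Done. date (...)" block.
SEP = re.compile(r'(?:,\s*|^)(?:\d+: \d+x\d+|Done[^)]+\)\s*)')

# First sample line from the question.
line = ("0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, 1 chair, "
        "Done. date (0.635s) Tue, 05 April 03:54:02")

# Split on the separator and trim the whitespace left around each field.
fields = [field.strip() for field in SEP.split(line)]

for field in fields:
    print(repr(field))
```

The first field is empty because the line starts with a separator, which is why the example above names four columns and then drops the first one.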

CodePudding user response:

You can use | in the sep argument to combine multiple delimiters:

df = pd.read_csv('data.txt', sep=r'0: 480x640|1: 480x640|date \(.*\)',
                 engine='python', names=('None', 'a', 'b', 'c')).drop('None', axis=1)
print(df)

                                        a                             b  \
0             2 persons, 1 cat, 1 clock,     2 persons, 1 chair, Done.
1   3 persons, 1 cat, 1 laptop, 1 clock,    4 persons, 2 chairs, Done.
2                    3 persons, 1 chair,    4 persons, 2 chairs, Done.

                     c
0  Tue, 05 April 03:54:02
1  Tue, 05 April 03:54:05
2  Tue, 05 April 03:54:07
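
With this separator, columns a and b keep their trailing commas and the "Done." marker. If you don't want those, one possible cleanup is sketched below in plain Python, assuming every line has the same layout as the samples in the question. The helper name split_and_clean is hypothetical and not part of pandas.

```python
import re

# Same separator as in the read_csv call above.
SEP = re.compile(r'0: 480x640|1: 480x640|date \(.*\)')

def split_and_clean(line):
    """Split one log line and strip leftover punctuation from each field."""
    # The first piece is the empty string before the leading "0: 480x640".
    _, a, b, c = SEP.split(line.strip())
    # Drop the trailing "Done." marker from the second list.
    b = re.sub(r'Done\.\s*$', '', b.strip())
    # Remove stray commas and spaces at both ends of each field.
    return {'a': a.strip(' ,'), 'b': b.strip(' ,'), 'c': c.strip()}

sample = ("0: 480x640 3 persons, 1 chair, 1: 480x640 4 persons, 2 chairs, "
          "Done. date (0.582s) Tue, 05 April 03:54:07")
row = split_and_clean(sample)
print(row)
```

A list of such dictionaries, one per line of the file, can then be passed to pd.DataFrame to get columns a, b and c.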