I'm new to Python. I have a text file
that contains data like this:
0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, 1 chair, Done. date (0.635s) Tue, 05 April 03:54:02
0: 480x640 3 persons, 1 cat, 1 laptop, 1 clock, 1: 480x640 4 persons, 2 chairs, Done. date (0.587s) Tue, 05 April 03:54:05
0: 480x640 3 persons, 1 chair, 1: 480x640 4 persons, 2 chairs, Done. date (0.582s) Tue, 05 April 03:54:07
I want to convert it into a pandas DataFrame using multiple delimiters.
I tried this code:
import pandas as pd
student_csv = pd.read_csv('output.txt', names=['a', 'b', 'c','date'], sep='[0: 480x640, 1: 480x640 , date]')
student_csv.to_csv('txttocsv.csv', index=None)
How can I convert it into a pandas DataFrame like this?
a                   b                          c
2 persons, 1 cat    2 persons, 1 chair, Done   Tue, 05 April 03:54:02
How to convert text file into dataframe
CodePudding user response:
It's hard to know exactly what your splitting rules are, but you can use a regex as the delimiter.
Here is a working example that splits the lists and the date into columns; you will probably have to tweak it to your exact rules:
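import pandas as pd
# Each line starts with a delimiter, so the first field is empty; it gets a placeholder name and is dropped with .iloc
df = pd.read_csv('output.txt', sep=r'(?:,\s*|^)(?:\d+: \d+x\d+ |Done[^)]+\)\s*)',
header=None, engine='python', names=(None, 'a', 'b', 'date')).iloc[:, 1:]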
output:
a b date
0 2 persons, 1 cat, 1 clock 2 persons, 1 chair Tue, 05 April 03:54:02
1 3 persons, 1 cat, 1 laptop, 1 clock 4 persons, 2 chairs Tue, 05 April 03:54:05
2 3 persons, 1 chair 4 persons, 2 chairs Tue, 05 April 03:54:07
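To see what the regex delimiter does to a single line, here is a minimal sketch using only the standard `re` module. It assumes the pattern is meant to use `+` quantifiers (`\d+`, `[^)]+`), and the names `SEP`, `line`, and `fields` are just for illustration. It is not how pandas works internally, only a way to inspect the split.

```python
import re

# Separator pattern, assuming `+` quantifiers were intended:
#   - an optional ", " or the start of the line, followed by
#   - either a camera prefix such as "0: 480x640 "
#   - or the "Done. date (0.635s) " marker up to the closing parenthesis
SEP = re.compile(r'(?:,\s*|^)(?:\d+: \d+x\d+ |Done[^)]+\)\s*)')

# First sample line from the question
line = ("0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, "
        "1 chair, Done. date (0.635s) Tue, 05 April 03:54:02")

fields = SEP.split(line)
for i, field in enumerate(fields):
    print(i, repr(field))
```

The first element is an empty string because the line begins with a delimiter, which is why the `read_csv` call needs a placeholder first column that is dropped afterwards.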
CodePudding user response:
You can use | in the sep argument to specify multiple delimiters:
df = pd.read_csv('data.txt', sep=r'0: 480x640|1: 480x640|date \(.*\)',
engine='python', names=('None', 'a', 'b', 'c')).drop('None', axis=1)
print(df)
a b \
0 2 persons, 1 cat, 1 clock, 2 persons, 1 chair, Done.
1 3 persons, 1 cat, 1 laptop, 1 clock, 4 persons, 2 chairs, Done.
2 3 persons, 1 chair, 4 persons, 2 chairs, Done.
c
0 Tue, 05 April 03:54:02
1 Tue, 05 April 03:54:05
2 Tue, 05 April 03:54:07
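With this simpler separator the fields keep their trailing commas and the "Done." marker. If you want them removed, one possible cleanup is sketched below using only the standard library. `clean_field` is a hypothetical helper name, and the rules it applies (strip whitespace, drop a trailing "Done.", drop trailing commas) are assumptions about what you want to keep.

```python
import re

# Same alternation as in the sep argument above
SEP = re.compile(r'0: 480x640|1: 480x640|date \(.*\)')

def clean_field(text):
    """Strip whitespace, a trailing 'Done.' marker, and trailing commas."""
    text = text.strip()
    text = re.sub(r',?\s*Done\.$', '', text)
    return text.rstrip(', ')

# Third sample line from the question
line = ("0: 480x640 3 persons, 1 chair, 1: 480x640 4 persons, 2 chairs, "
        "Done. date (0.582s) Tue, 05 April 03:54:07")

raw = SEP.split(line)                       # raw[0] is the empty leading field
cleaned = [clean_field(f) for f in raw[1:]]
print(cleaned)
```

On the DataFrame itself, the same helper could be applied per column, for example with `df['b'].map(clean_field)`.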