I'm working on a small lab with pandas and would like to import data from a txt file. My data looks like this:
account order ext price
383080 10001 232.32
383080 10001 107.97
412290 10005 2679.36
412290 10005 286.02
383080 10001 235.83
412290 10005 3472.04
412290 10005 832.95
412290 10005 915.12
218895 10006 3061.12
218895 10006 518.65
218895 10006 216.90
218895 10006 -72.18
I wrote the code below to create the DataFrame:
import pandas as pd
import numpy as np
df = pd.read_csv('sale.txt', sep=" ")
df
However, the resulting df contains "\t" characters. Can you help me get rid of them when importing the data from the txt file?
This is what I see in JupyterLab:
account\torder\text price
0 383080\t10001\t232.32 NaN
1 383080\t10001\t107.97 NaN
2 412290\t10005\t2679.36 NaN
3 412290\t10005\t286.02 NaN
4 383080\t10001\t235.83 NaN
5 412290\t10005\t3472.04 NaN
6 412290\t10005\t832.95 NaN
7 412290\t10005\t915.12 NaN
8 218895\t10006\t3061.12 NaN
9 218895\t10006\t518.65 NaN
10 218895\t10006\t216.90 NaN
11 218895\t10006\t-72.18 NaN
Answer:
sep=' ' doesn't work because the fields in the txt file are separated by tabs (which is what \t means). Yupeng's sep=r'\s+' won't work either, because the ext price header itself contains a space: you would get an ext column holding the prices and a price column full of NaNs.
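To see the header problem concretely, here is a minimal sketch that rebuilds a few rows of sale.txt in memory with io.StringIO (an illustration only; the real file is read from disk) and parses them by splitting on runs of whitespace:

```python
import io

import pandas as pd

# A few rows of the file, rebuilt in memory with real tab separators.
# Note that the header name "ext price" contains an ordinary space.
rows = [
    ["account", "order", "ext price"],
    ["383080", "10001", "232.32"],
    ["412290", "10005", "2679.36"],
    ["218895", "10006", "-72.18"],
]
raw = "\n".join("\t".join(row) for row in rows) + "\n"

# Splitting on any whitespace also splits "ext price", so the header yields
# four names while each data row has only three fields. pandas pads the
# short rows with NaN, leaving the prices under "ext" and "price" empty.
bad = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(bad.columns.tolist())
print(bad)
```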
Instead, use sep='\t' to split on tabs:
df = pd.read_csv('sale.txt', sep='\t')
account order ext price
0 383080 10001 232.32
1 383080 10001 107.97
2 412290 10005 2679.36
3 412290 10005 286.02
4 383080 10001 235.83
5 412290 10005 3472.04
6 412290 10005 832.95
7 412290 10005 915.12
8 218895 10006 3061.12
9 218895 10006 518.65
10 218895 10006 216.90
11 218895 10006 -72.18
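A quick way to confirm what the separator really is, before choosing sep, is to look at repr() of the first line. The sketch below does that on an in-memory copy of a few rows (with the real file you would use open('sale.txt').readline() instead) and then parses with sep='\t':

```python
import io

import pandas as pd

# Illustrative rows joined with real tab characters, as in sale.txt.
rows = [
    ["account", "order", "ext price"],
    ["383080", "10001", "232.32"],
    ["412290", "10005", "2679.36"],
    ["218895", "10006", "-72.18"],
]
raw = "\n".join("\t".join(row) for row in rows) + "\n"

# repr() makes the tabs visible as \t escapes instead of blank space.
print(repr(raw.splitlines()[0]))

# Splitting on tabs only keeps "ext price" together as one column name.
df = pd.read_csv(io.StringIO(raw), sep="\t")
print(df.dtypes)
```

Because the tabs are consumed as separators, no "\t" characters remain in the column names or values, and pandas infers integer columns for account and order and a float column for ext price.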
Answer:
Please try
df = pd.read_csv('sale.txt', sep=r"\s+")
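If you do want to split on any run of whitespace (for example, if some lines used spaces instead of tabs, which is an assumption and not something shown in the question), one possible sketch is to skip the original header line and supply the column names yourself, so the space inside "ext price" no longer creates an extra column. This only works as long as no data value contains whitespace. The raw string r"\s+" also avoids Python's invalid-escape warning for "\s".

```python
import io

import pandas as pd

# Illustrative rows; the header "ext price" contains a space.
rows = [
    ["account", "order", "ext price"],
    ["383080", "10001", "232.32"],
    ["412290", "10005", "2679.36"],
    ["218895", "10006", "-72.18"],
]
raw = "\n".join("\t".join(row) for row in rows) + "\n"

# Skip the header line, tell pandas there is no header, and name the
# three columns explicitly.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    skiprows=1,
    header=None,
    names=["account", "order", "ext price"],
)
print(df)
```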