How to remove "\t" character when reading dataframe from txt file

Time:12-13

I am working on a small pandas lab and would like to import data from a txt file. The data looks like this:

account order   ext price
383080  10001   232.32
383080  10001   107.97
412290  10005   2679.36
412290  10005   286.02
383080  10001   235.83
412290  10005   3472.04
412290  10005   832.95
412290  10005   915.12
218895  10006   3061.12
218895  10006   518.65
218895  10006   216.90
218895  10006   -72.18

I wrote the code below to create the df:

import pandas as pd
import numpy as np
df = pd.read_csv('sale.txt', sep=" ")
df 

However, the resulting df contains "\t" characters. How can I remove them when importing the data from the txt file?

This is what I see in JupyterLab:

 account\torder\text    price
0   383080\t10001\t232.32   NaN
1   383080\t10001\t107.97   NaN
2   412290\t10005\t2679.36  NaN
3   412290\t10005\t286.02   NaN
4   383080\t10001\t235.83   NaN
5   412290\t10005\t3472.04  NaN
6   412290\t10005\t832.95   NaN
7   412290\t10005\t915.12   NaN
8   218895\t10006\t3061.12  NaN
9   218895\t10006\t518.65   NaN
10  218895\t10006\t216.90   NaN
11  218895\t10006\t-72.18   NaN

CodePudding user response:

  • sep=' ' doesn't work because the fields in the txt file are separated by tabs (which is what \t means)
  • Yupeng's sep='\s+' also won't work because the "ext price" header contains a space, so the header splits into four names (which will give you an ext column of prices and a price column of NaNs)

Instead use sep='\t' to split by Tab:

df = pd.read_csv('sale.txt', sep='\t')
    account  order  ext price
0    383080  10001     232.32
1    383080  10001     107.97
2    412290  10005    2679.36
3    412290  10005     286.02
4    383080  10001     235.83
5    412290  10005    3472.04
6    412290  10005     832.95
7    412290  10005     915.12
8    218895  10006    3061.12
9    218895  10006     518.65
10   218895  10006     216.90
11   218895  10006     -72.18
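
To see the difference between the three separators without needing sale.txt, here is a minimal sketch that reads a few of the question's rows from an in-memory string. The variable names (raw, by_tab, by_space, by_whitespace) are only for illustration, and the sample assumes the file really uses one tab between fields, as the output in the question suggests.

```python
import io
import pandas as pd

# A few rows from the question, with real tab characters between fields.
raw = (
    "account\torder\text price\n"
    "383080\t10001\t232.32\n"
    "412290\t10005\t2679.36\n"
    "218895\t10006\t-72.18\n"
)

# Tab separator: three columns, "ext price" stays a single header.
by_tab = pd.read_csv(io.StringIO(raw), sep="\t")

# Single-space separator: only the space inside "ext price" splits anything,
# so the tabs stay embedded in the first column and "price" is all NaN.
by_space = pd.read_csv(io.StringIO(raw), sep=" ")

# Any-whitespace separator: the header splits into four names,
# but each data row has only three fields, so "price" is all NaN.
by_whitespace = pd.read_csv(io.StringIO(raw), sep=r"\s+")

print(list(by_tab.columns))
print(list(by_space.columns))
print(list(by_whitespace.columns))
```

Only the sep="\t" version keeps "ext price" as one column with numeric values.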

CodePudding user response:

Please try

df = pd.read_csv('sale.txt', sep="\s+")
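
If a whitespace separator is still preferred, one possible workaround (not something stated in the answers here, just a sketch) is to skip the header line and supply the column names yourself, so the space inside "ext price" never gets split. The in-memory sample and the variable names below are assumptions for illustration; with the real file you would pass 'sale.txt' instead of the StringIO object.

```python
import io
import pandas as pd

# A few rows from the question, with real tab characters between fields.
raw = (
    "account\torder\text price\n"
    "383080\t10001\t232.32\n"
    "412290\t10005\t2679.36\n"
    "218895\t10006\t-72.18\n"
)

# Skip the original header row and provide the three column names explicitly,
# so splitting on any whitespace only applies to the data rows.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    skiprows=1,
    names=["account", "order", "ext price"],
)

print(df)
```

This trades the file's own header for hard-coded names, so sep='\t' remains the simpler choice when the file is known to be tab-separated.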