I have a CSV file in which one (or more) rows contain an extra value that doesn't match the header on the first line.
Example:
name,age,gender
abc,20,m
def,28,f
ghi,36,f
jkl,23,f,a
xyz,30,m
I want to load this dataset into a pandas DataFrame. How can I remove the extra value using Python? The original file is so large that regular text editors and spreadsheet tools won't load all of its lines.
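One way to remove the extra values before pandas ever sees them is to stream the file once through Python's standard csv module and write a cleaned copy. The sketch below assumes the extra values are always trailing fields (as in the jkl row), so truncating every row to the header's width is the right fix. The function name drop_extra_fields is hypothetical, and the sample data is the example above.

```python
import csv
import io


def drop_extra_fields(src, dst):
    """Copy CSV rows from src to dst, truncating each row to the header's width.

    Rows are processed one at a time, so memory use does not grow with file size.
    Returns the number of rows that had to be truncated.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow(header)
    n_cols = len(header)
    truncated = 0
    for row in reader:
        if len(row) > n_cols:
            truncated += 1
            row = row[:n_cols]  # discard the trailing extra value(s)
        writer.writerow(row)
    return truncated


sample = """name,age,gender
abc,20,m
def,28,f
ghi,36,f
jkl,23,f,a
xyz,30,m
"""

out = io.StringIO()
fixed = drop_extra_fields(io.StringIO(sample), out)
print(fixed)
print(out.getvalue())
```

For a real file, open both ends with newline="" (for example open("data.csv", newline="") and open("data_fixed.csv", "w", newline=""), both file names hypothetical), pass them to the function, and then read the cleaned copy with pd.read_csv as usual.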
- I get this error when loading it into pandas:
df = pd.read_csv(data,delimiter=',')
ParserError: Error tokenizing data. C error: Expected 166 fields in line 26398, saw 167
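The error reports only the first offending line. To see every row whose field count differs from the header, a single streaming pass with the csv module is enough. This is a diagnostic sketch; find_bad_rows is a hypothetical helper, and its line numbers come from csv.reader.line_num, which matches the physical line number only when no quoted field spans several lines.

```python
import csv
import io


def find_bad_rows(src):
    """Yield (line_number, field_count) for each row whose width differs from the header.

    Line numbers are 1-based and count the header as line 1.
    """
    reader = csv.reader(src)
    n_cols = len(next(reader))
    for row in reader:
        # Skip blank lines, which pandas also skips by default.
        if row and len(row) != n_cols:
            yield reader.line_num, len(row)


sample = """name,age,gender
abc,20,m
def,28,f
ghi,36,f
jkl,23,f,a
xyz,30,m
"""

bad = list(find_bad_rows(io.StringIO(sample)))
print(bad)
```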
Answer:
Sample CSV:
name,age,gender
abc,20,m
def,28,f
ghi,36,f
jkl,23,f,a
xyz,30,m
Python code: pass the usecols argument of pandas.read_csv so that only the three header columns are read; the extra trailing value is then ignored.
import pandas as pd
df = pd.read_csv('sample.csv', usecols=[0, 1, 2]) # or usecols=['name', 'age', 'gender']
print(df)
Output:
  name  age gender
0  abc   20      m
1  def   28      f
2  ghi   36      f
3  jkl   23      f
4  xyz   30      m
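An alternative that keeps the cleanup inside pandas is the on_bad_lines parameter of read_csv. In pandas 1.4 and later it accepts a callable that receives an over-long row as a list of strings and returns the row to keep; callables are only supported with engine="python", which is slower than the default C engine, so on a very large file the usecols approach may be preferable. The sketch below assumes pandas 1.4+ and reads the header width with nrows=0 before parsing the data.

```python
import io

import pandas as pd

sample = """name,age,gender
abc,20,m
def,28,f
ghi,36,f
jkl,23,f,a
xyz,30,m
"""

# Read only the header to learn how many columns are expected.
n_cols = len(pd.read_csv(io.StringIO(sample), nrows=0).columns)

df = pd.read_csv(
    io.StringIO(sample),
    engine="python",                        # callable on_bad_lines requires the Python engine
    on_bad_lines=lambda bad: bad[:n_cols],  # keep the first n_cols fields, drop the rest
)
print(df)
```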