How to remove an extra column (with no header) from a CSV in Python

I have a CSV file in which one or more rows contain an extra value that doesn't match the header on the first line.

Example:

name,age,gender
abc,20,m
def,28,f
ghi,36,f
jkl,23,f,a
xyz,30,m

I want to load this dataset into a pandas DataFrame, so how can I remove this value using Python? Because of the size of the original file, regular text editors and spreadsheet tools won't load all of the lines.

I got this error while loading the file into pandas:

df = pd.read_csv(data,delimiter=',')

ParserError: Error tokenizing data. C error: Expected 166 fields in line 26398, saw 167
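
The ParserError above reports only the first offending line, so it can help to scan the whole file for rows whose field count differs from the header before deciding how to fix them. A minimal sketch using only the standard csv module, which streams the file row by row and so should cope with a large file; the function name find_ragged_rows and the limit parameter are made up for illustration:

```python
import csv


def find_ragged_rows(path, limit=10):
    """Return (header_width, [(line_number, field_count), ...]).

    Lists up to `limit` rows whose field count differs from the header.
    Hypothetical helper, not part of pandas.
    """
    ragged = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)          # first line defines the expected width
        expected = len(header)
        for row in reader:
            # Skip completely blank lines; report rows that are too long or too short.
            if row and len(row) != expected:
                ragged.append((reader.line_num, len(row)))
                if len(ragged) >= limit:
                    break
    return expected, ragged
```

For the example file in the question this reports a header width of 3 and a single ragged row on line 5 with 4 fields.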

CodePudding user response:

Sample CSV (sample.csv):

name,age,gender
abc,20,m
def,28,f
ghi,36,f
jkl,23,f,a
xyz,30,m

Python code: use the usecols argument of pandas.read_csv to read only the columns named in the header, so the extra trailing value is ignored.

import pandas as pd 

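# Read only the three header columns; the extra field on the 'jkl' row is dropped.
# Column names work too: usecols=['name', 'age', 'gender']
df = pd.read_csv('sample.csv', usecols=[0, 1, 2])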
print(df)

Output:

  name  age gender
0  abc   20      m
1  def   28      f
2  ghi   36      f
3  jkl   23      f
4  xyz   30      m
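
If a cleaned copy of the file is preferred over filtering at load time, the same trimming can be done with the standard csv module, one row at a time. This is only a sketch: it assumes the extra value always sits at the end of the row and can safely be discarded (if it comes from an unquoted comma inside a value, cutting the row would shift the data). The name trim_extra_fields is hypothetical.

```python
import csv


def trim_extra_fields(src_path, dst_path):
    """Copy src_path to dst_path, cutting every row down to the header's width.

    Returns the number of rows that were shortened.
    Assumption: surplus fields are trailing junk that can be dropped.
    """
    trimmed = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        width = len(header)            # number of columns declared in the header
        writer.writerow(header)
        for row in reader:
            if len(row) > width:
                row = row[:width]      # keep only the columns the header names
                trimmed += 1
            writer.writerow(row)
    return trimmed
```

Calling trim_extra_fields('sample.csv', 'clean.csv') on the sample above shortens one row, and clean.csv can then be loaded with a plain pd.read_csv call. For a wide file such as the 166-column one in the question, the header width found this way can also be passed as usecols=range(width) instead of listing every index by hand.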