Python Pandas Error while Tokenizing Data

Time:08-19

I'm trying to read a CSV file with pandas, and I keep getting this error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12

This is my code:

df = pd.read_csv('ZCS006A_16_23AUG_ALL_20220804020843.csv', delimiter = ',')
df.head(10)

Should I modify my code or modify the csv file?

This is part of the .csv file:

AIRLINE_CODE,FLIGHT_NO,AIRCRAFT_TYPE_CODE,DEP_PORT_CODE,ARR_PORT_CODE,DEP_DATE,ARR_DATE,STD,STA,BLOCK_TIME,LEG,PART_NO,PART_NAME,PART_DESC,PART_SECTOR_USAGE_CODE,PART_SECTOR_USAGE_NAME,PART_CATEGORY_CODE,PAX_CLASS,EXCHANGE_TYPE_CODE,PART_QTY,PART_WEIGHT,IS_DEADHEAD
LA,0800,789,AKL,SCL,16-AUG-22,16-AUG-22,1840,1340,660,2,10002993,MANTEQUILLA YC,10002993: MANTEQUILLA YC,D,Disposable,,Y,CY,143,35.75,0
LA,0800,789,AKL,SCL,16-AUG-22,16-AUG-22,1840,1340,660,2,10003049,BEBIDA BLANCA 1500CC YC,10003049: BEBIDA BLANCA 1500CC YC,D,Disposable,,Y,BX,4,6.5332,0
LA,0800,789,AKL,SCL,16-AUG-22,16-AUG-22,1840,1340,660,2,10003049,BEBIDA BLANCA 1500CC YC,10003049: BEBIDA BLANCA 1500CC YC,D,Disposable,,Y,EX,6,9.7998,0
LA,0800,789,AKL,SCL,16-AUG-22,16-AUG-22,1840,1340,660,2,10003153,COCA COLA 1500CC YC,10003153: COCA COLA 1500CC YC,D,Disposable,,Y,BX,4,6.4,0
LA,0800,789,AKL,SCL,16-AUG-22,16-AUG-22,1840,1340,660,2,10003153,COCA COLA 1500CC YC,10003153: COCA COLA 1500CC YC,D,Disposable,,Y,EX,8,12.8,0

CodePudding user response:

In the pandas documentation, pd.read_csv says:

error_bad_lines bool, optional, default None

Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will be dropped from the DataFrame that is returned.

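So you can pass error_bad_lines=False to pd.read_csv. Note that this parameter was deprecated in pandas 1.3 and removed in pandas 2.0; on newer versions, pass on_bad_lines='skip' instead: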

df = pd.read_csv('ZCS006A_16_23AUG_ALL_20220804020843.csv', delimiter = ',', error_bad_lines=False)
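
For pandas 1.3 or later, the same idea can be sketched with on_bad_lines. The snippet below is only an illustration: the in-memory sample data and the column names a and b are made up, not taken from the file in the question. It first shows the default behaviour (an exception on the malformed line) and then the version that skips it.

```python
import io

import pandas as pd

# Hypothetical sample: the header promises 2 fields, but line 3 has 4.
raw = "a,b\n1,2\n3,4,5,6\n7,8\n"

# Default behaviour: a line with too many fields raises ParserError.
try:
    pd.read_csv(io.StringIO(raw))
except pd.errors.ParserError as exc:
    print("parse failed:", exc)

# pandas >= 1.3: drop lines with too many fields instead of raising.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df)
```

Keep in mind that skipping silently discards data. In the question, pandas expected only 2 fields, which may mean the real file begins with a few non-tabular lines before the header; if so, skipping those lines with the skiprows parameter is probably a better fit than dropping every row that looks too long.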