I'm trying to read a CSV file with pandas, and I keep getting this error:
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12
This is my code:
import pandas as pd
df = pd.read_csv('ZCS006A_16_23AUG_ALL_20220804020843.csv', delimiter=',')
df.head(10)
Should I modify my code or modify the csv file?
This is part of the .csv file:
AIRLINE_CODE,FLIGHT_NO,AIRCRAFT_TYPE_CODE,DEP_PORT_CODE,ARR_PORT_CODE,DEP_DATE,ARR_DATE,STD,STA,BLOCK_TIME,LEG,PART_NO,PART_NAME,PART_DESC,PART_SECTOR_USAGE_CODE,PART_SECTOR_USAGE_NAME,PART_CATEGORY_CODE,PAX_CLASS,EXCHANGE_TYPE_CODE,PART_QTY,PART_WEIGHT,IS_DEADHEAD
LA,0800,789,AKL,SCL,16-AUG-22,16-AUG-22,1840,1340,660,2,10002993,MANTEQUILLA YC,10002993: MANTEQUILLA YC,D,Disposable,,Y,CY,143,35.75,0
LA,0800,789,AKL,SCL,16-AUG-22,16-AUG-22,1840,1340,660,2,10003049,BEBIDA BLANCA 1500CC YC,10003049: BEBIDA BLANCA 1500CC YC,D,Disposable,,Y,BX,4,6.5332,0
LA,0800,789,AKL,SCL,16-AUG-22,16-AUG-22,1840,1340,660,2,10003049,BEBIDA BLANCA 1500CC YC,10003049: BEBIDA BLANCA 1500CC YC,D,Disposable,,Y,EX,6,9.7998,0
LA,0800,789,AKL,SCL,16-AUG-22,16-AUG-22,1840,1340,660,2,10003153,COCA COLA 1500CC YC,10003153: COCA COLA 1500CC YC,D,Disposable,,Y,BX,4,6.4,0
LA,0800,789,AKL,SCL,16-AUG-22,16-AUG-22,1840,1340,660,2,10003153,COCA COLA 1500CC YC,10003153: COCA COLA 1500CC YC,D,Disposable,,Y,EX,8,12.8,0
CodePudding user response:
The pandas documentation for pd.read_csv says:
error_bad_lines bool, optional, default None
Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will be dropped from the DataFrame that is returned.
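So you can pass error_bad_lines=False to pd.read_csv. Note that pandas 1.3 deprecated this parameter in favor of on_bad_lines='skip', and pandas 2.0 removed it, so the call below only works on older versions: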
df = pd.read_csv('ZCS006A_16_23AUG_ALL_20220804020843.csv', delimiter = ',', error_bad_lines=False)
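Skipping bad lines drops data silently, so it may be worth checking which lines are malformed before deciding whether to change the code or the file. Here is a minimal sketch, assuming pandas 1.3 or later (where on_bad_lines replaces error_bad_lines). The sample data and the helper name find_bad_lines are made up for illustration and are not part of pandas:

```python
import csv
import io

import pandas as pd

# Hypothetical sample: the header has 3 fields, but line 3 has 4.
sample = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"


def find_bad_lines(text, delimiter=","):
    """Return (line_number, field_count) for rows wider than the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) > len(header)
    ]


# Locate the offending lines first, so nothing is dropped unnoticed.
bad = find_bad_lines(sample)
print(bad)

# on_bad_lines="skip" drops rows with too many fields instead of raising.
df = pd.read_csv(io.StringIO(sample), on_bad_lines="skip")
print(df)
```

For a real file, pass the file path to pd.read_csv instead of the StringIO object, and feed csv.reader a file opened with newline="". If the check turns up many bad lines, fixing the file (or the delimiter) is probably safer than skipping them.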