pandas.read_csv is ignoring quoting of strings

Time: 01-06

I am having some trouble reading a CSV file into a pandas DataFrame. The parser splits on commas that are enclosed in quotes instead of keeping them as part of the field.

I have tried different options for quotechar, but none of them made any difference.

import csv
import pandas

df = pandas.read_csv( 'test_quote.csv', header=None,sep=',', quotechar='\"', quoting=csv.QUOTE_MINIMAL, encoding='ascii', engine='python')
print(df)
Code output:
$ python3 test_quote.py 
        0     1              2       3                            4       5       6
0  201571  2080    "December 2   2022"    "November 1 - November 30   2022"  487.29
1  345741  5377    "December 3   2022"    "November 1 - November 30   2022"  729.35
2  995349  3672   "December 2    2022"   "November 1 - November 30    2022"  937.33
3  475601  3672   "December 2    2022"   "November 1 - November 30    2022"  790.17
4  228548  3672    "December 7   2022"    "November 1 - November 30   2022"  682.38

Expected output:
$ python3 test_quote.py 
        0     1                     2                                   3       4
0  201571  2080    "December 2, 2022"    "November 1 - November 30, 2022"  487.29
1  345741  5377    "December 3, 2022"    "November 1 - November 30, 2022"  729.35
2  995349  3672   "December 2 , 2022"   "November 1 - November 30 , 2022"  937.33
3  475601  3672   "December 2 , 2022"   "November 1 - November 30 , 2022"  790.17
4  228548  3672    "December 7, 2022"    "November 1 - November 30, 2022"  682.38

Input file (test_quote.csv):
201571, 2080, "December 2, 2022", "November 1 - November 30, 2022", 487.29
345741, 5377, "December 3, 2022", "November 1 - November 30, 2022", 729.35
995349, 3672, "December 2 , 2022", "November 1 - November 30 , 2022", 937.33
475601, 3672, "December 2 , 2022", "November 1 - November 30 , 2022", 790.17
228548, 3672, "December 7, 2022", "November 1 - November 30, 2022", 682.38

Answer:

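The spaces after the commas are causing the issue. A quote character only opens a quoted field when it is the first character of that field. In your file every field begins with a space, so the quote is read as an ordinary character and the comma inside the quotes is treated as a delimiter. Pass skipinitialspace=True so the parser skips the whitespace after each delimiter. Note that most of your other parameters (sep=',', quotechar='"', quoting=csv.QUOTE_MINIMAL) are already the defaults.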

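import pandas

# skipinitialspace=True ignores the space after each comma, so quoted fields
# are recognized and their embedded commas are kept.
df = pandas.read_csv('test_quote.csv', header=None, skipinitialspace=True)
print(df)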

Output:

        0     1                  2                                3       4
0  201571  2080   December 2, 2022   November 1 - November 30, 2022  487.29
1  345741  5377   December 3, 2022   November 1 - November 30, 2022  729.35
2  995349  3672  December 2 , 2022  November 1 - November 30 , 2022  937.33
3  475601  3672  December 2 , 2022  November 1 - November 30 , 2022  790.17
4  228548  3672   December 7, 2022   November 1 - November 30, 2022  682.38
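A self-contained sketch of the same fix, for anyone who wants to reproduce it without creating a file. It assumes only pandas and the standard library; DATA is illustrative input made of the first two rows of the question's test_quote.csv, read through io.StringIO. It contrasts the default parse with skipinitialspace=True, and then shows that Python's own csv module, which accepts a skipinitialspace option with the same meaning, follows the same rule.

```python
import csv
import io

import pandas as pd

# Illustrative input: the first two rows of test_quote.csv from the question.
# Every comma is followed by a space, including the one before each opening quote.
DATA = (
    '201571, 2080, "December 2, 2022", "November 1 - November 30, 2022", 487.29\n'
    '345741, 5377, "December 3, 2022", "November 1 - November 30, 2022", 729.35\n'
)

# Default parse: each quoted field starts with a space, so the quote is read as an
# ordinary character and the comma inside the quotes splits the field.
broken = pd.read_csv(io.StringIO(DATA), header=None)

# skipinitialspace=True skips the space after each delimiter, so the field starts
# with a quote and the embedded comma stays part of the value.
fixed = pd.read_csv(io.StringIO(DATA), header=None, skipinitialspace=True)

print(broken.shape)      # (2, 7)
print(fixed.shape)       # (2, 5)
print(fixed.iloc[0, 2])  # December 2, 2022

# The standard-library csv module applies the same rule to a single line.
first_line = DATA.splitlines()[0]
plain_row = next(csv.reader([first_line]))
csv_row = next(csv.reader([first_line], skipinitialspace=True))
print(len(plain_row), len(csv_row))  # 7 5
print(csv_row[2])                    # December 2, 2022
```

The seven columns of the broken parse match the 0 to 6 columns in the question's output. With skipinitialspace=True the surrounding quotes are consumed by the parser, so the values come back as December 2, 2022 rather than "December 2, 2022".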