Pandas: ignore spaces in last column of csv when converting to dataframe

Time:11-10

I want to convert a CSV file to a pandas DataFrame. The CSV file has the following format (the first line is the header):

id,link,domain,subdomain,subsubdomain,difficulty,solved_date,instruction
0,https://www.example.com,Practice,Tutorials,dummyText,Easy,2021-11-01 14:18:51,'instructions\Day 0 - Hello, World.pdf'

When I convert it into a dataframe using df = pandas.read_csv(csv_filename, index_col=0, delimiter=','), I get (some columns omitted because they are not relevant):

| | id | link | domain | ... | solved_date | instruction |
| -- | -- | ---- | ------ | --- | ----------- | ----------- |
| 0 | https://www.example.com | Practice | ... | 2021-11-01 14:18:51 | 'instructions\Day 0 - Hello | World.pdf |
| 0 | https://www.example.com | Practice | ... | 2021-11-01 14:18:51 | 'instructions\Day 1 - Datatypes.pdf |

It fails to convert lines with additional whitespace correctly (the filenames are enclosed in apostrophes, '). Additionally, the column names are off by one. I tried omitting the index_col option, but nothing changed.

Does someone know a solution to this problem?

CodePudding user response:

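You just have to declare the quote character to be a single quote ('), because by default pandas expects a double quote ("). Without that, the comma inside the quoted filename is treated as a field delimiter, so the last column is split in two and the row ends up with one more field than the header: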

df = pandas.read_csv(csv_filename, index_col=0, delimiter=',', quotechar="'")
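
To check the fix without touching the real file, here is a minimal sketch that feeds the header and the sample row from the question through io.StringIO, which stands in for csv_filename. Only the quotechar argument differs from the original call; the delimiter is left at its default, which is already a comma.

```python
import io

import pandas as pd

# Header and sample row copied from the question.
# io.StringIO is a stand-in for the real CSV file (an assumption for this demo).
csv_text = (
    "id,link,domain,subdomain,subsubdomain,difficulty,solved_date,instruction\n"
    "0,https://www.example.com,Practice,Tutorials,dummyText,Easy,"
    "2021-11-01 14:18:51,'instructions\\Day 0 - Hello, World.pdf'\n"
)

# quotechar="'" makes the parser treat the single-quoted filename as one field,
# so the comma inside it is no longer read as a delimiter.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, quotechar="'")

# The filename arrives in one piece, with the surrounding quotes removed.
print(df.loc[0, "instruction"])
print(list(df.columns))
```

If the column names line up with the values and the instruction column holds the complete path, the quoting was the only problem.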