I want to convert a CSV file to a pandas DataFrame. The CSV file has the following format (the first line is the header):
id,link,domain,subdomain,subsubdomain,difficulty,solved_date,instruction
0,https://www.example.com,Practice,Tutorials,dummyText,Easy,2021-11-01 14:18:51,'instructions\Day 0 - Hello, World.pdf'
When I convert it into a DataFrame using df = pandas.read_csv(csv_filename, index_col=0, delimiter=','), I get (some columns omitted because they are not relevant):
| | id | link | domain | ... | solved_date | instruction |
| -- | -- | ---- | ------ | --- | ----------- | ----------- |
| 0 | https://www.example.com | Practice | ... | 2021-11-01 14:18:51 | 'instructions\Day 0 - Hello | World.pdf
| 0 | https://www.example.com | Practice | ... | 2021-11-01 14:18:51 | 'instructions\Day 1 - Datatypes.pdf
It fails to convert lines with additional whitespace correctly (the filenames are enclosed in apostrophes (')). Additionally, the column names are off by one. I tried omitting the index_col option, but nothing changed.
Does someone know a solution to this problem?
CodePudding user response:
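By default, read_csv only treats the double quote (") as a quote character, so the comma inside 'instructions\Day 0 - Hello, World.pdf' is read as a field delimiter and the row ends up with one field too many. You just have to declare the quote character to be a single quote: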
df = pandas.read_csv(csv_filename, index_col=0, delimiter=',', quotechar="'")
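
To check this without touching the real file, here is a minimal sketch that feeds the sample row from the question to read_csv through an in-memory buffer. The csv_text variable and the use of io.StringIO are only assumptions for illustration; in practice you would pass csv_filename as above.

```python
import io

import pandas as pd

# Sample data copied from the question; held in memory for illustration only.
csv_text = (
    "id,link,domain,subdomain,subsubdomain,difficulty,solved_date,instruction\n"
    "0,https://www.example.com,Practice,Tutorials,dummyText,Easy,"
    "2021-11-01 14:18:51,'instructions\\Day 0 - Hello, World.pdf'\n"
)

# quotechar="'" makes the parser treat the apostrophes as field quotes,
# so the comma inside the filename is no longer a delimiter.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, quotechar="'")

print(df.columns.tolist())
print(df.loc[0, "instruction"])
```

The instruction column should then hold the whole filename, with the surrounding apostrophes stripped and the comma kept.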