Home > Back-end >  How to use pandas read_csv to read csv file having backward slash and double quotation
How to use pandas read_csv to read csv file having backward slash and double quotation

Time:04-26

I have a CSV file like this (comma separated)

ID, Name,Context, Location
123,"John","{\"Organization\":{\"Id\":12345,\"IsDefault\":false},\"VersionNumber\":-1,\"NewVersionId\":\"88229ef9-e97b-4b88-8eba-31740d48fd15\",\"ApiIntegrationType\":0,\"PortalIntegrationType\":0}","Road 1"
234,"Mike","{\"Organization\":{\"Id\":23456,\"IsDefault\":false},\"VersionNumber\":-1,\"NewVersionId\":\"88229ef9-e97b-4b88-8eba-31740d48fd15\",\"ApiIntegrationType\":0,\"PortalIntegrationType\":0}","Road 2"

I want to create DataFrame like this:

ID | Name |Context                                                               |Location
123| John |{\"Organization\":{\"Id\":12345,\"IsDefault\":false},\"VersionNumber\":-1,\"NewVersionId\":\"88229ef9-e97b-4b88-8eba-31740d48fd15\",\"ApiIntegrationType\":0,\"PortalIntegrationType\":0}|Road 1
234| Mike |{\"Organization\":{\"Id\":23456,\"IsDefault\":false},\"VersionNumber\":-1,\"NewVersionId\":\"88229ef9-e97b-4b88-8eba-31740d48fd15\",\"ApiIntegrationType\":0,\"PortalIntegrationType\":0}|Road 2

Could you help and show me how to use pandas read_csv doing it?

CodePudding user response:

An answer - if you are willing to accept that the \ char gets stripped:

pd.read_csv(your_filepath, escapechar='\\')

    ID  Name                                            Context  Location
0  123  John  {"Organization":{"Id":12345,"IsDefault":false}...    Road 1
1  234  Mike  {"Organization":{"Id":23456,"IsDefault":false}...    Road 2

An answer if you actually want the backslashes in - using a custom converter:

def backslash_it(x):
    return x.replace('"','\\"')

pd.read_csv(your_filepath, escapechar='\\', converters={'Context': backslash_it})

    ID  Name                                            Context Location
0  123  John  {\"Organization\":{\"Id\":12345,\"IsDefault\":...   Road 1
1  234  Mike  {\"Organization\":{\"Id\":23456,\"IsDefault\":...   Road 2

escapechar on read_csv is used to actually read the csv then the custom converter puts the backslashes back in.

Note that I tweaked the header row to make the column name match easier:

ID,Name,Context,Location
  • Related