I have a .txt file that I want to read with pandas. When I open it in Notepad, the line looks like this:
"1013764";"Test INT"12345678"";"TEST";"TEST";""
To read it into pandas, I do this:
import pandas as pd
data = pd.read_csv("TestFile.TXT", sep=";")
When I print "data", it appears like this:
Is there a way to keep the quotation marks from disappearing?
CodePudding user response:
You need to remove the quotation marks by replacing them. Let's say that the column name is col1, then:
df['col1'] = df['col1'].str.replace('"', '', regex=False)
CodePudding user response:
The simplest solution I could find is
import csv
import pandas as pd
data = pd.read_csv("<your file>", sep=";", quoting=csv.QUOTE_NONE)
(In your case the code above will produce this:
Columns: ["1013764", "Test INT"12345678"", "TEST", "TEST", ""]
)
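To check this without touching a file on disk, here is a minimal sketch that feeds the line from the question through io.StringIO. The header=None argument is my own assumption, added so that the single line is treated as data rather than as column names.

```python
import csv
import io

import pandas as pd

# Stand-in for the file on disk: the single line from the question.
raw = '"1013764";"Test INT"12345678"";"TEST";"TEST";""\n'

# QUOTE_NONE tells the parser to treat quote characters as ordinary text.
# header=None is an assumption so the line is read as a data row.
df = pd.read_csv(io.StringIO(raw), sep=";", header=None, quoting=csv.QUOTE_NONE)

# Every field keeps its surrounding quotation marks.
print(df.iloc[0].tolist())
```

Because the quotation marks are now part of each value, even the numeric-looking first field comes back as a string.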
The problem with this is that read_csv will parse every quoted field as a string, because the quotation marks become part of the value. If you want to preserve the data types, I would advise using a different quote character in your CSV (such as ') to signal to pandas that a value is a string. You can then tell pandas about it with the quotechar parameter:
import pandas as pd
data = pd.read_csv("<your file>", sep=";", quotechar="'")
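As a rough illustration of the quotechar approach, the sketch below uses a made-up two-column file (the id and label column names and the rewritten row are hypothetical, not taken from the question). Strings are wrapped in single quotes and the number is left bare.

```python
import io

import pandas as pd

# Hypothetical rewrite of the data: single quotes mark strings, numbers are unquoted.
raw = "id;label\n1013764;'Test INT\"12345678\"'\n"

# With quotechar="'", double quotes are ordinary characters and are kept.
df = pd.read_csv(io.StringIO(raw), sep=";", quotechar="'")

print(df.dtypes)            # id is inferred as an integer column
print(df.loc[0, "label"])   # inner double quotes survive
```

This way the inner double quotes are preserved, the single quotes are stripped as delimiters, and unquoted numeric columns still get a numeric dtype.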
More information about read_csv can be found in the pandas docs: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html