I have the following sample csv file:
'TEXT';'DATE'
'hello';'20/02/2002'
'hello!
how are you?';'21/02/2002'
So, as you can see, the separator between columns is ; and the content of each column is delimited by '. This causes problems when I process the file with pandas, because it uses line breaks as the delimiter between rows. That is, it interprets the line break between "hello!" and "how are you?" as a row separator.
So what I would need is to remove the newlines within the content of each column, so that the file looks like this:
'TEXT';'DATE'
'hello';'20/02/2002'
'hello! how are you?';'21/02/2002'
Simply removing every \r\n sequence would not work, because then I would also lose the row separation.
What can I try? I'm using Teradata SQL Assistant to generate the csv file.
Answer:
You can use the sep= and quotechar= parameters of pd.read_csv:
import pandas as pd
df = pd.read_csv('your_file.csv', sep=';', quotechar="'")
print(df)
Prints:
                     TEXT        DATE
0                   hello  20/02/2002
1  hello!\r\nhow are you?  21/02/2002
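To check this without the exported file, here is a minimal sketch that parses the question's sample from an in-memory string. The names raw and physical_lines and the \r\n line endings are assumptions for illustration. The point is that the four physical lines become two rows, because the line break sits inside the quotes.

```python
import io

import pandas as pd

# The question's sample held in memory; "\r\n" endings are an assumption
# (typical for a file exported on Windows).
raw = (
    "'TEXT';'DATE'\r\n"
    "'hello';'20/02/2002'\r\n"
    "'hello!\r\nhow are you?';'21/02/2002'\r\n"
)

# Number of physical lines in the text, counting the break inside the quotes.
physical_lines = len(raw.splitlines())

# quotechar="'" tells the parser that a line break inside '...' belongs to the field.
df = pd.read_csv(io.StringIO(raw), sep=";", quotechar="'")

print(physical_lines, len(df))   # physical lines vs. parsed rows
print(repr(df.loc[1, "TEXT"]))   # the embedded line break is still in the cell
```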
If you want to further replace the newlines:
df['TEXT'] = df['TEXT'].str.replace('\r', '').str.replace('\n', ' ')
print(df)
Prints:
                  TEXT        DATE
0                hello  20/02/2002
1  hello! how are you?  21/02/2002
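If the goal is a cleaned file like the one in the question, one possible sketch is to write the frame back with the same separator and quote character. The in-memory sample and the names raw and cleaned are assumptions for illustration. For a real file you would pass a path to read_csv and to_csv instead.

```python
import csv
import io

import pandas as pd

# The question's sample held in memory; "\r\n" endings are an assumption.
raw = (
    "'TEXT';'DATE'\r\n"
    "'hello';'20/02/2002'\r\n"
    "'hello!\r\nhow are you?';'21/02/2002'\r\n"
)

df = pd.read_csv(io.StringIO(raw), sep=";", quotechar="'")

# Collapse any run of \r and \n characters inside TEXT into a single space.
df["TEXT"] = df["TEXT"].str.replace(r"[\r\n]+", " ", regex=True)

# With no path given, to_csv returns the CSV text as a string.
# QUOTE_ALL wraps every field in the quote character, as in the original export.
cleaned = df.to_csv(sep=";", quotechar="'", quoting=csv.QUOTE_ALL, index=False)

print(cleaned)
```

quoting=csv.QUOTE_ALL is used here so that every field, including the header, is wrapped in ' again. Without it, pandas would only quote fields that need it.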