Remove newlines inside the column content in CSV files

I have the following sample csv file:

'TEXT';'DATE'
'hello';'20/02/2002'
'hello!
how are you?';'21/02/2002'

As you can see, the separator between columns is ; and the content of each column is quoted with '. This causes problems when I process the file with pandas, because it treats every line break as a delimiter between rows. That is, it interprets the line break between "hello!" and "how are you?" as a separator between rows.

So what I would need is to remove the newlines within the content of each column, so that the file looks like this:

'TEXT';'DATE'
'hello';'20/02/2002'
'hello! how are you?';'21/02/2002'

Removing every \r\n sequence would not work, because then I would also lose the row separation. What can I try? I'm using Teradata SQL Assistant to generate the csv file.

CodePudding user response:

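You can use the sep= and quotechar= parameters of pd.read_csv. Once ' is declared as the quote character, line breaks inside a quoted field are read as part of that field instead of as row separators: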

import pandas as pd

df = pd.read_csv('your_file.csv', sep=';', quotechar="'")
print(df)

Prints:

                     TEXT        DATE
0                   hello  20/02/2002
1  hello!\r\nhow are you?  21/02/2002
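
To check this without a file on disk, here is a minimal sketch that feeds the sample from the question through io.StringIO. The variable names (raw, naive, df) are only illustrative, and the \r\n line endings are an assumption based on the output above. It compares a read without quotechar against one with it:

```python
import io

import pandas as pd

# Sample data from the question; \r\n line endings are assumed here.
raw = (
    "'TEXT';'DATE'\r\n"
    "'hello';'20/02/2002'\r\n"
    "'hello!\r\nhow are you?';'21/02/2002'\r\n"
)

# Without quotechar, ' is treated as ordinary text, so the line break
# inside the second record splits it into two rows.
naive = pd.read_csv(io.StringIO(raw), sep=";")

# With quotechar="'", the quoted field is read as a single value.
df = pd.read_csv(io.StringIO(raw), sep=";", quotechar="'")

print(len(naive), len(df))
```

The first read should report one row more than the second, which is the behaviour described in the question.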

If you also want to replace the newlines inside the text:

df['TEXT'] = df['TEXT'].str.replace('\r', '').str.replace('\n', ' ')
print(df)

Prints:

                  TEXT        DATE
0                hello  20/02/2002
1  hello! how are you?  21/02/2002
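
Since the question asks for a cleaned file rather than just a DataFrame, the result can presumably be written back in the same layout. The following is a sketch, not part of the original answer: it assumes every field should be quoted with ' as in the sample, and it uses a single regex replacement as a variation on the chained str.replace calls above. The sample data is inlined so the snippet runs on its own.

```python
import csv
import io

import pandas as pd

# Sample data from the question, inlined for a self-contained example.
raw = (
    "'TEXT';'DATE'\n"
    "'hello';'20/02/2002'\n"
    "'hello!\nhow are you?';'21/02/2002'\n"
)
df = pd.read_csv(io.StringIO(raw), sep=";", quotechar="'")

# Collapse any run of \r and \n characters inside a field into one space.
df["TEXT"] = df["TEXT"].str.replace(r"[\r\n]+", " ", regex=True)

# QUOTE_ALL wraps every field (including the header) in the quote character.
# With no path given, to_csv returns the CSV text as a string.
cleaned = df.to_csv(sep=";", quotechar="'", quoting=csv.QUOTE_ALL, index=False)
print(cleaned)
```

Passing a file path as the first argument to to_csv (for example a hypothetical 'cleaned.csv') would write the same content to disk instead of returning it.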