I'm working with CSV files and the problem is that some rows have columns containing \" inside. A simple example would be:
"Row 42"; "Some value"; "Description: \"xyz\""; "Anoher value"
As you can see, the third column contains that combination, and when I use the read_csv method in R, the input format is messed up. One working solution is to open the CSV file in Notepad and simply replace \" with ', for example. However, I'd prefer to have this automated.
I'm able to replace the \" with ' by using
gsub('\\\\"', "\\\'", df)
However, I'm not able to write it in the original format. Whenever I read the CSV file with R, I lose the quotation marks indicating the columns. So, in other words, my current method outputs the following:
"Row 42; Some value; Description: 'xyz'; Anoher value"
The quotation marks before and after ; are missing.
It's almost fine, but when I open the preprocessed file with Excel, it doesn't recognize the columns. I think the most convenient solution would be to read the CSV file as one big string containing all the quotation marks, replace the combination explained above, and then write it out again. However, I'm not able to read the file as one big string containing all the quotation marks.
Is there a way to read the CSV file with R containing all the quotation marks? Do you have any other solutions to achieve that?
CodePudding user response:
Have you already tried read.table? It comes with the base installation of R. Define sep=';' as the separator and disable quote handling with quote=''. Then gsub the redundant quotes away and apply trimws. This should fix your data.
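# \\" in R source is a literal backslash followed by a quote, as in the file
x <- '"Row 42"; "Some value"; "Description: \\"xyz\\""; "Anoher value"'
tab <- read.table(text=x, sep=';', quote='')
# remove quotes (and a preceding backslash, if any), then trim whitespace
tab[] <- lapply(tab, \(col) trimws(gsub('\\\\?"', '', col)))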
tab
#       V1         V2               V3           V4
# 1 Row 42 Some value Description: xyz Anoher value
In your case use read.table(file='<path to .csv file>', sep=';', quote='')
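If you also need to write the cleaned table back in the original layout, with every field quoted and ; as the separator, something along these lines might work. This is only a sketch: out_file is a hypothetical path, the sample string stands in for your file, and the pattern assumes the backslashes should be dropped together with the quotes.

```r
# Sample line from the question; \\" in R source is a literal backslash + quote.
x <- '"Row 42"; "Some value"; "Description: \\"xyz\\""; "Anoher value"'

# Read without quote handling, so every quote character stays in the data.
tab <- read.table(text = x, sep = ";", quote = "", stringsAsFactors = FALSE)

# Drop the quotes (and a preceding backslash, if any), then trim whitespace.
tab[] <- lapply(tab, function(col) trimws(gsub('\\\\?"', "", col)))

# Write back with every field quoted and ';' as the separator.
out_file <- tempfile(fileext = ".csv")  # hypothetical output path
write.table(tab, file = out_file, sep = ";", quote = TRUE,
            row.names = FALSE, col.names = FALSE)

# Inspect the written file as raw text.
readLines(out_file)
```

Because write.table adds the quotes itself, Excel should see four separate columns again, provided ; is the list separator it expects.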
CodePudding user response:
I found the solution, if anyone else faces the same problem:
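library(readr)  # provides read_lines() and write_lines()
# read raw lines so all quotation marks are preserved
data <- read_lines(inputFileName)
# replace each literal \" with '
preprocessed <- gsub('\\"', "'", data, fixed = TRUE)
write_lines(preprocessed, outputFileName)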
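For anyone without readr, roughly the same thing should be possible with base R's readLines and writeLines. A minimal sketch, assuming the file fits in memory; input_file and output_file are hypothetical paths, and the sample line is written only to make the example self-contained.

```r
# Hypothetical file paths; replace with your own input and output files.
input_file  <- tempfile(fileext = ".csv")
output_file <- tempfile(fileext = ".csv")

# Create a small sample file; \\" in R source is a literal backslash + quote.
writeLines('"Row 42"; "Some value"; "Description: \\"xyz\\""; "Anoher value"',
           input_file)

# Read raw lines: no CSV parsing happens, so all quotation marks are preserved.
lines <- readLines(input_file)

# fixed = TRUE treats the pattern as the literal two characters \" (no regex).
cleaned <- gsub('\\"', "'", lines, fixed = TRUE)

# Write the cleaned lines back out unchanged otherwise.
writeLines(cleaned, output_file)
readLines(output_file)
```

Since the lines are never parsed as CSV, the quotation marks around each column survive the round trip and only the \" sequences change.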