How can I make fread read "" as an empty string (i.e. without manually specifying colClasses)? What justifies this fread/fwrite incompatibility, and how can I avoid it (i.e. is there a way to fwrite so that fread can read empty strings)?
library(data.table)
dt <- data.table(a = 1, b = '')
fwrite(dt, file = 'out.csv')
dt2 <- fread('out.csv')
dt
# a b
# 1: 1
dt2
# a b
# 1: 1 NA
There are a couple of closely related posts (e.g. this one, but it also asks for trouble by specifying a numeric string). I think my case is much simpler, yet more surprising and insidious, given the reasonable expectation that fwrite and fread should handle the same data consistently.
Answer:
This is a combination of an issue raised in a previous question and a current bug, data.table#3439, in which empty strings are not recognized correctly.
There are two ways to work around this; they may become unnecessary once #3439 is fixed:
ensure that other rows in the
b
column are non-empty strings; you may not have control over this, so ...writeLines(c("a,b", '1,""', '2,"b"'), "out.csv") fread("out.csv") # a b # <int> <char> # 1: 1 # 2: 2 b
2. Ensure that the b column is of class character:
### with the original 1-row CSV file
fread("out.csv", colClasses=c(b="character"))
#        a      b
#    <int> <char>
# 1:     1
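If you control both the writing and the reading, one way to generalize the colClasses workaround is to record each column's class before writing and hand it back to fread. This is a minimal sketch, not a data.table feature: fwrite_roundtrip() is a hypothetical helper, it keeps only the first class of each column (so multi-class columns such as POSIXct would need extra handling), and it relies on fread accepting a named character vector for colClasses, as in c(b="character") above.

```r
library(data.table)

# Hypothetical helper (not part of data.table): write `dt` with fwrite and read
# it back with fread, forcing every column to the class it had in `dt`.
fwrite_roundtrip <- function(dt, file) {
  fwrite(dt, file = file)
  # Named character vector of first classes, e.g. c(a = "numeric", b = "character").
  # Columns whose class has several entries (e.g. POSIXct) would need extra care.
  classes <- vapply(dt, function(x) class(x)[1L], character(1L))
  fread(file, colClasses = classes)
}

dt <- data.table(a = 1, b = '')
tmp <- tempfile(fileext = ".csv")
dt2 <- fwrite_roundtrip(dt, tmp)
str(dt2)
unlink(tmp)  # remove the temporary file
```

Because the classes come from the object that was written, the all-empty b column is read back as character instead of being guessed from the file contents.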
When #3439 is resolved, you should be able to simply use na.strings=NULL:
fread("out.csv", na.strings=NULL)