fwrite and fread empty strings

Time:07-25

How can I make fread read "" as an empty string (i.e. without manually specifying colClasses)? What justifies this fread/fwrite incompatibility, and how can it be avoided (i.e. is there a way to fwrite so that fread reads empty strings back correctly)?

library(data.table)
dt <- data.table(a = 1, b = '')
fwrite(dt, file = 'out.csv')
dt2 <- fread('out.csv')

dt
#    a b
# 1: 1  
dt2
#    a  b
# 1: 1 NA

There are a couple of closely related posts (e.g. this one, though it invites trouble by using a numeric string). I think my case is much simpler, yet more surprising and insidious, given the reasonable expectation that fwrite and fread should handle the same data consistently.

CodePudding user response:

This is a combination of a previous question and a current bug, data.table#3439, in which empty strings are not recognized correctly.
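
To see which side the mismatch is on, it can help to look at the raw file next to the column type that fread infers. This is only a minimal sketch using the one-row table from the question; expected output is left out on purpose, since it may differ between data.table versions.

```r
library(data.table)

# Recreate the one-row file from the question
dt <- data.table(a = 1, b = "")
fwrite(dt, file = "out.csv")

# Raw text as written by fwrite: shows how the empty string was encoded
readLines("out.csv")

# Column classes that fread infers for the same file
sapply(fread("out.csv"), class)
```

If the raw line carries a quoted empty field but fread still reports a non-character class for b, the problem is on the reading side rather than in what fwrite produced.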

There are two ways to work around this today (items 1 and 2); they may become unnecessary once #3439 is fixed, at which point item 3 should apply:

  1. ensure that other rows in the b column are non-empty strings; you may not have control over this, so ...

    writeLines(c("a,b", '1,""', '2,"b"'), "out.csv")
    fread("out.csv")
    #        a      b
    #    <int> <char>
    # 1:     1       
    # 2:     2      b
    
  2. ensure that the b column is class character:

    ### with the original 1-row CSV file
    fread("out.csv", colClasses=c(b="character"))
    #        a      b
    #    <int> <char>
    # 1:     1       
    
  3. When #3439 is resolved, you should be able to just do

    fread("out.csv", na.strings=NULL)
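
Until then, a quick round-trip check can confirm that the colClasses workaround from item 2 keeps the empty string intact. This is a sketch, not a definitive recipe: `csv_path` is a hypothetical temporary file introduced here for illustration, and the check relies on the behaviour shown in item 2 above.

```r
library(data.table)

dt <- data.table(a = 1, b = "")
csv_path <- tempfile(fileext = ".csv")  # hypothetical temp file, not from the original post
fwrite(dt, file = csv_path)

# Workaround 2: force b to character so "" is not turned into NA
dt2 <- fread(csv_path, colClasses = c(b = "character"))

# Compare the character column before and after the round trip
identical(dt2$b, dt$b)

# Fail loudly if the column picked up NAs on the way back in
stopifnot(!anyNA(dt2$b))
```

Only column b is compared, because fread reads column a back as integer (as in the printed output above) while the original a is a double.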
    