How to read CSV files with vectors in cells using the data.table package?

Time:11-23

I created a tibble (named df) with a number and a vector inside:

library(tibble)
library(data.table)

df <- tibble(var1 = 5, var2 = list(c(1,2,3)))

var1   var2
 5   c(1,2,3)

Then I saved this tibble as a CSV file like so:

data.table::fwrite(df, file = "C/MyFolder/file.csv")

Now I want to read this file:

df <- data.table::fread(file = "C/MyFolder/file.csv")

And I get a new table with a number and a text string inside a cell:

 var1   var2
   5    1|2|3

How do I read the CSV file correctly so that I again get a tibble with a vector inside a cell?

CodePudding user response:

You might not be able to do it in one fell swoop, but here's a custom function that will solve your problem.

Custom Function

The function str_as_vct() is defined as follows:

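str_as_vct <- function(x, sep = "|", transform = as.numeric, ...) {
  sapply(
    # Split each string into its delimited pieces (literal match, no regex).
    X = base::strsplit(
      x = x,
      split = sep,
      fixed = TRUE
    ),
    # Convert each character vector to the desired datatype.
    FUN = transform,
    # Forward any further arguments to the transform function.
    ...,
    # Always return a list, one element per input string.
    simplify = FALSE,
    USE.NAMES = FALSE
  )
}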

Description

Take a vector of character strings, each with values separated by a delimiter, and split each string into a vector of its values.

Arguments

x: A vector of character strings, which represent vectors as delimited values.

sep: A character string. The delimiter used by the strings in x.

transform: A function to transform character vectors into vectors of the desired datatype.

...: Further arguments to the transform function.
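
As a quick illustration of these arguments, here is a minimal sketch that calls the function on a few made-up strings. The function definition is repeated in compact form only so that the snippet runs on its own; the inputs and the use of strtoi() as a transform are assumptions chosen for the example.

```r
# Compact copy of str_as_vct() from above, so this snippet is self-contained.
str_as_vct <- function(x, sep = "|", transform = as.numeric, ...) {
  sapply(strsplit(x, split = sep, fixed = TRUE), transform, ...,
         simplify = FALSE, USE.NAMES = FALSE)
}

# Defaults: "|"-delimited strings become a list of numeric vectors.
nums <- str_as_vct(c("1|2|3", "4|5"))

# A different delimiter and target type.
chars <- str_as_vct("a;b;c", sep = ";", transform = as.character)

# Extra arguments (here base = 2L) are forwarded to the transform function.
bins <- str_as_vct("10|11", transform = strtoi, base = 2L)

str(nums)
str(chars)
str(bins)
```

The result is always a list with one element per input string, which is the shape a list column needs.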

Solution

Armed with str_as_vct(), your problem can be solved in a single assignment:

df <- data.table::fread(file = "C/MyFolder/file.csv")[
  # Select all rows.
  ,
  
  # Select and transform columns.
  .(var1, var2 = str_as_vct(var2))
]
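
To try the whole round trip without touching real files, the following sketch may help. It assumes the data.table and tibble packages are installed, writes to a temporary file (a stand-in for the path in the question), and repeats str_as_vct() in compact form so that it runs on its own.

```r
library(data.table)
library(tibble)

# Compact copy of str_as_vct() from this answer.
str_as_vct <- function(x, sep = "|", transform = as.numeric, ...) {
  sapply(strsplit(x, split = sep, fixed = TRUE), transform, ...,
         simplify = FALSE, USE.NAMES = FALSE)
}

# Temporary file used only for this illustration.
path <- tempfile(fileext = ".csv")

original <- tibble(
  var1 = 1:3,
  var2 = list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
)

# Write the list column; fwrite() joins the elements of each cell with "|".
fwrite(original, file = path)

# Read it back and turn the delimited strings into numeric vectors again.
restored <- fread(file = path)[, .(var1, var2 = str_as_vct(var2))]
restored <- as_tibble(restored)

# TRUE if the list column survived the round trip unchanged.
same_cells <- identical(restored$var2, original$var2)
print(same_cells)

unlink(path)
```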

Result

Given an initial df like this

df <- tibble(
  var1 = 1:3,
  var2 = list(
    c(1, 2, 3),
    c(4, 5, 6),
    c(7, 8, 9)
  )
)

the solution should yield a data.table with the following str()

Classes ‘data.table’ and 'data.frame':  3 obs. of  2 variables:
 $ var1: int  1 2 3
 $ var2:List of 3
  ..$ : num  1 2 3
  ..$ : num  4 5 6
  ..$ : num  7 8 9
 - attr(*, ".internal.selfref")=<externalptr> 

where each element of var2 is a numeric vector.

Conversion to a tibble via as_tibble(df) will yield:

# A tibble: 3 x 2
   var1 var2     
  <int> <list>   
1     1 <dbl [3]>
2     2 <dbl [3]>
3     3 <dbl [3]>

CodePudding user response:

It seems that your CSV file is separated by |, so you need to pass the separator argument to fread, like this:

fread(file = "file.csv", sep="|")
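
Which character actually separates the columns can be checked by looking at the raw lines of the file. In a file written by fwrite() with default settings, columns are separated by sep (a comma by default), while the elements inside a list cell are joined with the middle element of sep2 (by default |). A minimal sketch, assuming data.table is installed and using a temporary file rather than the real path:

```r
library(data.table)

# Temporary file used only for this illustration.
path <- tempfile(fileext = ".csv")

# One numeric column and one list column, as in the question.
dt <- data.table(var1 = 5, var2 = list(c(1, 2, 3)))
fwrite(dt, file = path)

# Inspect the raw text to see which separators were written.
raw_lines <- readLines(path)
print(raw_lines)

# Reading with the column separator stated explicitly keeps two columns;
# the list cell comes back as a single "|"-delimited string.
reread <- fread(file = path, sep = ",")
str(reread)

unlink(path)
```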

Greetings
