I created a tibble (named df) with a number and a vector inside:
library(tibble)
library(data.table)
df <- tibble(var1 = 5, var2 = list(c(1,2,3)))
var1 var2
5 c(1,2,3)
Then I saved this tibble as a CSV file like so:
data.table::fwrite(df, file = "C/MyFolder/file.csv")
Now I want to read this file:
df <- data.table::fread(file = "C/MyFolder/file.csv")
And I get a new table with a number and text inside a cell:
var1 var2
5 1|2|3
How can I read the CSV file correctly so that I again get a tibble with a vector inside a cell?
CodePudding user response:
You might not be able to do it in one fell swoop, but here's a custom function that will solve your problem.
Custom Function
The function str_as_vct() is defined as follows:
str_as_vct <- function(x, sep = "|", transform = as.numeric, ...) {
  # Split each string on the delimiter, then convert every piece vector.
  sapply(
    X = base::strsplit(
      x = x,
      split = sep,
      fixed = TRUE
    ),
    FUN = transform,
    ...,
    simplify = FALSE,
    USE.NAMES = FALSE
  )
}
Description
Takes a vector of character strings, each with values separated by a delimiter, and splits each string into a vector of its values.
Usage
x: A vector of character strings, which represent vectors as delimited values.
sep: A character string. The delimiter used by the strings in x.
transform: A function to transform character vectors into vectors of the desired datatype.
...: Further arguments to the transform function.
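As a quick illustration of how these arguments interact, here is a small sketch on plain character vectors. The helper is restated compactly with lapply() so the block runs on its own (that restatement is my assumption of an equivalent form, not the definition given above), and the input strings are made up for the example.

```r
# Compact restatement of str_as_vct(), assumed equivalent to the version above.
str_as_vct <- function(x, sep = "|", transform = as.numeric, ...) {
  lapply(strsplit(x, split = sep, fixed = TRUE), transform, ...)
}

# Defaults: split on "|" and convert each piece to numeric.
nums <- str_as_vct(c("1|2|3", "4|5"))

# A different delimiter, keeping the pieces as character strings.
words <- str_as_vct("a;b;c", sep = ";", transform = identity)

# Extra arguments in ... are forwarded to transform:
# here strtoi() parses each piece as a base-2 integer.
bits <- str_as_vct("101|11", transform = strtoi, base = 2L)

str(nums)
str(words)
str(bits)
```

Each call returns a list with one element per input string, which is the shape a list column needs.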
Solution
Armed with str_as_vct(), your problem can be solved in a single assignment:
df <- data.table::fread(file = "C/MyFolder/file.csv")[
  # Select all rows.
  ,
  # Select and transform columns.
  .(var1, var2 = str_as_vct(var2))
]
Result
Given an initial df like this
df <- tibble(
  var1 = 1:3,
  var2 = list(
    c(1, 2, 3),
    c(4, 5, 6),
    c(7, 8, 9)
  )
)
the solution should yield a data.table with the following str():
Classes ‘data.table’ and 'data.frame': 3 obs. of 2 variables:
$ var1: int 1 2 3
$ var2:List of 3
..$ : num 1 2 3
..$ : num 4 5 6
..$ : num 7 8 9
- attr(*, ".internal.selfref")=<externalptr>
where each element of var2 is a numeric vector.
Conversion to a tibble via as_tibble(df) will yield:
# A tibble: 3 x 2
   var1 var2
  <int> <list>
1     1 <dbl [3]>
2     2 <dbl [3]>
3     3 <dbl [3]>
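To try the whole round trip without touching a real folder, something like the following sketch might help. It writes to a temporary file (tempfile() stands in for the path in the question), passes sep = "," to fread() explicitly so the "|" inside the cells cannot be mistaken for the column separator, and restates the helper compactly with lapply() so the block is self-contained. The names path, dt, and restored are only illustrative.

```r
library(data.table)
library(tibble)

# Compact restatement of str_as_vct(), assumed equivalent to the version above.
str_as_vct <- function(x, sep = "|", transform = as.numeric, ...) {
  lapply(strsplit(x, split = sep, fixed = TRUE), transform, ...)
}

df <- tibble(
  var1 = 1:3,
  var2 = list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
)

# Temporary file used in place of the real path from the question.
path <- tempfile(fileext = ".csv")

# fwrite() joins the items of each list cell with "|" (its default sep2).
fwrite(df, file = path)

# Read the file back; var2 arrives as text such as "1|2|3".
dt <- fread(file = path, sep = ",")

# Rebuild the list column, then convert the data.table to a tibble.
dt <- dt[, .(var1, var2 = str_as_vct(var2))]
restored <- as_tibble(dt)

str(restored)
unlink(path)
```

Note that this only restores numeric vectors; for other element types you would pass a different transform, for example as.character.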
CodePudding user response:
It seems that your csv file is separated by |, so you need to pass the separator argument to fread, like this:
fread(file = "file.csv", sep="|")
Greetings