I have a data frame that looks like this:
date | var |
---|---|
2022-01-01 | a,b,...,h |
2022-01-02 | a,b,...,z |
Now I want to use the separate function (from tidyr) or any other function in R to split each cell on the "," delimiter, but I don't know how many elements each cell contains, so the number of columns to create is unknown in advance. The result will be unbalanced: rows with fewer elements get their remaining columns filled with NA.
Ideally I want the resulting data frame to look like this:
date | var1 | var2 | var... | var... | var_Inf |
---|---|---|---|---|---|
2022-01-01 | a | b | ... | h | NA |
2022-01-02 | a | b | ... | ... | z |
How can I do this in R?
library(tibble)
date = seq(as.Date("2022/1/1"), as.Date("2022/1/2"), by = "day")
date
var = c("a,b,...,h","a,b,...,z");var
df = tibble(date,var)
A more reproducible example is this:
var = c("a,b,c,d,e,h","a,b,c,d,e,f,i,z");var
df = tibble(date,var)
Keep in mind that I don't know in advance how many letters each element of the column contains.
How can I do this in R?
CodePudding user response:
We could do it this way. Bringing the data into long format with separate_rows() from the tidyr package makes such tasks easier to handle:
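library(dplyr)
library(tidyr)

df %>%
  # one row per comma-separated element
  separate_rows(var, sep = ",") %>%
  group_by(date) %>%
  # position of each element within its date, used to name the columns
  mutate(id = row_number()) %>%
  pivot_wider(names_from = id, values_from = var, names_glue = "var{id}")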
  date       var1  var2  var3  var4
  <chr>      <chr> <chr> <chr> <chr>
1 2022-01-01 a     b     x     h
2 2022-01-02 a     b     x     z
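The sample data used in this answer happens to be balanced (four elements per row), so it does not show the NA padding the question asks about. As a minimal sketch, the same pipeline can be applied to the unbalanced example from the question. The names `df_unbalanced` and `wide` are placeholders introduced here for illustration, and `ungroup()` is an addition so that the result is not returned as a grouped tibble.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Unbalanced input taken from the question: 6 elements vs. 8 elements
df_unbalanced <- tibble(
  date = as.Date(c("2022-01-01", "2022-01-02")),
  var  = c("a,b,c,d,e,h", "a,b,c,d,e,f,i,z")
)

wide <- df_unbalanced %>%
  # long format: one row per comma-separated element
  separate_rows(var, sep = ",") %>%
  group_by(date) %>%
  # running index within each date becomes the column suffix
  mutate(id = row_number()) %>%
  ungroup() %>%
  # back to wide format; positions missing for a date are left as NA
  pivot_wider(names_from = id, values_from = var, names_glue = "var{id}")

wide
```

Because the number of columns comes from the largest `id` found in the data, nothing about the element count has to be known beforehand. Here the first row should end up with NA in `var7` and `var8`, since it only has six elements.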
data:
df <- structure(list(date = c("2022-01-01", "2022-01-02"), var = c("a,b,x,h",
"a,b,x,z")), class = "data.frame", row.names = c(NA, -2L))