How can I separate a row character vector into an unknown number of columns in R using dplyr?

Time:10-28

I have a data frame that looks like this:

date var
2022-01-01 a,b,...,h
2022-01-02 a,b,...,z

Now I want to use the separate() function (it ships with tidyr, dplyr's companion package) or any other function to split each string at the "," delimiter. The problem is that I don't know how many elements each cell contains, so the number of columns to create is unknown. The resulting columns will be unbalanced, with shorter rows filled with NA.

Ideally I want the resulting data frame to look like this:

date var1 var2 var... var... var_Inf
2022-01-01 a b ... h NA
2022-01-02 a b ... ... z

How can I do this in R?

library(tibble)  # provides tibble()

date = seq(as.Date("2022/1/1"), as.Date("2022/1/2"), by = "day")
date
var = c("a,b,...,h", "a,b,...,z")
var
df = tibble(date, var)


A more reproducible example is this:

var = c("a,b,c,d,e,h","a,b,c,d,e,f,i,z");var
df = tibble(date,var)

But keep in mind that I don't know in advance how many letters each element of the column contains.

How can I do this in R?

CodePudding user response:

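We could do it this way: reshape to long format first with separate_rows() from the tidyr package, number the elements within each date, and then pivot back to wide format. Rows with fewer elements are filled with NA by pivot_wider():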

library(dplyr)
library(tidyr)

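df %>% 
  separate_rows(var, sep = ",") %>%   # one row per comma-separated element
  group_by(date) %>% 
  mutate(id = row_number()) %>%       # position of each element within its date
  ungroup() %>% 
  pivot_wider(names_from = id, values_from = var, names_glue = "var{id}")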

  date       var1  var2  var3  var4 
  <chr>      <chr> <chr> <chr> <chr>
1 2022-01-01 a     b     x     h    
2 2022-01-02 a     b     x     z 

data:

df <- structure(list(date = c("2022-01-01", "2022-01-02"), var = c("a,b,x,h", 
"a,b,x,z")), class = "data.frame", row.names = c(NA, -2L))
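The sample data above has the same number of elements in every row, so it does not show the NA padding the question asks about. Here is a minimal sketch that runs the same long-then-wide approach on the uneven example from the question. The names df2 and wide are made up for illustration, and names_prefix is used here in place of names_glue to build the column names:

```r
library(dplyr)
library(tidyr)

# Uneven example taken from the question: 6 elements vs. 8 elements
df2 <- tibble(
  date = as.Date(c("2022-01-01", "2022-01-02")),
  var  = c("a,b,c,d,e,h", "a,b,c,d,e,f,i,z")
)

wide <- df2 %>%
  separate_rows(var, sep = ",") %>%   # long format: one row per element
  group_by(date) %>%
  mutate(id = row_number()) %>%       # element position within each date
  ungroup() %>%
  pivot_wider(names_from = id, values_from = var, names_prefix = "var")

wide
```

The number of columns is set by the longest row, so nothing has to be known in advance. For the shorter first row, the positions it does not have (var7 and var8) come back as NA.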