split the string in the rows to separate columns in R-CodePudding

I have a column with different strings length that are separated by ",". I want to split each rows of this column to separate columns and fill the missing values with "NA", and for each string count the number of the frequency. Here is a samples example:

M <- data.frame(name = c("A", "B", "C"), mapped = c("X1, X3, X4", "X2, X4", "X2,X3, X4"))
  name     mapped
1    A X1, X3, X4
2    B     X2, X4
3    C  X2,X3, X4

I want to get the resulting data-frame like:

df <- data.frame(name = c("A","B", "C"), V1 = c("X1","NA", "NA"), V2 = c("NA", "X2","X2"), V3 = c("X3","NA", "X3"), V4 = c("X4","X4", "X4"))

  name V1 V2 V3 V4
1    A X1 NA X3 X4
2    B NA X2 NA X4
3    C NA X2 X3 X4

Then count the number of X1, X2, X3 and X4 for each column of new data-frame.

Thank you!

CodePudding user response：

You could use separate_rows and pivot_wider:

library(tidyverse)

M %>% 
  separate_rows(mapped) %>% 
  pivot_wider(names_from = mapped, values_from = mapped) %>% 
  relocate(order(colnames(.)))

# A tibble: 3 x 5
  name  X1    X2    X3    X4   
  <chr> <chr> <chr> <chr> <chr>
1 A     X1    NA    X3    X4   
2 B     NA    X2    NA    X4   
3 C     NA    X2    X3    X4

Then to count the number of values per column, use :

colSums(!is.na(M[,-1]))
# X1 X2 X3 X4 
#  1  2  2  3

CodePudding user response：

Split on comma, unlist, then count:

table(unlist(strsplit(M$mapped, ",")))
# X1 X2 X3 X4 
#  1  2  2  3