I have a column with different strings length that are separated by ",". I want to split each rows of this column to separate columns and fill the missing values with "NA", and for each string count the number of the frequency. Here is a samples example:
M <- data.frame(name = c("A", "B", "C"), mapped = c("X1, X3, X4", "X2, X4", "X2,X3, X4"))
name mapped
1 A X1, X3, X4
2 B X2, X4
3 C X2,X3, X4
I want to get the resulting data-frame like:
df <- data.frame(name = c("A","B", "C"), V1 = c("X1","NA", "NA"), V2 = c("NA", "X2","X2"), V3 = c("X3","NA", "X3"), V4 = c("X4","X4", "X4"))
name V1 V2 V3 V4
1 A X1 NA X3 X4
2 B NA X2 NA X4
3 C NA X2 X3 X4
Then count the number of X1, X2, X3 and X4 for each column of new data-frame.
Thank you!
CodePudding user response:
You could use separate_rows
and pivot_wider
:
library(tidyverse)
M %>%
separate_rows(mapped) %>%
pivot_wider(names_from = mapped, values_from = mapped) %>%
relocate(order(colnames(.)))
# A tibble: 3 x 5
name X1 X2 X3 X4
<chr> <chr> <chr> <chr> <chr>
1 A X1 NA X3 X4
2 B NA X2 NA X4
3 C NA X2 X3 X4
Then to count the number of values per column, use :
colSums(!is.na(M[,-1]))
# X1 X2 X3 X4
# 1 2 2 3
CodePudding user response:
Split on comma, unlist, then count:
table(unlist(strsplit(M$mapped, ",")))
# X1 X2 X3 X4
# 1 2 2 3