Consider the following data named df
:
id v1 v2
1 3 3,4,5,3,4,5
2 2 4,5,4
3 5 5,5,5
4 3 3,3,4
5 4 4,4,6,7
Now, I want to create new data set with new columns using loop
. The column names are derived from the last column (v2
). That means it starts from v3
(the smallest value in v2
) and ends at v7
(the largest value in v2
). The value of each cell of newly created columns is the number of times that number is repeated in v2
. My new data set looks like the following:
id v1 v3 v4 v5 v6 v7
1 3 2 2 2 0 0
2 2 0 2 1 0 0
3 5 0 0 3 0 0
4 3 2 1 0 0 0
5 4 0 2 0 1 1
Any help is highly appreciated!
CodePudding user response:
Assuming the 'v2' column is character
class, split the column by ,
and use qdapTools::mtabulate
to get the frequencies
library(qdapTools)
df[paste0("v", 3:7)] <- mtabulate(strsplit(df$v2, ","))
df$v2 <- NULL
-output
> df
id v1 v3 v4 v5 v6 v7
1 1 3 2 2 2 0 0
2 2 2 0 2 1 0 0
3 3 5 0 0 3 0 0
4 4 3 2 1 0 0 0
5 5 4 0 2 0 1 1
data
df <- structure(list(id = 1:5, v1 = c(3L, 2L, 5L, 3L, 4L), v2 = c("3,4,5,3,4,5",
"4,5,4", "5,5,5", "3,3,4", "4,4,6,7")), class = "data.frame",
row.names = c(NA,
-5L))