I have a dataset like this.
ID Col1 Col2 Col3 Col4 Score
1 1 1 1 1 30.14
2 1 1 1 1 30.26
3 1 1 1 1 30.14
4 1 1 1 1 30.14
5 1 3 3 3 38.78
6 1 3 3 3 38.78
7 1 1 1 2 38.34
8 1 1 1 2 38.34
I like to create a summary of data patterns like this.
Col1 Col2 Col3 Col4 Score
1 1 1 1 30.14, 30.26
1 3 3 3 38.78
1 1 1 2 38.34
I am not sure how to generate such a pattern. I tried the md.pattern function from mice but it does not give a collapsed column of Scores. Doing a unique also will not work. Any suggestion on how to create this summary is much apricated.
CodePudding user response:
library(dplyr)
data.frame(
ID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L),
Col1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
Col2 = c(1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L),
Col3 = c(1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L),
Col4 = c(1L, 1L, 1L, 1L, 3L, 3L, 2L, 2L),
Score = c(30.14, 30.26, 30.14, 30.14, 38.78, 38.78, 38.34, 38.34)
) %>%
distinct(Col1, Col2, Col3, Col4, Score) %>%
group_by(Col1, Col2, Col3, Col4) %>%
summarize(Score = paste(Score, collapse = ", "), .groups = "drop")
result (ordered by the Col values instead of orig order of appearance)
# A tibble: 3 × 5
Col1 Col2 Col3 Col4 Score
<int> <int> <int> <int> <chr>
1 1 1 1 1 30.14, 30.26
2 1 1 1 2 38.34
3 1 3 3 3 38.78