Home > Back-end >  r data pattern generation
r data pattern generation

Time:09-13

I have a dataset like this.

ID   Col1    Col2    Col3    Col4    Score
1    1       1       1       1       30.14
2    1       1       1       1       30.26
3    1       1       1       1       30.14
4    1       1       1       1       30.14
5    1       3       3       3       38.78
6    1       3       3       3       38.78
7    1       1       1       2       38.34
8    1       1       1       2       38.34

I like to create a summary of data patterns like this.

  Col1    Col2    Col3    Col4    Score
  1       1       1       1       30.14, 30.26
  1       3       3       3       38.78
  1       1       1       2       38.34

I am not sure how to generate such a pattern. I tried the md.pattern function from mice but it does not give a collapsed column of Scores. Doing a unique also will not work. Any suggestion on how to create this summary is much apricated.

CodePudding user response:

library(dplyr)
data.frame(
         ID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L),
        Col1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
        Col2 = c(1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L),
        Col3 = c(1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L),
        Col4 = c(1L, 1L, 1L, 1L, 3L, 3L, 2L, 2L),
       Score = c(30.14, 30.26, 30.14, 30.14, 38.78, 38.78, 38.34, 38.34)
) %>%
  distinct(Col1, Col2, Col3, Col4, Score) %>%
  group_by(Col1, Col2, Col3, Col4) %>%
  summarize(Score = paste(Score, collapse = ", "), .groups = "drop")

result (ordered by the Col values instead of orig order of appearance)

# A tibble: 3 × 5
   Col1  Col2  Col3  Col4 Score       
  <int> <int> <int> <int> <chr>       
1     1     1     1     1 30.14, 30.26
2     1     1     1     2 38.34       
3     1     3     3     3 38.78     
  • Related