collapse into single row without concatenation

Time:10-18

I have a data frame as follows:

                 df <- structure(list(RID = c(1L, 1L, 2L, 2L, 3L, 3L), 
                 Sex = c("FEMALE", "FEMALE", "MALE", "MALE", "FEMALE", "FEMALE"),
                 Race = c("White","White", "Hispanic", "Hispanic", "Black", "Black"),
                 TIME = c("Break Fast", "Break Fast", "Lunch", "Lunch", "Dinner", "Dinner"),
                 Sugar = c("Normal", "Normal", "Abnormal", "Abnormal", "Satisfactory", 
                 "Satisfactory"), 
                 Test_A = c(90L, "", "", 157L, "", 129L),
                 Test_B = c("", 90L, 157L, "", 129L, "")),
                 class = "data.frame", row.names = c(NA, -6L))

The required output is:

                 Requd_df <- structure(list(RID = c(1L, 2L, 3L), 
                 Sex = c("FEMALE", "MALE", "FEMALE"),
                 Race = c("White", "Hispanic", "Black"),
                 TIME = c("Break Fast", "Lunch", "Dinner"),
                 Sugar = c("Normal", "Abnormal", "Satisfactory"), 
                 Test_A = c(90L, 157L, 129L),
                 Test_B = c(90L, 157L, 129L)),
                 class = "data.frame", row.names = c(NA, -3L))

My code is as follows:

                 library(data.table)
                 setDT(df)

                 df1 <-  df[, lapply(.SD, paste0, collapse=""), by= RID]

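My code concatenates every element of the columns Sex, Race, TIME, and Sugar within each RID group (for example, Sex becomes "FEMALEFEMALE"). I need to collapse each group into a single row without this concatenation. Please help.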

CodePudding user response:

Include the other variables in by, so that only Test_A and Test_B are pasted together:

library(data.table)

setDT(df)
df[, lapply(.SD, paste0, collapse=""), .(RID, Sex, Race, TIME, Sugar)]

#   RID    Sex     Race       TIME        Sugar Test_A Test_B
#1:   1 FEMALE    White Break Fast       Normal     90     90
#2:   2   MALE Hispanic      Lunch     Abnormal    157    157
#3:   3 FEMALE    Black     Dinner Satisfactory    129    129
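
Note that Test_A and Test_B in this result are character columns, whereas Requd_df stores them as integers. A minimal sketch of one way to get integers, assuming each group has exactly one non-empty value per test column (the trimmed-down data and the name test_cols are made up for illustration):

```r
library(data.table)

# Trimmed-down stand-in for the question's data (hypothetical subset).
# Mixing 90L with "" inside c() coerces everything to character,
# so the test columns are written as strings here.
df <- data.table(
  RID    = c(1L, 1L, 2L, 2L),
  Sex    = c("FEMALE", "FEMALE", "MALE", "MALE"),
  Test_A = c("90", "", "", "157"),
  Test_B = c("", "90", "157", "")
)

test_cols <- c("Test_A", "Test_B")

# Paste within each group, then convert the resulting string to integer.
res <- df[, lapply(.SD, function(x) as.integer(paste0(x, collapse = ""))),
          by = .(RID, Sex), .SDcols = test_cols]
```

If a group could hold two non-empty values in the same column, pasting would glue them together (for example "90" and "91" would become 9091), so this only fits data shaped like the example.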

CodePudding user response:

We could do this with the tidyverse:

library(dplyr)
library(stringr)
df %>% 
   group_by(across(RID:Sugar)) %>% 
   summarise(across(everything(), ~ str_c(.x, collapse = "")), .groups = 'drop')
# A tibble: 3 × 7
    RID Sex    Race     TIME       Sugar        Test_A Test_B
  <int> <chr>  <chr>    <chr>      <chr>        <chr>  <chr> 
1     1 FEMALE White    Break Fast Normal       90     90    
2     2 MALE   Hispanic Lunch      Abnormal     157    157   
3     3 FEMALE Black    Dinner     Satisfactory 129    129  
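
As a variation on the approach above, the non-empty value can be picked directly instead of pasting strings together. This is only a sketch on a trimmed-down copy of the data (a hypothetical subset), and it assumes that taking the first non-empty value per group is what is wanted:

```r
library(dplyr)

# Trimmed-down stand-in for the question's data (hypothetical subset)
df <- data.frame(
  RID    = c(1L, 1L, 2L, 2L),
  Sex    = c("FEMALE", "FEMALE", "MALE", "MALE"),
  Test_A = c("90", "", "", "157"),
  Test_B = c("", "90", "157", "")
)

res <- df %>%
  group_by(RID, Sex) %>%
  # Keep the first non-empty entry in each group and convert it to integer;
  # a group with no non-empty entry yields NA.
  summarise(across(c(Test_A, Test_B), ~ as.integer(.x[.x != ""][1])),
            .groups = "drop")
```

Here nothing is concatenated, so a group with several non-empty values keeps only the first one rather than producing a merged number.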