Select row in a group based on priority of column values in data.table

Time:02-25

I have a data.table as follows:

temp_dt = structure(list(group = c("A", "A", "B", "C", "D", 
"D", "E", "E"), value = c(28.395, 26.206, 64.032, 
7.588961, 0.053089, 0.053089, 0.795798, 0.795798), type = c("R", 
"P", "R", "R", "R", "P", "R", "P")), row.names = c(NA, -8L), class = c("data.table", 
"data.frame"))

> temp_dt
   group     value type
1:     A 28.395000    R
2:     A 26.206000    P
3:     B 64.032000    R
4:     C  7.588961    R
5:     D  0.053089    R
6:     D  0.053089    P
7:     E  0.795798    R
8:     E  0.795798    P

I want to subset the data.table temp_dt so that each group contributes exactly one row. When a group has both types R and P, the row with type R should be selected. When a group has only one of the two types, whichever row is available should be selected.

CodePudding user response:

A possible solution: sort by type in descending order, so that R comes before P, then take the first row of each group:

library(data.table)
temp_dt[order(-type),.SD[1,],by=group]

    group     value   type
   <char>     <num> <char>
1:      A 28.395000      R
2:      B 64.032000      R
3:      C  7.588961      R
4:      D  0.053089      R
5:      E  0.795798      R
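
The ordering above works because "R" happens to sort after "P" alphabetically. If the type labels did not sort in priority order, one option is to spell the priority out. The following is only a sketch under that assumption; `type_priority` and `result` are hypothetical names introduced here, not part of the original answer.

```r
library(data.table)

# Same data as in the question, rebuilt so the example is self-contained
temp_dt <- data.table(
  group = c("A", "A", "B", "C", "D", "D", "E", "E"),
  value = c(28.395, 26.206, 64.032, 7.588961,
            0.053089, 0.053089, 0.795798, 0.795798),
  type  = c("R", "P", "R", "R", "R", "P", "R", "P")
)

# Hypothetical priority vector: earlier entries win
type_priority <- c("R", "P")

# Rank every row by the position of its type in type_priority,
# sort by group and rank, then keep the first row of each group
result <- unique(
  temp_dt[order(group, match(type, type_priority))],
  by = "group"
)

print(result)
```

Changing the order of `type_priority` (for example to `c("P", "R")`) should flip the preference without touching the rest of the code.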

CodePudding user response:

library(data.table)
library(dplyr)

# Count rows per group/type combination; columns are group, type, N
temp_11 <- as.data.table(table(temp_dt[, -2]))

# Within each group, keep the types that actually occur and prefer R over P
temp_12 <- temp_11 %>%
  filter(N > 0) %>%
  group_by(group) %>%
  summarise(Select = if ("R" %in% type) "R" else "P")

temp_12

It will give this:

  group Select
  <chr> <chr> 
1 A     R     
2 B     R     
3 C     R     
4 D     R     
5 E     R     
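
The table above gives one label per group, whereas the question asks for the matching rows. One way to get from labels to rows is a join back onto the data. This is a minimal sketch using data.table only; `sel` and `picked` are hypothetical names, and it assumes each group has at most one row per type (otherwise the join returns every matching row).

```r
library(data.table)

# Same data as in the question, rebuilt so the example is self-contained
temp_dt <- data.table(
  group = c("A", "A", "B", "C", "D", "D", "E", "E"),
  value = c(28.395, 26.206, 64.032, 7.588961,
            0.053089, 0.053089, 0.795798, 0.795798),
  type  = c("R", "P", "R", "R", "R", "P", "R", "P")
)

# One label per group: "R" when the group contains an R row, otherwise "P"
sel <- temp_dt[, .(Select = if ("R" %in% type) "R" else "P"), by = group]

# Join the labels back onto the data to keep only the chosen row per group
picked <- temp_dt[sel, on = .(group, type = Select)]

print(picked)
```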