Home > Blockchain >  Filter Data frame if at least 3 columns agrees the condition
Filter Data frame if at least 3 columns agrees the condition

Time:06-24

I have a data frame as this

df <- data.frame(student_name = c('U','V','X','Y','Z'),
             grade = c('AA','CC','DD','AB','BB'),
             math_marks = c(40,80,38,97,65),
             eng_marks = c(95,78,36,41,25),
             sci_marks = c(56,25,36,87,15),
             Point_A=c(1,1,1,1,NA),
             Point_B=c(NA,1,NA,1,1),
             Point_C=c(NA,1,NA,NA,NA),
             Point_D=c(NA,NA,NA,NA,1),
             Point_E=c(NA,1,NA,NA,1))

I need add a column called "Point" based on the column values Point_A to Point_E, if any 3 column value equals 1.

Excepted output.

df <- data.frame(student_name = c('U','V','X','Y','Z'),
             grade = c('AA','CC','DD','AB','BB'),
             math_marks = c(40,80,38,97,65),
             eng_marks = c(95,78,36,41,25),
             sci_marks = c(56,25,36,87,15),
             Point_A=c(1,1,1,1,NA),
             Point_B=c(NA,1,NA,1,1),
             Point_C=c(NA,1,NA,NA,NA),
             Point_D=c(NA,NA,NA,NA,1),
             Point_E=c(NA,1,NA,NA,1),
             Point=c(NA,1,NA,NA,1))

So far I was doing with this for all possible 3 combinations

df%>% filter(Point_A ==1,Point_B==1,Point_C==1)

Is there any other way to do this ?

CodePudding user response:

To subset down to the rows with that condition use rowSums with across to sum the 1's by row:

df %>% filter(rowSums(across(starts_with("Point")), na.rm = TRUE) >= 3)
##   student_name grade math_marks eng_marks sci_marks Point_A Point_B Point_C Point_D Point_E
## 1            V    CC         80        78        25       1       1       1      NA       1
## 2            Z    BB         65        25        15      NA       1      NA       1       1

or to add a 0/1 Point column indicating whether that row satisfies the condition:

df %>% mutate(Point =   (rowSums(across(starts_with("Point")), na.rm = TRUE) >= 3))
##   student_name grade math_marks eng_marks sci_marks Point_A Point_B Point_C Point_D Point_E Point
## 1            U    AA         40        95        56       1      NA      NA      NA      NA     0
## 2            V    CC         80        78        25       1       1       1      NA       1     1
## 3            X    DD         38        36        36       1      NA      NA      NA      NA     0
## 4            Y    AB         97        41        87       1       1      NA      NA      NA     0
## 5            Z    BB         65        25        15      NA       1      NA       1       1     1

CodePudding user response:

ifelse with rowwise option:

library(dplyr)
df %>%
  rowwise %>%
  mutate(Point = ifelse(rowSums(across(Point_A:Point_E, ~ .x == 1), na.rm = T)>=3, 1, NA))

Output:

# A tibble: 5 × 11
# Rowwise: 
  student_name grade math_marks eng_marks sci_marks Point_A Point_B Point_C Point_D Point_E Point
  <chr>        <chr>      <dbl>     <dbl>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 U            AA            40        95        56       1      NA      NA      NA      NA    NA
2 V            CC            80        78        25       1       1       1      NA       1     1
3 X            DD            38        36        36       1      NA      NA      NA      NA    NA
4 Y            AB            97        41        87       1       1      NA      NA      NA    NA
5 Z            BB            65        25        15      NA       1      NA       1       1     1
  •  Tags:  
  • r
  • Related