In R, conditionally subset only those columns where the sum of specific rows is X

Time:10-02

Background

I've got a dataframe df:

df <- data.frame(task = c("a","b","c", "d","e"),
                 rater_1 = c(1,0,1,0,0),
                 rater_2 = c(1,0,1,1,1),
                 rater_3 = c(1,0,0,0,0),
                 stringsAsFactors=FALSE)

> df
  task rater_1 rater_2 rater_3
1    a       1       1       1
2    b       0       0       0
3    c       1       1       0
4    d       0       1       0
5    e       0       1       0

Raters are asked to judge the quality of a product: if the item they're rating is of good quality, it gets a 1; if not, it gets a 0.

The problem

I want to subset only those rows whose row sum across rater_1, rater_2, and rater_3 equals a specific number. Put another way, I want to return only those rows where n raters (out of 3) marked "1" for their rating task.

A concrete example: if I were looking for any rows whose rater sums were 2, I'd get a little subsetted dataframe like this:

  task rater_1 rater_2 rater_3
     c       1       1       0

What I've tried

I'm fiddling with filter in dplyr:

library(dplyr)

df %>%
  filter(sum(rater_1, rater_2, rater_3) == 2)

[1] task    rater_1 rater_2 rater_3
<0 rows> (or 0-length row.names)

But it returns zero rows instead of the row for task c.
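A minimal base R sketch contrasting `sum()`, which returns one total for everything it is given, with element-wise `+`, which returns one total per row. The `dplyr::filter()` call at the end is an alternative written for this example (not the accepted answer's method) and runs only if dplyr is installed.

```r
df <- data.frame(task = c("a", "b", "c", "d", "e"),
                 rater_1 = c(1, 0, 1, 0, 0),
                 rater_2 = c(1, 0, 1, 1, 1),
                 rater_3 = c(1, 0, 0, 0, 0),
                 stringsAsFactors = FALSE)

# sum() collapses all of its arguments into ONE number (2 + 4 + 1 = 7),
# so a filter condition built on it is a single FALSE recycled over every row.
total <- sum(df$rater_1, df$rater_2, df$rater_3)
print(total)

# `+` is vectorised: it adds the columns element by element,
# giving one sum per row.
per_row <- df$rater_1 + df$rater_2 + df$rater_3
print(per_row)

# Base R subset using the per-row sums.
print(df[per_row == 2, ])

# The same row-wise idea inside dplyr::filter(), run only if dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(dplyr::filter(df, rater_1 + rater_2 + rater_3 == 2))
}
```

Writing the columns out with `+` is fine for three raters; for many columns, `rowSums()` avoids listing each one.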

Answer

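Your filter() call returns nothing because sum() adds every value in all three columns into a single number, which is then compared once and recycled over all rows. Instead, use rowSums() to get the sum of each row and filter on that. Because you want to sum every column except the first (task), apply it to df[-1]: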

df[rowSums(df[-1])==2,]
#  task rater_1 rater_2 rater_3
#3    c       1       1       0

Using df[, 2:4] instead of df[-1] gives the same result.
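Both df[-1] and df[, 2:4] depend on column positions, so they break if raters are added or columns are reordered. A minimal sketch that selects the rater columns by name instead, assuming every rater column name starts with `rater_`; `rows_with_n_ones()` is a hypothetical helper name, not part of base R.

```r
df <- data.frame(task = c("a", "b", "c", "d", "e"),
                 rater_1 = c(1, 0, 1, 0, 0),
                 rater_2 = c(1, 0, 1, 1, 1),
                 rater_3 = c(1, 0, 0, 0, 0),
                 stringsAsFactors = FALSE)

# Hypothetical helper: keep rows whose values in `cols` sum to exactly `n`.
# drop = FALSE keeps a data frame even if only one column or row is selected.
rows_with_n_ones <- function(data, cols, n) {
  data[rowSums(data[, cols, drop = FALSE]) == n, , drop = FALSE]
}

# Assumption: every rater column is named "rater_<something>".
rater_cols <- names(df)[startsWith(names(df), "rater_")]

two_raters <- rows_with_n_ones(df, rater_cols, 2)
print(two_raters)

# Count how many rows have each observed sum.
print(table(rowSums(df[rater_cols])))
```

With `n = 1` the helper returns the rows for tasks d and e; a sum of 0 means no rater marked the task as good.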
