Sum across columns while performing a transformation on them-CodePudding

Let's say I have the following dataset

test_df = game_df = read.table(text = "a_bias b_bias c_bias d_bias
dog dog cat cat
NA NA NA NA
cat cat cat cat
dog NA dog dog", header = T)

I'd like to create a new column called dog_sum that is equal to the total number of times the dog appears across all the columns that end with the word bias. Here's my current approach:

test_df %>% rowwise() %>%
  mutate(dog_sum = sum(across(ends_with("bias"), ~ifelse(. == "dog", 1, 0)), na.rm = T))

The issue is that this throws an error when the entire row consists of NAs. Is there a better way of solving this?

More generally, I'd like to mutate new columns for every column matched by ends_with("bias"), and this new column should be set to 1 if it contains dog and 0 otherwise. So, the end result would be to have four columns named a_bias_dog, b_bias_dog, etc. that are set to 1 if "dog" is present for that specific row, and 0 otherwise. How do I achieve this?

CodePudding user response：

We may do this with rowSums as a vectorized approach and should be faster than rowwise/sum

library(dplyr)
test_df %>% 
   mutate(dog_sum = rowSums(across(ends_with('bias')) == 'dog',
       na.rm = TRUE))

-output

   a_bias b_bias c_bias d_bias dog_sum
1    dog    dog    cat    cat       2
2   <NA>   <NA>   <NA>   <NA>       0
3    cat    cat    cat    cat       0
4    dog   <NA>    dog    dog       3

If we want to generate new columns, use

test_df %>%
    mutate(across(ends_with('bias'), ~  (. %in% 'dog'), .names = "{.col}_dog"))

-output

   a_bias b_bias c_bias d_bias a_bias_dog b_bias_dog c_bias_dog d_bias_dog
1    dog    dog    cat    cat          1          1          0          0
2   <NA>   <NA>   <NA>   <NA>          0          0          0          0
3    cat    cat    cat    cat          0          0          0          0
4    dog   <NA>    dog    dog          1          0          1          1

CodePudding user response：

It is possible to apply a function in this way with rowwise(), you can use c_across. But when the function you want to apply rowwise is sum, you're better off using rowSums as in the other answers.

test_df = game_df = read.table(text = "a_bias b_bias c_bias d_bias
dog dog cat cat
NA NA NA NA
cat cat cat cat
dog NA dog dog", header = T)

library(dplyr, warn.conflicts = FALSE)

test_df %>% 
  rowwise() %>% 
  mutate(dog_sum = sum(c_across(ends_with('bias')) == 'dog', na.rm = TRUE))
#> # A tibble: 4 × 5
#> # Rowwise: 
#>   a_bias b_bias c_bias d_bias dog_sum
#>   <chr>  <chr>  <chr>  <chr>    <int>
#> 1 dog    dog    cat    cat          2
#> 2 <NA>   <NA>   <NA>   <NA>         0
#> 3 cat    cat    cat    cat          0
#> 4 dog    <NA>   dog    dog          3

^{Created on 2021-09-27 by the reprex package (v2.0.1)}

CodePudding user response：

We can use rowsums(condition). If your data is exactly like you showed, akruns answer will do the trick. In case you actually want to count columns that "contain" the character 'dog' as part of a more complex character, you can use str_detect, or, if you expect multiple 'dog' matches in some rows, we can use str_count.

test_df %>% mutate(dog_sum=rowSums(across(ends_with('bias'), ~str_detect('dog'))))

#OR

test_df %>% mutate(dog_sum=rowSums(across(ends_with('bias'), ~str_count('dog'))))

If you want to create a column that is either one or zero pending on having any dogs, you can use:

df %>% mutate(any_dogs=if_any(ends_with('bias'), ~str_detect('dog')))