Let's say I have the following dataset
test_df = game_df = read.table(text = "a_bias b_bias c_bias d_bias
dog dog cat cat
NA NA NA NA
cat cat cat cat
dog NA dog dog", header = T)
I'd like to create a new column called dog_sum
that is equal to the total number of times the dog
appears across all the columns that end with the word bias
. Here's my current approach:
test_df %>% rowwise() %>%
mutate(dog_sum = sum(across(ends_with("bias"), ~ifelse(. == "dog", 1, 0)), na.rm = T))
The issue is that this throws an error when the entire row consists of NAs
. Is there a better way of solving this?
More generally, I'd like to mutate new columns for every column matched by ends_with("bias")
, and this new column should be set to 1 if it contains dog
and 0 otherwise. So, the end result would be to have four columns named a_bias_dog
, b_bias_dog
, etc. that are set to 1 if "dog" is present for that specific row, and 0 otherwise. How do I achieve this?
CodePudding user response:
We may do this with rowSums
as a vectorized approach and should be faster than rowwise/sum
library(dplyr)
test_df %>%
mutate(dog_sum = rowSums(across(ends_with('bias')) == 'dog',
na.rm = TRUE))
-output
a_bias b_bias c_bias d_bias dog_sum
1 dog dog cat cat 2
2 <NA> <NA> <NA> <NA> 0
3 cat cat cat cat 0
4 dog <NA> dog dog 3
If we want to generate new columns, use
test_df %>%
mutate(across(ends_with('bias'), ~ (. %in% 'dog'), .names = "{.col}_dog"))
-output
a_bias b_bias c_bias d_bias a_bias_dog b_bias_dog c_bias_dog d_bias_dog
1 dog dog cat cat 1 1 0 0
2 <NA> <NA> <NA> <NA> 0 0 0 0
3 cat cat cat cat 0 0 0 0
4 dog <NA> dog dog 1 0 1 1
CodePudding user response:
It is possible to apply a function in this way with rowwise()
, you can use c_across
. But when the function you want to apply rowwise is sum
, you're better off using rowSums
as in the other answers.
test_df = game_df = read.table(text = "a_bias b_bias c_bias d_bias
dog dog cat cat
NA NA NA NA
cat cat cat cat
dog NA dog dog", header = T)
library(dplyr, warn.conflicts = FALSE)
test_df %>%
rowwise() %>%
mutate(dog_sum = sum(c_across(ends_with('bias')) == 'dog', na.rm = TRUE))
#> # A tibble: 4 × 5
#> # Rowwise:
#> a_bias b_bias c_bias d_bias dog_sum
#> <chr> <chr> <chr> <chr> <int>
#> 1 dog dog cat cat 2
#> 2 <NA> <NA> <NA> <NA> 0
#> 3 cat cat cat cat 0
#> 4 dog <NA> dog dog 3
Created on 2021-09-27 by the reprex package (v2.0.1)
CodePudding user response:
We can use rowsums(condition). If your data is exactly like you showed, akruns answer will do the trick. In case you actually want to count columns that "contain" the character 'dog' as part of a more complex character, you can use str_detect
, or, if you expect multiple 'dog' matches in some rows, we can use str_count
.
test_df %>% mutate(dog_sum=rowSums(across(ends_with('bias'), ~str_detect('dog'))))
#OR
test_df %>% mutate(dog_sum=rowSums(across(ends_with('bias'), ~str_count('dog'))))
If you want to create a column that is either one or zero pending on having any
dogs, you can use:
df %>% mutate(any_dogs=if_any(ends_with('bias'), ~str_detect('dog')))