Rowwise sum is not working in dplyr for R

Time:09-30

Dataset

I have simulated this dataset for my question:

#### Set Seed ####
set.seed(123)

#### Create Data Frame ####
df <- data.frame(x1 = rbinom(n=100,
                              size=1,
                              prob = .5),
                 x2 = rbinom(n=100,
                              size=1,
                              prob = .5),
                 x3 = rbinom(n=100,
                              size=1,
                              prob = .5))

#### Load Library ####
library(dplyr)

#### Convert to Tibble ####
tibble <- df %>% 
  as_tibble()

Problem

When I run this rowwise summary of the X values:

#### Summarize Rowwise Values ####
tibble %>% 
  rowwise() %>% 
  mutate(Sum.X = sum(.,
                     na.rm = T))

I get the output below, which is not what I'm looking for. Every row shows the same value, so it appears to be a sum of something other than the row:

# A tibble: 100 × 4
# Rowwise: 
      x1    x2    x3 Sum.X
   <int> <int> <int> <int>
 1     0     1     0   145
 2     1     0     1   145
 3     0     0     1   145
 4     1     1     1   145
 5     1     0     0   145
 6     0     1     1   145
 7     1     1     0   145
 8     1     1     0   145
 9     1     0     0   145
10     0     0     0   145
# … with 90 more rows

However, I'm looking for a summary by row that looks something like this:

# A tibble: 100 × 4
# Rowwise: 
      x1    x2    x3 Sum.X
   <int> <int> <int> <int>
 1     0     1     0     1
 2     1     0     1     2
 3     0     0     1     1
 4     1     1     1     3
 5     1     0     0     1
 6     0     1     1     2
 7     1     1     0     2
 8     1     1     0     2
 9     1     0     0     1
10     0     0     0     0
# … with 90 more rows

CodePudding user response:

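You don't need rowwise() here; rowSums() does this directly. In the original code, the `.` refers to the entire tibble rather than the current row, so sum(.) adds up every value in the data and each row receives the same total.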

tibble %>% 
  mutate(Sum.X = rowSums(., na.rm = TRUE))
# # A tibble: 100 × 4
#       x1    x2    x3 Sum.X
#    <int> <int> <int> <dbl>
#  1     0     1     0     1
#  2     1     0     1     2
#  3     0     0     1     1
#  4     1     1     1     3
#  5     1     0     0     1
#  6     0     1     1     2
#  7     1     1     0     2
#  8     1     1     0     2
#  9     1     0     0     1
# 10     0     0     0     0
# # … with 90 more rows
# # ℹ Use `print(n = ...)` to see more rows
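
As a minimal sketch of the difference, the example below uses a small hypothetical tibble named toy (not the question's simulated data). It contrasts sum(.), which totals the whole tibble, with rowSums() applied to a column selection via across(). The column range x1:x3 is an assumption for illustration; adjust it to the columns you want to sum.

```r
library(dplyr)

# Hypothetical toy data, chosen so the sums are easy to check by hand
toy <- tibble(x1 = c(0L, 1L, 1L),
              x2 = c(1L, 0L, 1L),
              x3 = c(0L, 1L, 1L))

# Inside the pipe, `.` is the whole tibble, so every row gets the grand total
grand_total <- toy %>%
  rowwise() %>%
  mutate(Sum.X = sum(., na.rm = TRUE)) %>%
  ungroup()

# rowSums() over the selected columns returns one sum per row
row_totals <- toy %>%
  mutate(Sum.X = rowSums(across(x1:x3), na.rm = TRUE))
```

Selecting columns with across() also keeps rowSums() away from any non-numeric columns the data may contain.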

CodePudding user response:

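Using c_across(), which collects the selected columns of the current row into a vector. (The rows printed below differ from those in the question, apparently because this df was generated separately.)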

df %>% 
  rowwise() %>% 
  mutate(Sum.X = sum(c_across(everything()), na.rm = TRUE))
# A tibble: 100 × 4
# Rowwise: 
      x1    x2    x3 Sum.X
   <int> <int> <int> <int>
 1     1     0     1     2
 2     0     1     0     1
 3     0     0     0     0
 4     1     1     0     2
 5     1     0     1     2
 6     0     1     1     2
 7     0     0     0     0
 8     1     0     1     2
 9     1     1     0     2
10     0     0     0     0
# … with 90 more rows
# ℹ Use `print(n = ...)` to see more rows
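
If the data also contain columns that should not be summed, c_across() accepts a narrower selection. The sketch below assumes a hypothetical tibble toy with an extra id column and a missing value; it sums only x1:x3 and calls ungroup() afterwards so that later steps are no longer evaluated row by row.

```r
library(dplyr)

# Hypothetical toy data with a non-numeric column and one NA
toy <- tibble(id = c("a", "b"),
              x1 = c(0L, 1L),
              x2 = c(1L, 1L),
              x3 = c(NA, 1L))

result <- toy %>%
  rowwise() %>%
  # Sum only the x columns of the current row, ignoring NA
  mutate(Sum.X = sum(c_across(x1:x3), na.rm = TRUE)) %>%
  # Drop the rowwise grouping once the row sums are computed
  ungroup()
```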

Base R works as well:

apply(df, 1, sum, na.rm = TRUE)
  [1] 2 1 0 2 2 2 0 2 2 0 2 2 1 2 1 1 2 2 2 1 2 1 0 3 2 3 0 2 1 2 2 3 2 2 2 1 2 3 2 0 2 2 2 1 2 1 2 1 2
 [50] 1 3 1 1 2 1 2 2 2 2 1 3 1 1 1 1 3 1 2 2 2 1 3 1 1 1 0 3 3 1 1 2 1 2 0 3 2 2 3 2 2 2 2 3 2 2 3 2 2
 [99] 1 1