Sum/aggregate all rows wherever cell value meets condition in R

Time:07-07

I have a data frame such as:

df <- data.frame(col1 = c(1, 2500,1, 1, 1), col2 = c(12, NA, 8,9, 5), col3= c(25, 48, 7, 9, 14))
df
  col1 col2 col3
1    1   12   25
2 2500   NA   48
3    1    8    7
4    1    9    9
5    1    5   14

I am hoping to use aggregate or summarise to sum each column, counting only the cells whose value is above 1 and ignoring NA values, to make a data frame that looks like this:

df_desired <- data.frame(col1 = c(2500), col2 = sum(12, 8,9, 5), col3= sum(25, 48, 7, 9, 14))
df_desired
  col1 col2 col3
1 2500   34  103

I have gotten this far:

library(dplyr)
df_test <- df %>%
  summarise_at(vars(contains("col")), sum, na.rm = TRUE)
df_test
  col1 col2 col3
1 2504   34  103

I've tried many variations of ifelse with no success. I've so far only found examples of conditional summing across columns and adding a new column.

CodePudding user response:

Using summarise with across

library(dplyr)
df %>%
  summarise(across(everything(),  ~ sum(.x[.x > 1], na.rm = TRUE)))
  col1 col2 col3
1 2500   34  103
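
The na.rm = TRUE is still needed after subsetting, because a comparison against NA returns NA and an NA index keeps an NA element. A minimal base R sketch on col2 from the question (the names keep, picked and total are only illustrative):

```r
# One column from the question's data, including its NA
x <- c(12, NA, 8, 9, 5)

keep <- x > 1                       # TRUE NA TRUE TRUE TRUE: the comparison propagates NA
picked <- x[keep]                   # 12 NA 8 9 5: an NA index yields an NA element
total <- sum(picked, na.rm = TRUE)  # na.rm drops that NA before summing
print(total)                        # 34
```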

Or in base R, with colSums after replacing the elements that are not above 1 with NA

colSums(replace(df, df <= 1, NA),  na.rm = TRUE)
col1 col2 col3 
2500   34  103 

CodePudding user response:

We could use map_dbl after setting each 1 to 0 with ifelse:

library(purrr)
library(dplyr)

df %>%
  mutate(across(everything(), ~ ifelse(.x == 1, 0, .x))) %>%
  map_dbl(~ sum(.x, na.rm = TRUE))

Or without purrr:

library(dplyr)

df %>%
  mutate(across(everything(), ~ ifelse(.x == 1, 0, .x))) %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
  col1 col2 col3
1 2500   34  103
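
Note that setting each 1 to 0 matches the "above 1" condition only when every value that is not above 1 is exactly 1, as in the question's data. A small sketch with made-up values (the vector x below is hypothetical, not from the question) shows where the two rules diverge:

```r
# Hypothetical values: one below 1, one exactly 1, two above 1, and an NA
x <- c(0.5, 1, 1.5, 3, NA)

zeroed <- sum(ifelse(x == 1, 0, x), na.rm = TRUE)  # keeps 0.5: 0.5 + 1.5 + 3
above  <- sum(x[x > 1], na.rm = TRUE)              # drops 0.5: 1.5 + 3

print(zeroed)  # 5
print(above)   # 4.5
```

If the data can contain values below 1, the x > 1 subset is the safer reading of the question.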

CodePudding user response:

Using the sapply function, you can solve your problem as follows:

sapply(df, function(x) sum(x[x>1], na.rm=TRUE))

col1 col2 col3 
2500   34  103

Or using the summarise function:

library(dplyr)
df |>
  summarise(across(everything(), ~ sum(.x[.x > 1], na.rm = TRUE)))
  col1 col2 col3
1 2500   34  103

CodePudding user response:

In base R, use colSums:

colSums(df * (df > 1), na.rm = TRUE)

col1 col2 col3 
2500   34  103
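
This works because df > 1 is a logical matrix that is treated as 0/1 in the multiplication, so cells that fail the test contribute 0 and NA cells stay NA until na.rm removes them. A step-by-step sketch of the same call (mask, masked and res are illustrative names):

```r
df <- data.frame(col1 = c(1, 2500, 1, 1, 1),
                 col2 = c(12, NA, 8, 9, 5),
                 col3 = c(25, 48, 7, 9, 14))

mask <- df > 1        # logical matrix; NA where df is NA
masked <- df * mask   # TRUE acts as 1, FALSE as 0, NA stays NA
res <- colSums(masked, na.rm = TRUE)
print(res)
```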