I have a data frame such as:
df <- data.frame(col1 = c(1, 2500,1, 1, 1), col2 = c(12, NA, 8,9, 5), col3= c(25, 48, 7, 9, 14))
df
col1 col2 col3
1 1 12 25
2 2500 NA 48
3 1 8 7
4 1 9 9
5 1 5 14
I am hoping to use aggregate
or summarise
to sum the rows only when cell value is above 1, while ignoring NA values, to make a data frame that looks like this:
df_desired <- data.frame(col1 = c(2500), col2 = sum(12, 8,9, 5), col3= sum(25, 48, 7, 9, 14))
df_desired
col1 col2 col3
1 2500 34 103
I have gotten this far:
df_test <- df %>%
summarise_at(vars(contains("col")), sum, na.rm = TRUE)
df_test
col1 col2 col3
1 2504 34 103
I've tried many variations of ifelse
with no success. I've so far only found examples of conditional summing across columns and adding a new column.
CodePudding user response:
Using summarise
with across
library(dplyr)
df %>%
summarise(across(everything(), ~ sum(.x[.x > 1], na.rm = TRUE)))
col1 col2 col3
1 2500 34 103
Or in base R
with colSums
after replace
ing the elements that are less than 2 to NA
colSums(replace(df, df < 2, NA), na.rm = TRUE)
col1 col2 col3
2500 34 103
CodePudding user response:
We could use map_dbl
after setting each 1 to 0 within a ifelse
statement:
library(purrr)
library(dplyr)
df %>%
mutate(across(, ~ifelse(. == 1, 0, .))) %>%
map_dbl(~ sum(.x, na.rm = TRUE))
Or without purrr
:
library(dplyr)
df %>%
mutate(across(, ~ifelse(. == 1, 0, .))) %>%
summarise(across(, ~sum(., na.rm = TRUE)))
col1 col2 col3
2500 34 103
CodePudding user response:
Using sapply
function, you solve your problem as follow:
sapply(df, function(x) sum(x[x>1], na.rm=TRUE))
col1 col2 col3
2500 34 103
or using summarise
function
df |>
summarise(across(, ~ sum(.[.>1], na.rm=TRUE)))
col1 col2 col3
1 2500 34 103
CodePudding user response:
in Base R use colSums
:
colSums(df * (df > 1), na.rm = TRUE)
col1 col2 col3
2500 34 103