Home > Net >  How to sum across rows based on condition greater than
How to sum across rows based on condition greater than

Time:03-12

I'm trying to move from Excel to R and am looking to do something similar to SumIfs in excel. I want to create a new column that is the sum of the rows from multiple columns but only if the value is greater than 25.

My data looks like this which is the area of different crops on farms and want to add a new column of total agriculture area but only include crops if there are more than 25 acres:

Prop_ID State Pasture Soy Corn
1 WI 20 45 75
2 MN 10 80 122
3 MN 152 0 15
4 IL 0 10 99
5 IL 75 38 0
6 WI 30 45 0
7 WI 68 55 0

I'm looking to produce a new table like this:

Prop_ID State Pasture Soy Corn Total_ag
1 WI 20 45 75 120
2 MN 10 80 122 202
3 MN 152 0 15 152
4 IL 0 10 20 0
5 IL 15 15 20 0
6 WI 30 45 0 75
7 WI 50 55 0 105

I want to reference the columns to sum using there index [3:5] and not there name since I have different crops in different datasets.

I'm assuming using mutate or summarize with that I need to do but I can't figure it out.

CodePudding user response:

We can replace those rows with values less than 25 to NA or 0 and then use rowSums

library(dplyr)
df1 <- df1 %>% 
  mutate(Total_ag = rowSums(across(where(is.numeric), 
     ~ replace(.x, .x < 25, NA)), na.rm = TRUE))

Similar option in base R

df1$Total_ag <- rowSums(replace(df1[3:5], df1[3:5] < 25, NA), na.rm = TRUE)

CodePudding user response:

Multiply value matrix with boolean matrix.

rowSums(dat[3:5]*(dat[3:5] >= 25))
# [1] 120 202 152  99 113  75 123

Data:

dat <- structure(list(Prop_ID = 1:7, State = c("WI", "MN", "MN", "IL", 
"IL", "WI", "WI"), Pasture = c(20L, 10L, 152L, 0L, 75L, 30L, 
68L), Soy = c(45L, 80L, 0L, 10L, 38L, 45L, 55L), Corn = c(75L, 
122L, 15L, 99L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-7L))
  • Related