I'm trying to move from Excel to R and am looking to do something similar to SumIfs
in excel. I want to create a new column that is the sum of the rows from multiple columns but only if the value is greater than 25.
My data looks like this which is the area of different crops on farms and want to add a new column of total agriculture area but only include crops if there are more than 25 acres:
Prop_ID | State | Pasture | Soy | Corn |
---|---|---|---|---|
1 | WI | 20 | 45 | 75 |
2 | MN | 10 | 80 | 122 |
3 | MN | 152 | 0 | 15 |
4 | IL | 0 | 10 | 99 |
5 | IL | 75 | 38 | 0 |
6 | WI | 30 | 45 | 0 |
7 | WI | 68 | 55 | 0 |
I'm looking to produce a new table like this:
Prop_ID | State | Pasture | Soy | Corn | Total_ag |
---|---|---|---|---|---|
1 | WI | 20 | 45 | 75 | 120 |
2 | MN | 10 | 80 | 122 | 202 |
3 | MN | 152 | 0 | 15 | 152 |
4 | IL | 0 | 10 | 20 | 0 |
5 | IL | 15 | 15 | 20 | 0 |
6 | WI | 30 | 45 | 0 | 75 |
7 | WI | 50 | 55 | 0 | 105 |
I want to reference the columns to sum using there index [3:5] and not there name since I have different crops in different datasets.
I'm assuming using mutate or summarize with that I need to do but I can't figure it out.
CodePudding user response:
We can replace those rows with values less than 25 to NA or 0 and then use rowSums
library(dplyr)
df1 <- df1 %>%
mutate(Total_ag = rowSums(across(where(is.numeric),
~ replace(.x, .x < 25, NA)), na.rm = TRUE))
Similar option in base R
df1$Total_ag <- rowSums(replace(df1[3:5], df1[3:5] < 25, NA), na.rm = TRUE)
CodePudding user response:
Multiply value matrix with boolean matrix.
rowSums(dat[3:5]*(dat[3:5] >= 25))
# [1] 120 202 152 99 113 75 123
Data:
dat <- structure(list(Prop_ID = 1:7, State = c("WI", "MN", "MN", "IL",
"IL", "WI", "WI"), Pasture = c(20L, 10L, 152L, 0L, 75L, 30L,
68L), Soy = c(45L, 80L, 0L, 10L, 38L, 45L, 55L), Corn = c(75L,
122L, 15L, 99L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-7L))