Home > front end >  Summation over multiple rows based on a condition in R
Summation over multiple rows based on a condition in R

Time:06-23

I have a dataset like the following

name city number
A M 2
A N 3
A O 5
A P 7
B M 7
B N 8
B 0 9
B P 2

For each of of the name category, I want to sum the number of M and N value and put it in a new variable. The same goes for O and P value. The dataset should look like the following:

name city number
A X 5
A Y 12
B X 15
B Y 11

I'm new in R programming. I have tried to use group by and mutate method but was not successful.

CodePudding user response:

We could modify the values in the column 'city' to 'X', 'Y', and do a group by sum

library(dplyr)
df1 %>%
   group_by(name, city = case_when(city %in% c("M", "N") ~ 'X',
    city %in% c("O", "P") ~ "Y")) %>%
   summarise(number = sum(number), .groups = 'drop')

-output

# A tibble: 4 × 3
  name  city  number
  <chr> <chr>  <int>
1 A     X          5
2 A     Y         12
3 B     X         15
4 B     Y         11

data

df1 <- structure(list(name = c("A", "A", "A", "A", "B", "B", "B", "B"
), city = c("M", "N", "O", "P", "M", "N", "O", "P"), number = c(2L, 
3L, 5L, 7L, 7L, 8L, 9L, 2L)), row.names = c(NA, -8L), class = "data.frame")
  • Related