R dplyr - Same column, getting the sum of the two following rows of a dataframe-CodePudding

I have the following table:

date <- c("2021-12-30", "2021-12-30", "2021-12-30", "2022-01-06", "2022-01-06", "2022-01-06", "2022-01-13", "2022-01-13", "2022-01-13")
age_group <- c("ALL", "<60", "60 ", "ALL", "<60", "60 ", "ALL", "<60", "60 ")
number <- c(583, 2207, 56, 662, 2852, 71, 432, 1243, 60)
number2 <- c(852, 900, 50, 1040, 1103, 76, 456, 461, 29)

data.frame(date, age_group, number, number2)

        date age_group number number2
1 2021-12-30       ALL    583     852
2 2021-12-30       <60   2207     900
3 2021-12-30       60      56      50
4 2022-01-06       ALL    662    1040
5 2022-01-06       <60   2852    1103
6 2022-01-06       60      71      76
7 2022-01-13       ALL    432     456
8 2022-01-13       <60   1243     461
9 2022-01-13       60      60      29

As you can see, the values of "<60" and "60 " do not correspond to the value of "ALL" (i.e., for "2021-12-30" and the "number" column -> 2207 (from age group "<60") 56 (from age group "60 ") != 583 (from age group "ALL").

I want to modify the dataframe so that the values from "ALL" are equal to the sum of "<60" and "60 " for each date and each column, i.e.:

        date age_group number number2
1 2021-12-30       ALL   2263     950
2 2021-12-30       <60   2207     900
3 2021-12-30       60      56      50
4 2022-01-06       ALL   2923    1179
5 2022-01-06       <60   2852    1103
6 2022-01-06       60      71      76
7 2022-01-13       ALL   1303     480
8 2022-01-13       <60   1243     461
9 2022-01-13       60      60      29

Is there a straightforward solution using dplyr or does this require multiple pivot_longer and pivot_wider back and forth?

CodePudding user response：

A good way to do this is to transpose the dataframe by pivoting, so that the 3 age groupings are separate columns instead of rows:

df2 <- df %>%
    group_by(date) %>%
    pivot_longer(-c('date', 'age_group')) %>%
    pivot_wider(names_from = 'age_group') %>%
    mutate(ALL = `<60`   `60 `)

df2
  date       name      ALL `<60` `60 `
  <chr>      <chr>   <dbl> <dbl> <dbl>
1 2021-12-30 number   2263  2207    56
2 2021-12-30 number2   950   900    50
3 2022-01-06 number   2923  2852    71
4 2022-01-06 number2  1179  1103    76
5 2022-01-13 number   1303  1243    60
6 2022-01-13 number2   490   461    29

Note the required backticks (`) around the variable names in mutate. They are needed because variable names cannot otherwise start with a number or an illegal character (here the less than operator <). The backtick tells R that these are variable names, despite what they look like.

Then we can just transpose it back if you'd prefer:

df2 %>%
    pivot_longer(-c('date', 'name'), names_to = 'age_group') %>%
    pivot_wider()

  date       age_group number number2
  <chr>      <chr>      <dbl>   <dbl>
1 2021-12-30 ALL         2263     950
2 2021-12-30 <60         2207     900
3 2021-12-30 60            56      50
4 2022-01-06 ALL         2923    1179
5 2022-01-06 <60         2852    1103
6 2022-01-06 60            71      76
7 2022-01-13 ALL         1303     490
8 2022-01-13 <60         1243     461
9 2022-01-13 60            60      29

CodePudding user response：

Here is another tidyverse option, where you do not have to pivot the data. Here, I filter out the ALL rows, then get the sum for each number, which is grouped by date. Then, I bind ALL back to the original dataframe (minus the ALL rows).

library(tidyverse)

df %>%
  filter(age_group != "ALL") %>%
  group_by(date) %>%
  summarise(age_group = "ALL", across(c(number, number2), sum)) %>%
  bind_rows(., df %>% filter(age_group != "ALL")) %>%
  arrange(date)

Output

  date       age_group number number2
  <chr>      <chr>      <dbl>   <dbl>
1 2021-12-30 ALL         2263     950
2 2021-12-30 <60         2207     900
3 2021-12-30 60            56      50
4 2022-01-06 ALL         2923    1179
5 2022-01-06 <60         2852    1103
6 2022-01-06 60            71      76
7 2022-01-13 ALL         1303     490
8 2022-01-13 <60         1243     461
9 2022-01-13 60            60      29

Data

df <-
  structure(
    list(
      date = c(
        "2021-12-30", "2021-12-30", "2021-12-30", "2022-01-06", "2022-01-06",
        "2022-01-06", "2022-01-13", "2022-01-13", "2022-01-13"
      ),
      age_group = c("ALL", "<60", "60 ", "ALL", "<60",
                    "60 ", "ALL", "<60", "60 "),
      number = c(583, 2207, 56, 662, 2852,
                 71, 432, 1243, 60),
      number2 = c(852, 900, 50, 1040, 1103, 76,
                  456, 461, 29)
    ),
    class = "data.frame",
    row.names = c(NA,-9L)
  )

CodePudding user response：

A possible solution, using janitor::adorn_totals:

library(tidyverse)
library(janitor)

df %>% 
  filter(age_group != "ALL") %>% 
  group_split(date) %>% 
  map_df(~ adorn_totals(.x, name = date, fill = "ALL"))

#>        date age_group number number2
#>  2021-12-30       <60   2207     900
#>  2021-12-30       60      56      50
#>  2021-12-30       ALL   2263     950
#>  2022-01-06       <60   2852    1103
#>  2022-01-06       60      71      76
#>  2021-12-30       ALL   2923    1179
#>  2022-01-13       <60   1243     461
#>  2022-01-13       60      60      29
#>  2021-12-30       ALL   1303     490

Or without janitor:

library(tidyverse)

df %>% 
  group_by(date) %>% 
  mutate(number = if_else(row_number() == 1, sum(number[c(F, T, T)]), number),
     number2 = if_else(row_number() == 1, sum(number2[c(F, T, T)]), number2)) %>% 
  ungroup

#> # A tibble: 9 × 4
#>   date       age_group number number2
#>   <chr>      <chr>      <dbl>   <dbl>
#> 1 2021-12-30 ALL         2263     950
#> 2 2021-12-30 <60         2207     900
#> 3 2021-12-30 60            56      50
#> 4 2022-01-06 ALL         2923    1179
#> 5 2022-01-06 <60         2852    1103
#> 6 2022-01-06 60            71      76
#> 7 2022-01-13 ALL         1303     490
#> 8 2022-01-13 <60         1243     461
#> 9 2022-01-13 60            60      29