I have the following table:
date <- c("2021-12-30", "2021-12-30", "2021-12-30", "2022-01-06", "2022-01-06", "2022-01-06", "2022-01-13", "2022-01-13", "2022-01-13")
age_group <- c("ALL", "<60", "60 ", "ALL", "<60", "60 ", "ALL", "<60", "60 ")
number <- c(583, 2207, 56, 662, 2852, 71, 432, 1243, 60)
number2 <- c(852, 900, 50, 1040, 1103, 76, 456, 461, 29)
data.frame(date, age_group, number, number2)
date age_group number number2
1 2021-12-30 ALL 583 852
2 2021-12-30 <60 2207 900
3 2021-12-30 60 56 50
4 2022-01-06 ALL 662 1040
5 2022-01-06 <60 2852 1103
6 2022-01-06 60 71 76
7 2022-01-13 ALL 432 456
8 2022-01-13 <60 1243 461
9 2022-01-13 60 60 29
As you can see, the values of "<60" and "60 " do not correspond to the value of "ALL" (i.e., for "2021-12-30" and the "number" column -> 2207 (from age group "<60") 56 (from age group "60 ") != 583 (from age group "ALL").
I want to modify the dataframe so that the values from "ALL" are equal to the sum of "<60" and "60 " for each date and each column, i.e.:
date age_group number number2
1 2021-12-30 ALL 2263 950
2 2021-12-30 <60 2207 900
3 2021-12-30 60 56 50
4 2022-01-06 ALL 2923 1179
5 2022-01-06 <60 2852 1103
6 2022-01-06 60 71 76
7 2022-01-13 ALL 1303 480
8 2022-01-13 <60 1243 461
9 2022-01-13 60 60 29
Is there a straightforward solution using dplyr
or does this require multiple pivot_longer
and pivot_wider
back and forth?
CodePudding user response:
A good way to do this is to transpose the dataframe by pivoting, so that the 3 age groupings are separate columns instead of rows:
df2 <- df %>%
group_by(date) %>%
pivot_longer(-c('date', 'age_group')) %>%
pivot_wider(names_from = 'age_group') %>%
mutate(ALL = `<60` `60 `)
df2
date name ALL `<60` `60 `
<chr> <chr> <dbl> <dbl> <dbl>
1 2021-12-30 number 2263 2207 56
2 2021-12-30 number2 950 900 50
3 2022-01-06 number 2923 2852 71
4 2022-01-06 number2 1179 1103 76
5 2022-01-13 number 1303 1243 60
6 2022-01-13 number2 490 461 29
Note the required backticks (`
) around the variable names in mutate
. They are needed because variable names cannot otherwise start with a number or an illegal character (here the less than operator <
). The backtick tells R that these are variable names, despite what they look like.
Then we can just transpose it back if you'd prefer:
df2 %>%
pivot_longer(-c('date', 'name'), names_to = 'age_group') %>%
pivot_wider()
date age_group number number2
<chr> <chr> <dbl> <dbl>
1 2021-12-30 ALL 2263 950
2 2021-12-30 <60 2207 900
3 2021-12-30 60 56 50
4 2022-01-06 ALL 2923 1179
5 2022-01-06 <60 2852 1103
6 2022-01-06 60 71 76
7 2022-01-13 ALL 1303 490
8 2022-01-13 <60 1243 461
9 2022-01-13 60 60 29
CodePudding user response:
Here is another tidyverse
option, where you do not have to pivot the data. Here, I filter out the ALL
rows, then get the sum for each number
, which is grouped by date
. Then, I bind ALL
back to the original dataframe (minus the ALL
rows).
library(tidyverse)
df %>%
filter(age_group != "ALL") %>%
group_by(date) %>%
summarise(age_group = "ALL", across(c(number, number2), sum)) %>%
bind_rows(., df %>% filter(age_group != "ALL")) %>%
arrange(date)
Output
date age_group number number2
<chr> <chr> <dbl> <dbl>
1 2021-12-30 ALL 2263 950
2 2021-12-30 <60 2207 900
3 2021-12-30 60 56 50
4 2022-01-06 ALL 2923 1179
5 2022-01-06 <60 2852 1103
6 2022-01-06 60 71 76
7 2022-01-13 ALL 1303 490
8 2022-01-13 <60 1243 461
9 2022-01-13 60 60 29
Data
df <-
structure(
list(
date = c(
"2021-12-30", "2021-12-30", "2021-12-30", "2022-01-06", "2022-01-06",
"2022-01-06", "2022-01-13", "2022-01-13", "2022-01-13"
),
age_group = c("ALL", "<60", "60 ", "ALL", "<60",
"60 ", "ALL", "<60", "60 "),
number = c(583, 2207, 56, 662, 2852,
71, 432, 1243, 60),
number2 = c(852, 900, 50, 1040, 1103, 76,
456, 461, 29)
),
class = "data.frame",
row.names = c(NA,-9L)
)
CodePudding user response:
A possible solution, using janitor::adorn_totals
:
library(tidyverse)
library(janitor)
df %>%
filter(age_group != "ALL") %>%
group_split(date) %>%
map_df(~ adorn_totals(.x, name = date, fill = "ALL"))
#> date age_group number number2
#> 2021-12-30 <60 2207 900
#> 2021-12-30 60 56 50
#> 2021-12-30 ALL 2263 950
#> 2022-01-06 <60 2852 1103
#> 2022-01-06 60 71 76
#> 2021-12-30 ALL 2923 1179
#> 2022-01-13 <60 1243 461
#> 2022-01-13 60 60 29
#> 2021-12-30 ALL 1303 490
Or without janitor
:
library(tidyverse)
df %>%
group_by(date) %>%
mutate(number = if_else(row_number() == 1, sum(number[c(F, T, T)]), number),
number2 = if_else(row_number() == 1, sum(number2[c(F, T, T)]), number2)) %>%
ungroup
#> # A tibble: 9 × 4
#> date age_group number number2
#> <chr> <chr> <dbl> <dbl>
#> 1 2021-12-30 ALL 2263 950
#> 2 2021-12-30 <60 2207 900
#> 3 2021-12-30 60 56 50
#> 4 2022-01-06 ALL 2923 1179
#> 5 2022-01-06 <60 2852 1103
#> 6 2022-01-06 60 71 76
#> 7 2022-01-13 ALL 1303 490
#> 8 2022-01-13 <60 1243 461
#> 9 2022-01-13 60 60 29