My data take this shape:
set.seed(666)
grouping <- rep(c("A", "B"), 3)
theMonth <- c("2022_01", "2022_01", "2022_02", "2022_02", "2022_03", "2022_03")
revenue <- sample(100:1000, 6)
df <- tibble(grouping, theMonth, revenue)
I'm being asked to spread these data by month...
step1 <- spread(df, theMonth, revenue)
step1
# A tibble: 2 × 4
grouping `2022_01` `2022_02` `2022_03`
<chr> <int> <int> <int>
1 A 673 707 639
2 B 737 222 753
...but also, within the same table, I'm being asked for the cumulative progress of B (and only B) toward a target, say in this case 10000. So the desired output is something like:
grouping `2022_01` `2022_02` `2022_03`
<chr> <int> <int> <int>
1 A 673 707 639
2 B 737 222 753
3 CumSumB 737 959 1712
4 Progress 7.37% 9.59% 17.12%
What's the best way to attack this? Should I do it before I spread, probably using mutate
? Or is there a clean way to do it after the spread?
(Answer does not have to use dplyr
, but that is my preferred package for this sort of work.)
CodePudding user response:
We may filter the data first, get the cumulative sum column, bind the data with the original data and then create the row for 'Progress' with add_row
library(dplyr)
library(tidyr)
library(tibble)
df %>%
filter(grouping == 'B') %>%
mutate(grouping = 'CumSumB', revenue = cumsum(revenue)) %>%
bind_rows(df, .) %>%
pivot_wider(names_from = theMonth, values_from = revenue) %>%
add_row(., tibble(grouping = "Progress", .[3, -1]/10000 * 100))
-output
# A tibble: 4 × 4
grouping `2022_01` `2022_02` `2022_03`
<chr> <dbl> <dbl> <dbl>
1 A 673 707 639
2 B 737 222 753
3 CumSumB 737 959 1712
4 Progress 7.37 9.59 17.1
Adding the %
would make the whole column character
. If needed, it can be done
library(stringr)
df %>%
filter(grouping == 'B') %>%
mutate(grouping = 'CumSumB', revenue = cumsum(revenue)) %>%
bind_rows(df, .) %>%
pivot_wider(names_from = theMonth, values_from = revenue) %>%
add_row(., tibble(grouping = "Progress", .[3, -1]/10000 * 100)) %>%
mutate(across(-grouping, ~ replace(.x, n(), str_c(.x[n()], "%"))))
# A tibble: 4 × 4
grouping `2022_01` `2022_02` `2022_03`
<chr> <chr> <chr> <chr>
1 A 673 707 639
2 B 737 222 753
3 CumSumB 737 959 1712
4 Progress 7.37% 9.59% 17.12%
CodePudding user response:
Here is an alternative approach:
library(dplyr)
library(tidyr)
df %>%
mutate(revenueA = lag(revenue, default = revenue[1])) %>%
filter(row_number() %% 2 == 0) %>%
mutate(CumSum = cumsum(revenue),
Progres = paste0(CumSum/100, "%")) %>%
pivot_longer(-c(grouping, theMonth),
names_to = "key",
values_to = "val",
values_transform = list(val = as.character)) %>%
pivot_wider(names_from = theMonth, values_from = val) %>%
mutate(grouping = case_when(key == "revenue" ~"B",
key == "revenueA" ~ "A",
TRUE ~ key)) %>%
arrange(grouping) %>%
select(-key)
grouping `2022_01` `2022_02` `2022_03`
<chr> <chr> <chr> <chr>
1 A 673 707 639
2 B 737 222 753
3 CumSum 737 959 1712
4 Progres 7.37% 9.59% 17.12%