Trying to reproduce a particular pivot table in R-CodePudding

My data take this shape:

set.seed(666)

grouping <- rep(c("A", "B"), 3)
theMonth <- c("2022_01", "2022_01", "2022_02", "2022_02", "2022_03", "2022_03")
revenue <-  sample(100:1000, 6)

df <- tibble(grouping, theMonth, revenue)

I'm being asked to spread these data by month...

step1 <- spread(df, theMonth, revenue)
step1
# A tibble: 2 × 4
  grouping `2022_01` `2022_02` `2022_03`
  <chr>        <int>     <int>     <int>
1 A              673       707       639
2 B              737       222       753

...but also, within the same table, I'm being asked for the cumulative progress of B (and only B) toward a target, say in this case 10000. So the desired output is something like:

  grouping `2022_01` `2022_02` `2022_03`
  <chr>        <int>     <int>     <int>
1 A              673       707       639
2 B              737       222       753
3 CumSumB        737       959      1712  
4 Progress     7.37%     9.59%    17.12%

What's the best way to attack this? Should I do it before I spread, probably using mutate? Or is there a clean way to do it after the spread?

(Answer does not have to use dplyr, but that is my preferred package for this sort of work.)

CodePudding user response：

We may filter the data first, get the cumulative sum column, bind the data with the original data and then create the row for 'Progress' with add_row

library(dplyr)
library(tidyr)
library(tibble)
df %>%
   filter(grouping == 'B') %>% 
   mutate(grouping = 'CumSumB', revenue = cumsum(revenue)) %>% 
   bind_rows(df, .) %>% 
   pivot_wider(names_from = theMonth, values_from = revenue) %>% 
   add_row(., tibble(grouping = "Progress",  .[3, -1]/10000 * 100))

-output

# A tibble: 4 × 4
  grouping `2022_01` `2022_02` `2022_03`
  <chr>        <dbl>     <dbl>     <dbl>
1 A           673       707        639  
2 B           737       222        753  
3 CumSumB     737       959       1712  
4 Progress      7.37      9.59      17.1

Adding the % would make the whole column character. If needed, it can be done

library(stringr)
df %>%
   filter(grouping == 'B') %>% 
   mutate(grouping = 'CumSumB', revenue = cumsum(revenue)) %>% 
   bind_rows(df, .) %>% 
   pivot_wider(names_from = theMonth, values_from = revenue) %>% 
   add_row(., tibble(grouping = "Progress",  .[3, -1]/10000 * 100)) %>% 
  mutate(across(-grouping, ~ replace(.x, n(), str_c(.x[n()], "%"))))
# A tibble: 4 × 4
  grouping `2022_01` `2022_02` `2022_03`
  <chr>    <chr>     <chr>     <chr>    
1 A        673       707       639      
2 B        737       222       753      
3 CumSumB  737       959       1712     
4 Progress 7.37%     9.59%     17.12%

CodePudding user response：

Here is an alternative approach:

library(dplyr)
library(tidyr)

df %>%   
  mutate(revenueA = lag(revenue, default = revenue[1])) %>% 
  filter(row_number() %% 2 == 0)  %>% 
  mutate(CumSum = cumsum(revenue),
         Progres = paste0(CumSum/100, "%")) %>% 
  pivot_longer(-c(grouping, theMonth), 
               names_to = "key",
               values_to = "val", 
               values_transform = list(val = as.character)) %>%  
  pivot_wider(names_from = theMonth, values_from = val) %>% 
  mutate(grouping = case_when(key == "revenue" ~"B",
                         key == "revenueA" ~ "A",
                         TRUE ~ key)) %>% 
  arrange(grouping) %>% 
  select(-key)

  grouping `2022_01` `2022_02` `2022_03`
  <chr>    <chr>     <chr>     <chr>    
1 A        673       707       639      
2 B        737       222       753      
3 CumSum   737       959       1712     
4 Progres  7.37%     9.59%     17.12%