Home > Net >  Trying to reproduce a particular pivot table in R
Trying to reproduce a particular pivot table in R

Time:08-11

My data take this shape:

set.seed(666)

grouping <- rep(c("A", "B"), 3)
theMonth <- c("2022_01", "2022_01", "2022_02", "2022_02", "2022_03", "2022_03")
revenue <-  sample(100:1000, 6)

df <- tibble(grouping, theMonth, revenue)

I'm being asked to spread these data by month...

step1 <- spread(df, theMonth, revenue)
step1
# A tibble: 2 × 4
  grouping `2022_01` `2022_02` `2022_03`
  <chr>        <int>     <int>     <int>
1 A              673       707       639
2 B              737       222       753

...but also, within the same table, I'm being asked for the cumulative progress of B (and only B) toward a target, say in this case 10000. So the desired output is something like:

  grouping `2022_01` `2022_02` `2022_03`
  <chr>        <int>     <int>     <int>
1 A              673       707       639
2 B              737       222       753
3 CumSumB        737       959      1712  
4 Progress     7.37%     9.59%    17.12%

What's the best way to attack this? Should I do it before I spread, probably using mutate? Or is there a clean way to do it after the spread?

(Answer does not have to use dplyr, but that is my preferred package for this sort of work.)

CodePudding user response:

We may filter the data first, get the cumulative sum column, bind the data with the original data and then create the row for 'Progress' with add_row

library(dplyr)
library(tidyr)
library(tibble)
df %>%
   filter(grouping == 'B') %>% 
   mutate(grouping = 'CumSumB', revenue = cumsum(revenue)) %>% 
   bind_rows(df, .) %>% 
   pivot_wider(names_from = theMonth, values_from = revenue) %>% 
   add_row(., tibble(grouping = "Progress",  .[3, -1]/10000 * 100))

-output

# A tibble: 4 × 4
  grouping `2022_01` `2022_02` `2022_03`
  <chr>        <dbl>     <dbl>     <dbl>
1 A           673       707        639  
2 B           737       222        753  
3 CumSumB     737       959       1712  
4 Progress      7.37      9.59      17.1

Adding the % would make the whole column character. If needed, it can be done

library(stringr)
df %>%
   filter(grouping == 'B') %>% 
   mutate(grouping = 'CumSumB', revenue = cumsum(revenue)) %>% 
   bind_rows(df, .) %>% 
   pivot_wider(names_from = theMonth, values_from = revenue) %>% 
   add_row(., tibble(grouping = "Progress",  .[3, -1]/10000 * 100)) %>% 
  mutate(across(-grouping, ~ replace(.x, n(), str_c(.x[n()], "%"))))
# A tibble: 4 × 4
  grouping `2022_01` `2022_02` `2022_03`
  <chr>    <chr>     <chr>     <chr>    
1 A        673       707       639      
2 B        737       222       753      
3 CumSumB  737       959       1712     
4 Progress 7.37%     9.59%     17.12%   

CodePudding user response:

Here is an alternative approach:

library(dplyr)
library(tidyr)

df %>%   
  mutate(revenueA = lag(revenue, default = revenue[1])) %>% 
  filter(row_number() %% 2 == 0)  %>% 
  mutate(CumSum = cumsum(revenue),
         Progres = paste0(CumSum/100, "%")) %>% 
  pivot_longer(-c(grouping, theMonth), 
               names_to = "key",
               values_to = "val", 
               values_transform = list(val = as.character)) %>%  
  pivot_wider(names_from = theMonth, values_from = val) %>% 
  mutate(grouping = case_when(key == "revenue" ~"B",
                         key == "revenueA" ~ "A",
                         TRUE ~ key)) %>% 
  arrange(grouping) %>% 
  select(-key)
  grouping `2022_01` `2022_02` `2022_03`
  <chr>    <chr>     <chr>     <chr>    
1 A        673       707       639      
2 B        737       222       753      
3 CumSum   737       959       1712     
4 Progres  7.37%     9.59%     17.12% 
  • Related