Home > Software engineering >  How to summarize one column and its total value and show the distribution of the value in new column
How to summarize one column and its total value and show the distribution of the value in new column

Time:10-26

I could not explain the title but I have following problem. Assume I have this data frame.

library(tidyverse)

df <- data.frame(Giver = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'), 
                 Amount = c(100, 150, 150, 200, 100, 200, 100, 100, 100), 
                 Receiver = c('X', 'X', 'Y', 'X', 'Z', 'Z', 'Y', 'Y', 'Y')

               
Giver    Amount   Receiver
  A        100       X
  A        150       X
  A        150       Y
  B        200       X
  B        100       Z
  B        200       Z
  C        100       Y
  C        100       Y
  C        100       Y

I want to have a list for each Giver, the sum of the amount, and the distribution of the amount across the receiver.

So basically in the end a data frame like this:

Giver    Amount   X    Y    Z
 A         400    250  150  0
 B         500    200  0    300
 C         300    0    300  0

What I have so far is to summarize the giver and their total amount but not the distribution.. I have no idea how I can achieve this

df_new <- df %>%
  group_by(Giver) %>%
  summarize(Payments = n(), Sum = sum(Amount)) %>% 
  arrange(desc(Amount))

Which gives me following:

Giver  Payments  Amount   
 A         3      400    
 B         3      500    
 C         3      300    

CodePudding user response:

library(tidyverse)
df <- structure(list(Giver = c("A", "A", "A", "B", "B", "B", "C", "C", 
"C"), Amount = c(100L, 150L, 150L, 200L, 100L, 200L, 100L, 100L, 
100L), Receiver = c("X", "X", "Y", "X", "Z", "Z", "Y", "Y", "Y"
)), row.names = c(NA, -9L), class = "data.frame")

df_new <- df %>% 
  pivot_wider(names_from=Receiver, values_from = Amount, values_fn=sum, values_fill = 0) %>% 
  mutate(Amount = rowSums(across(X:Z)))

df_new
#> # A tibble: 3 × 5
#>   Giver     X     Y     Z Amount
#>   <chr> <int> <int> <int>  <dbl>
#> 1 A       250   150     0    400
#> 2 B       200     0   300    500
#> 3 C         0   300     0    300
  • Related