Home > Software design >  Select a single value in summarise dplyr
Select a single value in summarise dplyr

Time:11-26

I have this table

subs_id amount flag
1 15 target
1 10 taker
2 30 target
3 20 taker
3 10 target

I want to create a new table that does the following:

  1. calculate the total sum of the variable amount by each subs_id
  2. a column that just shows the value of the variable amount when the subs_id has the variable flag equal to "taker", and 0 otherwise.

The resulting table should look like this:

subs_id ttl_amount amount
1 25 10
2 30 0
3 30 20

Here is what I tried to get the result:

df%>%
    group_by(subs_id)%>%
    summarise(ttl_amount=sum(amt),
              amount=case_when(flag=="taker"~amt[which(flag=="taker")[1]],TRUE~0))

This gives the following error: ! 'names' attribute [1] must be the same length as the vector [0]

Note that the solution should preferably be using summarise, as multiple other aggregation will happen in it.

CodePudding user response:

You could try:

df %>%
  group_by(subs_id) %>%
  summarise(
    ttl_amount = sum(amount),
    amount = coalesce(amount[flag == 'taker'][1L], 0L)
    )

Output:

# A tibble: 3 x 3
  subs_id ttl_amount amount
    <int>      <int>  <int>
1       1         25     10
2       2         30      0
3       3         30     20

CodePudding user response:

You can multiply by a logical vector to get a conditional sum:

library(dplyr)

df %>% 
  group_by(subs_id) %>% 
  summarize(
    ttl_amount = sum(amount), 
    amount = sum(amount * (flag == "taker"))
  )
# A tibble: 3 × 3
  subs_id ttl_amount amount
    <dbl>      <dbl>  <dbl>
1       1         25     10
2       2         30      0
3       3         30     20

CodePudding user response:

Same as yours, but use left_join afterwards to add the amount column from the original

library(tidyverse)

t <- tibble(subs_id=c(1,1,2,3,3),
       amount=c(15,10,30,20,10),
       flag=c("target", "taker", "target", "taker", "target"))

t %>%
  group_by(subs_id)%>%
  summarise(ttl_amount=sum(amount)) %>%
  left_join(select(filter(t, flag=="taker"), "subs_id", "amount", "flag"), by="subs_id") %>%
  select(-flag) %>%
  replace_na(list(amount=0))

yields:

subs_id   ttl_amount   amount
1   25  10      
2   30  0       
3   30  20  

CodePudding user response:

Here is a simplified answer to the accepted answer:

library(tidyverse)

df |>
  group_by(subs_id) |>
  summarise(
    ttl_amount = sum(amount),
    amount= sum(amount[flag == "taker"]))
#> # A tibble: 3 x 3
#>   subs_id ttl_amount amount
#>     <dbl>      <dbl>  <dbl>
#> 1       1         25     10
#> 2       2         30      0
#> 3       3         30     20

CodePudding user response:

Here's how I would do it:

library(dplyr)

subs_id <- c(1, 1, 2, 3, 3)
amount  <- c(15, 10, 30, 20, 10)
flag <- c("target", "taker", "target", "taker", "target")

data <- data.frame(subs_id, amount, flag)

data_new <- data %>% group_by(subs_id) %>% 
  mutate(column_summary = sum(amount))

data_new$column_id <- ifelse(data_new$flag == "taker", data_new$amount, 0)
  • Related