Here is some example data:
transactions <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), day = c("day1",
"day2", "day3", "day4", "day5", "day6", "day7", "day8"), sent_to = c(NA,
"Garden Cinema", "Pasta House", NA, "Blue Superstore", NA, NA,
NA), received_from = c("ATM", NA, NA, "Sarah", NA, "Jane", "Joe",
"Emily"), reference = c("add_cash", "cinema_tickets", "meal",
"gift", "shopping", "reimbursed", "reimbursed", "reimbursed"),
decrease = c(NA, 10.8, 12.5, NA, 15.25, NA, NA, NA), increase = c(50,
NA, NA, 30, NA, 5.4, 7.25, 2), reimbursed_id = c(NA, "R",
"R", NA, NA, "2", "3", "3")), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -8L))
# id day sent_to received_from reference decrease increase reimbursed_id
# <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
# 1 1 day1 NA ATM add_cash NA 50 NA
# 2 2 day2 Garden Cinema NA cinema_tickets 10.8 NA R
# 3 3 day3 Pasta House NA meal 12.5 NA R
# 4 4 day4 NA Sarah gift NA 30 NA
# 5 5 day5 Blue Superstore NA shopping 15.2 NA NA
# 6 6 day6 NA Jane reimbursed NA 5.4 2
# 7 7 day7 NA Joe reimbursed NA 7.25 3
# 8 8 day8 NA Emily reimbursed NA 2 3
Note that this is linked to a question I have previously asked here: How to conditionally select a column, and subtract values in those rows from rows in another conditionally selected column in R?
I would like a similar solution to the above question but this time accounting for the fact that there are multiple people who are reimbursing the user for the same day.
This is the desired outcome:
# id day sent_to received_from reference decrease increase reimbursed_id actual_decrease
# <int> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl>
# 1 1 day1 NA ATM add_cash NA 50 NA NA
# 2 2 day2 Garden Cinema NA cinema_tickets 10.8 NA R 5.4
# 3 3 day3 Pasta House NA meal 12.5 NA R 3.25
# 4 4 day4 NA Sarah gift NA 30 NA NA
# 5 5 day5 Blue Superstore NA shopping 15.2 NA NA 15.2
# 6 6 day6 NA Jane reimbursed NA 5.4 2 NA
# 7 7 day7 NA Joe reimbursed NA 7.25 3 NA
# 8 8 day8 NA Emily reimbursed NA 2 3 NA
Any help is appreciated :)
CodePudding user response:
Well you should first summarise the increase per id:
increase_df <- transactions %>%
filter(!is.na(as.numeric(reimbursed_id))) %>%
group_by(id = as.numeric(reimbursed_id)) %>%
summarise(increase_sum = sum(increase))
id increase_sum
<dbl> <dbl>
1 2 5.4
2 3 9.25
To then merge it and make the subtraction:
left_join(transactions,increase_df,by = "id") %>%
mutate(decrease = ifelse(!is.na(increase_sum),decrease - increase_sum,decrease))
id day sent_to received_from reference decrease increase reimbursed_id increase_sum
<dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl>
1 1 day1 NA ATM add_cash NA 50 NA NA
2 2 day2 Garden Cinema NA cinema_tickets 5.4 NA R 5.4
3 3 day3 Pasta House NA meal 3.25 NA R 9.25
4 4 day4 NA Sarah gift NA 30 NA NA
5 5 day5 Blue Superstore NA shopping 15.2 NA NA NA
6 6 day6 NA Jane reimbursed NA 5.4 2 NA
7 7 day7 NA Joe reimbursed NA 7.25 3 NA
8 8 day8 NA Emily reimbursed NA 2 3 NA