A financial statement is a good illustration of this issue. Here is an example dataframe:
df <- data.frame( date = sample(seq(as.Date('2020/01/01'), as.Date('2020/12/31'), by="day"), 10),
category = sample(c('a','b', 'c'), 10, replace=TRUE),
direction = sample(c('credit', 'debit'), 10, replace=TRUE),
value = sample(0:25, 10, replace = TRUE) )
I want to produce a summary table with incoming
, outgoing
and total
columns for each category.
df %>%
pivot_wider(names_from = direction, values_from = value) %>%
group_by(category) %>%
summarize(incoming = sum(credit, na.rm=TRUE), outgoing=sum(debit,na.rm=TRUE) ) %>%
mutate(total= incoming-outgoing)
In most cases this works perfectly with the example dataframe above.
But there are cases where df$direction
could contain a single value e.g., credit
, resulting in an error.
Error: Problem with `summarise()` column `outgoing`.
object 'debit' not found
Given that I have no control over the dataframe, what is the best way to handle this?
I've been playing around with a conditional statement in the summarize method to check that the column exists, but have not managed to get this working.
...
summarize( outgoing = case_when(
"debit" %in% colnames(.) ~ sum(debit,na.rm=TRUE),
TRUE ~ 0 ) )
...
Have I made a syntax error, or am I going in completely the wrong direction with this?
CodePudding user response:
The issue happens only when one of the elements is presents i.e. 'credit' and no 'debit' or viceversa. Then, the pivot_wider
doesn't create the column missing. Instead of pivoting and then summarising, do this directly with summarise
and ==
i.e. if the 'debit' is absent, sum
will take care of it by returning 0
library(dplyr)
df %>%
slice(-c(9:10)) %>% # just removed the 'debit' rows completely
group_by(category) %>%
summarise(total = sum(value[direction == 'credit']) -
sum(value[direction == "debit"]))
-output
# A tibble: 3 × 2
category total
<chr> <int>
1 a 15
2 b 30
3 c 63
With pivot_wider
, it is not the case
df %>%
slice(-c(9:10)) %>%
pivot_wider(names_from = direction, values_from = value)
# A tibble: 8 × 3
date category credit
<date> <chr> <int>
1 2020-07-25 c 19
2 2020-05-09 b 15
3 2020-08-27 a 15
4 2020-03-27 b 15
5 2020-04-06 c 6
6 2020-07-06 c 11
7 2020-09-22 c 25
8 2020-10-06 c 2
it creates only the 'credit' column, thus when we call a column 'debit' that is not created, it throws error
df %>%
slice(-c(9:10)) %>%
pivot_wider(names_from = direction, values_from = value) %>%
group_by(category) %>%
summarize(incoming = sum(credit, na.rm=TRUE),
outgoing=sum(debit,na.rm=TRUE) )
Error: Problem with
summarise()
columnoutgoing
. ℹoutgoing = sum(debit, na.rm = TRUE)
. ✖ object 'debit' not found ℹ The error occurred in group 1: category = "a". Runrlang::last_error()
to see where the error occurred.
In this case, we can do a complete
to create some rows with debit
as well which will have NA
for other columns
library(tidyr)
df %>%
slice(-c(9:10)) %>%
complete(category, direction = c("credit", "debit")) %>%
pivot_wider(names_from = direction, values_from = value) %>%
group_by(category) %>%
summarize(incoming = sum(credit, na.rm=TRUE),
outgoing=sum(debit,na.rm=TRUE) ) %>%
mutate(total= incoming-outgoing)
# A tibble: 3 × 4
category incoming outgoing total
<chr> <int> <int> <int>
1 a 15 0 15
2 b 30 0 30
3 c 63 0 63