Data Setup
I have a data set that appears somewhat like this simple dataframe below:
CAD_EXCHANGE <- 1.34
EUR_EXCHANGE <- 0.88
df <- tibble(
shipment = c("A", "B", "C", "D", "E"),
invoice = c(rep(500, 5)),
currency = factor(c("USD", "EUR", "CAD", NA, "SDD"))
)
df
# A tibble: 5 x 3
shipment invoice currency
<chr> <dbl> <fct>
1 A 500 USD
2 B 500 EUR
3 C 500 CAD
4 D 500 NA
5 E 500 SDD
levels(df$currency)
[1] "CAD" "EUR" "SDD" "USD"
End Goal
I am trying to convert the invoices to USD for some common other currencies (EUR and CAD), but not all of them or if data is missing (i.e., SDD and NA
). My final data frame should look like this:
# A tibble: 5 x 5
shipment invoice currency invoice_converted currency_converted
<chr> <dbl> <fct> <dbl> <fct>
1 A 500 USD 500 USD
2 B 500 EUR 568 USD
3 C 500 CAD 373 USD
4 D 500 NA 500 NA
5 E 500 SDD 500 SDD
Trial 1 -- Does Not Work
In the future, I may have more than just these few currencies to convert, so I applied a case_when()
statement. This was my first attempt:
df_USD1 <- df %>%
mutate(
invoice_converted = case_when(
currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
TRUE ~ invoice
),
currency_converted = case_when(currency == "EUR" ~ "USD",
currency == "CAD" ~ "USD",
TRUE ~ currency)
)
Error: Problem with `mutate()` column `currency_converted`.
i `currency_converted = case_when(...)`.
x must be a character vector, not a `factor` object.
With the above, I understand that I'm mixing character and factor in the assignment to currency_converted
because I have the default TRUE ~ currency
(and currency
is a factor). So I tried using only factors for the assignment...
Trial 2 -- Works, but not in a reliable way
df_USD2 <- df %>%
mutate(
invoice_converted = case_when(
currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
TRUE ~ invoice
),
currency_converted = case_when(
currency == "EUR" ~ currency[1],
currency == "CAD" ~ currency[1],
TRUE ~ currency)
)
It works, but only because in my setup for this question, USD is in the first position, and I cannot rely on that.
> df$currency
[1] USD EUR CAD <NA> SDD
Levels: CAD EUR SDD USD
Trial 3 -- Doesn't work
I thought I could try some other way of getting at the factor with subsetting, but this doesn't work:
df_USD3 <- df %>%
mutate(
invoice_converted = case_when(
currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
TRUE ~ invoice
),
currency_converted = case_when(
currency == "EUR" ~ df$currency[df$currency == "USD"],
currency == "CAD" ~ df$currency[df$currency == "USD"],
TRUE ~ currency
)
)
Error: Problem with `mutate()` column `currency_converted`.
i `currency_converted = factor(...)`.
x `currency == "EUR" ~ df$currency[df$currency == "USD"]`, `currency == "CAD" ~ df$currency[df$currency == "USD"]` must be length 5 or one, not 2.
Run `rlang::last_error()` to see where the error occurred.
And it seems to be that it's because of the NA
that gets returned...
> df$currency[df$currency == "USD"]
[1] USD <NA>
Levels: CAD EUR SDD USD
...because if I go back to my original df
and replace that NA
with some other currency, it would work -- but obviously I need to be able to keep NA
where it belongs.
I feel like there's some very good way to do this, but I'm missing it despite reading up on factors and trying different things. Help?
CodePudding user response:
case_when
doesn't do the type conversion automatically - i.e. currency
is factor
whereas the returns from other conditions in case_when
is just character
. So, we can force convert the currency
to character
to make all the returns same class and it should work
library(dplyr)
df %>%
mutate(
invoice_converted = case_when(
currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
TRUE ~ invoice
), currency_converted = case_when(currency == "EUR" ~ "USD",
currency == "CAD" ~ "USD",
TRUE ~ as.character(currency)))
-output
# A tibble: 5 × 5
shipment invoice currency invoice_converted currency_converted
<chr> <dbl> <fct> <dbl> <chr>
1 A 500 USD 500 USD
2 B 500 EUR 568 USD
3 C 500 CAD 373 USD
4 D 500 <NA> 500 <NA>
5 E 500 SDD 500 SDD
If we want to keep it as a factor
, either wrap with factor
after the case_when
or directly use fct_recode
instead of case_when
library(forcats)
df %>%
mutate(
invoice_converted = case_when(
currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
TRUE ~ invoice
), currency_converted = fct_recode(currency, USD = "EUR", USD = "CAD"))
-output
# A tibble: 5 × 5
shipment invoice currency invoice_converted currency_converted
<chr> <dbl> <fct> <dbl> <fct>
1 A 500 USD 500 USD
2 B 500 EUR 568 USD
3 C 500 CAD 373 USD
4 D 500 <NA> 500 <NA>
5 E 500 SDD 500 SDD