I have a data frame like this, where column B is still empty:
A B Date1 Date2 Date3 Date4
1 x NA NA NA
2 NA y NA NA
3 NA NA z NA
4 NA NA NA f
I want to use dplyr's case_when() to do something like this:
df <- df %>%
mutate(B = case_when(
A == 1 ~ B == Date1,
A == 2 ~ B == Date2,
A == 3 ~ B == Date3,
A == 4 ~ B == Date4))
Essentially, based on the value of A, I would like to fill B with one of the four Date columns. A is of class character; B and the Date columns are all of class Date.
The problem is that when I apply this to the data frame, it doesn't work: it returns NAs and changes the class of B to logical (boolean). I am using R version 4.1.2. Any help is appreciated.
Answer:
You can use coalesce() to find the first non-missing element in each row:
library(dplyr)
df %>%
mutate(B = coalesce(!!!df[-1]))
# A Date1 Date2 Date3 Date4 B
# 1 1 x <NA> <NA> <NA> x
# 2 2 <NA> y <NA> <NA> y
# 3 3 <NA> <NA> z <NA> z
# 4 4 <NA> <NA> <NA> f f
The code above is shorthand for:
df %>%
mutate(B = coalesce(Date1, Date2, Date3, Date4))
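Because the question's columns are of class Date, it may be worth checking that coalesce() keeps that class. A minimal sketch with made-up dates (the vectors d1, d2 and d3 are hypothetical):

```r
library(dplyr)

# Hypothetical Date vectors: each position is non-missing in exactly one of them
d1 <- as.Date(c("2021-01-01", NA, NA))
d2 <- as.Date(c(NA, "2021-02-01", NA))
d3 <- as.Date(c(NA, NA, "2021-03-01"))

# coalesce() takes, position by position, the first non-missing value
out <- coalesce(d1, d2, d3)
class(out)  # "Date"
out         # "2021-01-01" "2021-02-01" "2021-03-01"
```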
If B needs to be filled based on the value of A, here is an idea with c_across():
df %>%
rowwise() %>%
# A is character in the question, so convert it to an integer column position
mutate(B = c_across(starts_with("Date"))[as.integer(A)]) %>%
ungroup()
# # A tibble: 4 × 6
# A Date1 Date2 Date3 Date4 B
# <int> <chr> <chr> <chr> <chr> <chr>
# 1 1 x NA NA NA x
# 2 2 NA y NA NA y
# 3 3 NA NA z NA z
# 4 4 NA NA NA f f
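One caveat, given that A is of class character in the question: R's [ treats a character index as a name, not a position, so indexing by A directly can return NA. A minimal base-R illustration (v is a hypothetical unnamed vector standing in for one row of the Date columns):

```r
# Hypothetical unnamed vector standing in for one row of the Date columns
v <- c("x", NA, NA, NA)

v["1"]              # NA: "1" is looked up in names(v), and v has no names
v[as.integer("1")]  # "x": converting to integer selects by position
```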
Answer:
The other answers are more concise, but if you want to keep your case_when() approach, the fix is to put the column itself on the right-hand side of each ~ (Date1 rather than B == Date1):
df %>%
mutate(B = case_when(
A == 1 ~ Date1,
A == 2 ~ Date2,
A == 3 ~ Date3,
A == 4 ~ Date4))
Output:
# A B Date1 Date2 Date3 Date4
# 1 x x <NA> <NA> <NA>
# 2 y <NA> y <NA> <NA>
# 3 z <NA> <NA> z <NA>
# 4 f <NA> <NA> <NA> f
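Why the original version fails: on the right-hand side of ~, B == Date1 is a comparison, not an assignment, so it yields a logical vector. Because B starts out empty (NA), every comparison is NA, which explains both the all-NA result and the logical class. A minimal sketch with hypothetical values:

```r
B <- as.Date(NA)                 # B is still empty
Date1 <- as.Date("2021-01-01")   # hypothetical value

res <- B == Date1                # a comparison, evaluated to a logical value
class(res)                       # "logical"
res                              # NA, because comparing with NA gives NA
```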
Answer:
Since you seem to want the diagonal values of the Date columns, you can use diag() (this works only when row i should take Date column i, as in the sample data):
df$B <- diag(as.matrix(df[grepl("Date", colnames(df))]))
#[1] "x" "y" "z" "f"
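To see why the diagonal trick depends on row order, here is a minimal sketch with the sample rows shuffled (df2 is hypothetical). Indexing the matrix with a two-column (row, column) matrix built from A still picks the right values, while diag() does not:

```r
# Hypothetical data: the sample values, but with the rows in a different order
df2 <- data.frame(
  A = c(3, 1, 4, 2),
  Date1 = c(NA, "x", NA, NA),
  Date2 = c(NA, NA, NA, "y"),
  Date3 = c("z", NA, NA, NA),
  Date4 = c(NA, NA, "f", NA)
)
m <- as.matrix(df2[grepl("Date", colnames(df2))])

d <- diag(m)                                   # NA NA NA NA: diagonal no longer matches A
picked <- m[cbind(seq_len(nrow(m)), df2$A)]    # "z" "x" "f" "y": row i, column A[i]
```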
Other options, if you just want to coalesce:
- With max():
df$B <- apply(df[2:5], 1, \(x) max(x, na.rm = TRUE))
- With c_across():
df %>%
rowwise() %>%
mutate(B = max(c_across(Date1:Date4), na.rm = TRUE)) %>%
ungroup()
Output:
A Date1 Date2 Date3 Date4 B
1 1 x <NA> <NA> <NA> x
2 2 <NA> y <NA> <NA> y
3 3 <NA> <NA> z <NA> z
4 4 <NA> <NA> <NA> f f
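If your Date columns really are of class Date, as in the question, note that apply() first converts the data frame to a matrix, which turns Date columns into character strings. A minimal sketch with hypothetical data that converts the result back with as.Date():

```r
# Hypothetical data with real Date columns
df <- data.frame(
  A = c("1", "2"),
  Date1 = as.Date(c("2021-01-01", NA)),
  Date2 = as.Date(c(NA, "2021-02-01"))
)

# apply() works on as.matrix(df[2:3]), so each row x is a character vector
b <- apply(df[2:3], 1, function(x) max(x, na.rm = TRUE))
class(b)      # "character"

# Convert back to Date after the row-wise step
df$B <- as.Date(b)
class(df$B)   # "Date"
```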