I have a data frame like the one below:
monkey = data.frame(girl = 1:10, kn = NA, boy = 5)
I want to understand what the following code does, step by step:
monkey %>%
mutate(t = ifelse(is.na(kn),.[,grepl('a',names(.))],ll))
Thank you everyone in advance for your support.
CodePudding user response:
In my opinion, this is not good code, but I'll try to explain what it is doing.
is.na(kn) (in the context of monkey) returns a logical vector indicating whether each value in that column is NA:
with(monkey, is.na(kn))
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
The use of . in .[grepl(*)] refers to the current data at the start of this call to mutate; it would be more dplyr-canonical to use cur_data() (superseded by pick() as of dplyr 1.1.0), which would be more complete (e.g., it takes into account previously mutated columns that . does not see, though that is not a factor here). I believe this .[*]
code is trying to select a column dynamically based on the current data.
Why this one is bad:
1. There is no column here whose name contains "a";
2. There could be more than one column whose name contains "a", which means the yes= argument to ifelse would produce a nested frame in the new t= column;
3. The behavior of .[,*] changes depending on whether the original frame is a base-R data.frame or the tibble variant tbl_df: see monkey[,1] versus tibble(monkey)[,1].
The no= argument refers to an object ll that is not defined. This should (intuitively) fail with Error: object 'll' not found or similar, but since every element of the test= argument is true, the no= argument is not needed and so it is not evaluated. Consider ifelse(c(TRUE, TRUE), 1:2, stop("oops")) (no error) versus ifelse(c(TRUE, FALSE), 1:2, stop("oops")), which does raise the error.
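The lazy evaluation of no= described above can be checked directly in base R. This is a minimal sketch that assumes no object named ll exists in your session (as in the question); the result names are only illustrative.

```r
# Every test is TRUE, so ifelse() never touches `no` and the
# undefined object `ll` goes unnoticed.
res_all_true <- ifelse(c(TRUE, TRUE), 1:2, ll)

# One test is FALSE, so `no` has to be evaluated and the error surfaces.
# tryCatch() captures the message instead of stopping the script.
res_mixed <- tryCatch(
  ifelse(c(TRUE, FALSE), 1:2, ll),
  error = function(e) conditionMessage(e)
)

print(res_all_true)
print(res_mixed)
```

In the question's data every kn is NA, which is the only reason the original mutate() call does not fail on ll.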
Ultimately, this code is not defensive enough to be safe (base-vs-tibble variant) and its intent is unclear.
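To make the column-selection pitfalls concrete, here is a rough base-R sketch. The helper pick_one is hypothetical (it is not part of the question or of dplyr); it only illustrates one way to fail loudly when a name pattern does not match exactly one column.

```r
monkey <- data.frame(girl = 1:10, kn = NA, boy = 5)

# No column name contains "a", so the logical selector matches nothing
# and `[` returns a data frame with zero columns.
has_a  <- grepl("a", names(monkey))
picked <- monkey[, has_a]

# Base `[` drops a single selected column to a plain vector unless
# drop = FALSE is given, so the result type depends on how you index.
as_vector <- monkey[, 1]
as_frame  <- monkey[, 1, drop = FALSE]

# Hypothetical helper: return exactly one column as a vector, or stop.
pick_one <- function(df, pattern) {
  hits <- grep(pattern, names(df), value = TRUE)
  if (length(hits) != 1L) {
    stop("expected exactly one column matching '", pattern,
         "', found ", length(hits))
  }
  df[[hits]]
}

girl <- pick_one(monkey, "^g")
bad  <- tryCatch(pick_one(monkey, "a"),
                 error = function(e) conditionMessage(e))
```

Using [[ with a single checked name behaves the same for a base data.frame and for a tibble, which sidesteps the base-vs-tibble difference mentioned above.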
My advice when using dplyr is to use dplyr::if_else instead of base R's ifelse. For one, ifelse has some issues and limitations (e.g., see "How to prevent ifelse() from turning Date objects into numeric objects"); for another, if_else protects you from ambiguous, inconsistent-results code such as in your question.
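As a small illustration of both points, the sketch below uses made-up dates (d and cutoff are not from the question). The dplyr part only runs if the package is installed, and it again assumes that no object named ll exists.

```r
d      <- as.Date(c("2024-01-01", "2024-06-01"))
cutoff <- as.Date("2024-03-01")

# Base ifelse() builds its result from the logical test vector,
# so the Date class is lost and plain numbers come back.
out_base <- ifelse(d > cutoff, d, cutoff)
print(class(out_base))

if (requireNamespace("dplyr", quietly = TRUE)) {
  # if_else() keeps the type of its inputs, so the result is still a Date.
  out_dplyr <- dplyr::if_else(d > cutoff, d, cutoff)
  print(class(out_dplyr))

  # if_else() evaluates both branches up front, so the undefined `ll`
  # raises an error even though every condition is TRUE.
  strict <- tryCatch(
    dplyr::if_else(c(TRUE, TRUE), 1:2, ll),
    error = function(e) "error"
  )
  print(strict)
}
```

With if_else, the code in the question would have failed immediately on the undefined ll rather than appearing to work.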