I am trying to recode columns var1:var8 in the data frame sampledf, changing the values "B" and "D" to "0" while keeping all other values as they are.
sampledf <- data.frame(
var1 = c(1,4,2,1,1,0,0,1,0,0,0),
var2 = c(1,1,"D",1,0,0,1,"B",0,"D",0),
var3 = c(1,5,2,1,"B",0,1,1,1,0,0),
var4 = c(1,1,0,1,2,0,1,1,5,1,1),
var5 = c(0,4,"D",1,0,0,0,1,1,1,1),
var6 = c(1,"D",0,1,0,2,1,1,0,1,0),
var7 = c(1,1,0,0,1,"E",1,0,"D",1,1),
var8 = c(1,1,0,0,2,5,1,"D",0,3,1))
This is what I tried, but it did not work. Unlike this example, my real dataset has a very long list of other values, so I cannot supply them all manually. I just want to change these two values and keep the others as they are.
sampledfnew <- sampledf %>% mutate(across(var1:var2, ~recode(
.x,
'B'=0L,
'D'=0L,
TRUE ~ X,
)))
Can anyone help me fix the error here? Thank you
CodePudding user response:
There are many ways to do this. One is to use ifelse:
library(dplyr)
change_values <- c('B', 'D')
sampledf %>% mutate(across(var1:var2, ~ifelse(.x %in% change_values, 0, .x)))
# var1 var2 var3 var4 var5 var6 var7 var8
#1 1 1 1 1 0 1 1 1
#2 4 1 5 1 4 D 1 1
#3 2 0 2 0 D 0 0 0
#4 1 1 1 1 1 1 0 0
#5 1 0 B 2 0 0 1 2
#6 0 0 0 0 0 2 E 5
#7 0 1 1 1 0 1 1 1
#8 1 0 1 1 1 1 0 D
#9 0 0 1 5 1 0 D 0
#10 0 0 0 1 1 1 1 3
#11 0 0 0 1 1 0 1 1
CodePudding user response:
Here are alternatives to ifelse, which is prone to at least two not-insignificant issues (class-dropping and class-ambiguity, discussed below).
sampledf %>%
mutate(
across(var1:var8, ~ if_else(
. %in% c("B", "D"),
if (is.character(.)) "0" else 0, # could also be maybechar(0, .) from below
.)
)
)
# var1 var2 var3 var4 var5 var6 var7 var8
# 1 1 1 1 1 0 1 1 1
# 2 4 1 5 1 4 0 1 1
# 3 2 0 2 0 0 0 0 0
# 4 1 1 1 1 1 1 0 0
# 5 1 0 0 2 0 0 1 2
# 6 0 0 0 0 0 2 E 5
# 7 0 1 1 1 0 1 1 1
# 8 1 0 1 1 1 1 0 0
# 9 0 0 1 5 1 0 0 0
# 10 0 0 0 1 1 1 1 3
# 11 0 0 0 1 1 0 1 1
In case you don't always want B/D to be replaced with the same value, use case_when:
maybechar <- function(val, src) if (is.character(src)) as.character(val) else val
sampledf %>%
mutate(
across(var1:var8, ~ case_when(
. == "B" ~ maybechar(0, .),
. == "D" ~ maybechar(0, .),
TRUE ~ .)
)
)
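To make the helper's behavior concrete, here is a minimal sketch that calls maybechar on its own, outside of mutate. The vectors chr_col and num_col are made up for illustration and are not part of the question's data.

```r
# Helper as defined in the answer: match the replacement's type to the source column.
maybechar <- function(val, src) if (is.character(src)) as.character(val) else val

# Hypothetical example columns (not from the question's data).
chr_col <- c("1", "B", "D")
num_col <- c(1, 4, 2)

chr_repl <- maybechar(0, chr_col)  # source is character, so the replacement is the string "0"
num_repl <- maybechar(0, num_col)  # source is numeric, so the replacement stays the number 0

str(chr_repl)
str(num_repl)
```

Because the replacement always has the same type as the column, the strict type checks in if_else and case_when are satisfied for both kinds of column.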
Notes:
- Most of the replacements being done here actually use a "0" string instead of a numeric 0, because most of your columns are character.
- The use of ifelse by itself is something I often recommend against due to class ambiguity. With ifelse it is possible to change the class of the return value without realizing it. Compare ifelse(c(T,T), 1:2, c("A","B")) with ifelse(c(T,F), 1:2, c("A","B")) to see what I mean. This is "dangerous"/risky, and it is one thing that if_else explicitly guards against. (This is also enforced by case_when in my second code block.)
- It is because of the previous bullet that I suggested the use of something like maybechar, which might look like slightly sloppy code but is at least a little more declarative/intentional. I give two ways to do it: the first, explicitly without a helper function, is shown in the if_else example above; the second uses the helper function. The helper function seems more prudent with case_when, since the operation is done multiple times, so the code is a little easier to read (imo).
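The two ifelse issues mentioned in this answer can be reproduced in base R alone. The sketch below uses the two calls from the notes, plus a small Date example that I added to illustrate class-dropping (the dates are arbitrary).

```r
# Both tests are TRUE: only the integer "yes" values are used, so the result is integer.
all_yes <- ifelse(c(TRUE, TRUE), 1:2, c("A", "B"))
class(all_yes)

# One test is FALSE: a string is pulled in, so the whole result is coerced to character.
mixed <- ifelse(c(TRUE, FALSE), 1:2, c("A", "B"))
class(mixed)

# Class-dropping: Date inputs come back as bare numbers, without the Date class.
d <- as.Date(c("2020-01-01", "2020-01-02"))
dropped <- ifelse(c(TRUE, FALSE), d, d + 1)
class(dropped)
```

The same two input vectors give a different return class depending only on the values of the test, which is why the result type of ifelse cannot be read off the code.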
CodePudding user response:
Another base R solution is:
sampledf[apply(sampledf, 2, \(x) x %in% c("B", "D"))] <- 0
> sampledf
var1 var2 var3 var4 var5 var6 var7 var8
1 1 1 1 1 0 1 1 1
2 4 1 5 1 4 0 1 1
3 2 0 2 0 0 0 0 0
4 1 1 1 1 1 1 0 0
5 1 0 0 2 0 0 1 2
6 0 0 0 0 0 2 E 5
7 0 1 1 1 0 1 1 1
8 1 0 1 1 1 1 0 0
9 0 0 1 5 1 0 0 0
10 0 0 0 1 1 1 1 3
11 0 0 0 1 1 0 1 1
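A possible variant of the base R approach, sketched here under my own assumptions, works column by column with lapply and replace instead of building a matrix with apply. The small data frame sampledf_small is a made-up stand-in for the question's data.

```r
# Hypothetical reduced version of the question's data frame.
sampledf_small <- data.frame(
  var1 = c(1, 4, 2),
  var2 = c("1", "D", "B"),
  var3 = c("B", "0", "E"),
  stringsAsFactors = FALSE
)

change_values <- c("B", "D")

# Replace matching entries in each column; assigning into [] keeps the data frame shape.
# In character columns the 0 is coerced to "0"; numeric columns are left numeric.
sampledf_small[] <- lapply(
  sampledf_small,
  function(x) replace(x, x %in% change_values, 0)
)

str(sampledf_small)
```

Each column keeps its original type, so no separate handling of numeric and character columns is needed.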