I made a minimal reproducible example, but my real data is much larger.
ac_1 <-c(0.1, 0.3, 0.03, 0.03)
ac_2 <-c(0.2, 0.4, 0.1, 0.008)
ac_3 <-c(0.8, 0.043, 0.7, 0.01)
ac_4 <-c(0.2, 0.73, 0.1, 0.1)
c_2<-c(1,2,5,23)
check_1<-c(0.01, 0.902,0.02,0.07)
check_2<-c(0.03, 0.042,0.002,0.00001)
check_3<-c(0.01, 0.02,0.5,0.001)
check_4<-c(0.001, 0.042,0.02,0.2)
id<-1:4
df<-data.frame(id,ac_1, ac_2,ac_3,ac_4,c_2,check_1,check_2,check_3,check_4)
So the data frame looks like this:
> df
id ac_1 ac_2 ac_3 ac_4 c_2 check_1 check_2 check_3 check_4
1 1 0.10 0.200 0.800 0.20 1 0.010 0.03000 0.010 0.001
2 2 0.30 0.400 0.043 0.73 2 0.902 0.04200 0.020 0.042
3 3 0.03 0.100 0.700 0.10 5 0.020 0.00200 0.500 0.020
4 4 0.03 0.008 0.010 0.10 23 0.070 0.00001 0.001 0.200
What I want to do is this:
If check_1 is 0.02, the corresponding ac_1 value should become missing. If check_2 is 0.02, the corresponding ac_2 value should become missing, and so on for every pair of "check" and "ac" columns.
For example, in the check_1 column, the person with id 3 has 0.02, so this person's ac_1 score (0.03) should become missing (NA).
In the check_3 column, the person with id 2 has 0.02, so this person's ac_3 score should become missing.
In the check_4 column, the person with id 3 has 0.02, so this person's ac_4 score should become missing.
So what I did is as follows:
for(i in 1:4){
if(paste0("df$check_",i)==0.02){
paste0("df$ac_",i)==NA
}
}
But it did not work.
CodePudding user response:
You're really close, but you're off on a few fundamentals.
You can't (easily) use strings to refer to objects, so "df$check_1" won't work. You can use strings to refer to column names, but not with $; you need to use [ or [[, so df[["check_1"]] will work.
if isn't vectorized, so it won't work on each value in a column. Use ifelse instead, or, even better in this case, we can skip the if entirely.
Using == on non-integer numbers is risky due to precision issues. We'll use a tolerance instead.
Minor issue: paste0("df$ac_",i)==NA isn't good, because == is for checking equality. You need = or <- for assignment on that line.
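To illustrate the precision point, here is a small sketch using only base R. The variable names (x, exact, approx, base) are made up for this illustration, and the tolerance 1e-10 is just one reasonable choice.

```r
# Two values that are mathematically equal but differ in floating point
x <- 0.1 + 0.2

exact  <- x == 0.3                    # exact comparison
approx <- abs(x - 0.3) < 1e-10        # comparison with an explicit tolerance
base   <- isTRUE(all.equal(x, 0.3))   # base R's tolerance-based comparison

print(c(exact = exact, approx = approx, base = base))
```

The exact comparison comes out FALSE while both tolerance-based comparisons come out TRUE, which is why a tolerance is the safer way to test a value like 0.02.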
Addressing all of these issues:
for(i in 1:4){
  df[
    ## rows to replace
    abs(df[[paste0("check_", i)]] - 0.02) < 1e-10,
    ## column to replace
    paste0("ac_", i)
  ] <- NA
}
df
# id ac_1 ac_2 ac_3 ac_4 c_2 check_1 check_2 check_3 check_4
# 1 1 0.10 0.200 0.80 0.20 1 0.010 0.03000 0.010 0.001
# 2 2 0.30 0.400 NA 0.73 2 0.902 0.04200 0.020 0.042
# 3 3 NA 0.100 0.70 NA 5 0.020 0.00200 0.500 0.020
# 4 4 0.03 0.008 0.01 0.10 23 0.070 0.00001 0.001 0.200
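Since the real data is large, the same idea can also be written without a loop. This is only a sketch of one possible variant, not part of the answer above: it assumes the ac_ and check_ columns pair up by their numeric suffix, and the helper names ac_cols, check_cols and hit are invented for the example.

```r
# Rebuild the example data from the question
df <- data.frame(
  id = 1:4,
  ac_1 = c(0.1, 0.3, 0.03, 0.03),
  ac_2 = c(0.2, 0.4, 0.1, 0.008),
  ac_3 = c(0.8, 0.043, 0.7, 0.01),
  ac_4 = c(0.2, 0.73, 0.1, 0.1),
  c_2 = c(1, 2, 5, 23),
  check_1 = c(0.01, 0.902, 0.02, 0.07),
  check_2 = c(0.03, 0.042, 0.002, 0.00001),
  check_3 = c(0.01, 0.02, 0.5, 0.001),
  check_4 = c(0.001, 0.042, 0.02, 0.2)
)

# Column names in matching order (assumed naming scheme: ac_i <-> check_i)
ac_cols    <- paste0("ac_", 1:4)
check_cols <- paste0("check_", 1:4)

# Logical matrix: TRUE where a check value is within tolerance of 0.02
hit <- abs(as.matrix(df[check_cols]) - 0.02) < 1e-10

# Blank out the ac values at the same row/column positions
df[ac_cols][hit] <- NA

df
```

The logical matrix has one column per check column, so each TRUE lines up with the cell in the corresponding ac column. For a different number of column pairs, only the 1:4 would need to change.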
CodePudding user response:
It's often better to work with long-format data, even if just temporarily. Here is an example of doing so, using dplyr and tidyr:
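library(dplyr)
library(tidyr)

# reshape to long, split "ac_1" into type = "ac" and pos = "1"
pivot_longer(df, -c(id, c_2)) %>%
  separate(name, into = c("type", "pos")) %>%
  # one row per id/pos, with ac and check side by side
  pivot_wider(names_from = type, values_from = value) %>%
  mutate(ac = if_else(near(check, 0.02), NA_real_, ac)) %>%
  # back to the original wide layout
  pivot_wider(names_from = pos, values_from = ac:check)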
(Updated with near() thanks to Gregor)
Output:
id c_2 ac_1 ac_2 ac_3 ac_4 check_1 check_2 check_3 check_4
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0.1 0.2 0.8 0.2 0.01 0.03 0.01 0.001
2 2 2 0.3 0.4 NA 0.73 0.902 0.042 0.02 0.042
3 3 5 NA 0.1 0.7 NA 0.02 0.002 0.5 0.02
4 4 23 0.03 0.008 0.01 0.1 0.07 0.00001 0.001 0.2