Suppose I have the following dataset:
id = 1:100
var_1 = rnorm(100,100,100)
var_2 = rnorm(100,100,100)
var_3 = rnorm(100,100,100)
var_4 = rnorm(100,100,100)
my_data = data.frame(id, var_1, var_2, var_3, var_4)
- I want to randomly replace the SAME 25 entries (rows) in var_1 and var_2 with NA.
- I want to randomly replace the SAME 20 entries (rows) in var_3 and var_4 with NA.
I found the following code that can replace entries in a column with NA:
my_data$var_1[sample(nrow(my_data),25)]<-NA
my_data$var_2[sample(nrow(my_data),25)]<-NA
my_data$var_3[sample(nrow(my_data),20)]<-NA
my_data$var_4[sample(nrow(my_data),20)]<-NA
But each sample() call draws its own rows, so the NA positions generally differ between columns. Is there some way to ensure that the same entries in var_1 and var_2 are replaced with NA, and the same entries in var_3 and var_4 are replaced with NA?
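A minimal base-R sketch of the underlying idea, assuming nothing beyond the question's own setup: call sample() once per pair of columns and reuse the resulting row indices for both columns, rather than drawing a fresh sample for each column as the code above does. The seed and the index names rows_12 and rows_34 are illustrative choices, not part of the original post.

```r
# Example data as in the question; the seed is an arbitrary choice for reproducibility
set.seed(1)
my_data <- data.frame(id    = 1:100,
                      var_1 = rnorm(100, 100, 100),
                      var_2 = rnorm(100, 100, 100),
                      var_3 = rnorm(100, 100, 100),
                      var_4 = rnorm(100, 100, 100))

# Draw each set of row indices ONCE (rows_12 and rows_34 are illustrative names) ...
rows_12 <- sample(nrow(my_data), 25)
rows_34 <- sample(nrow(my_data), 20)

# ... and reuse the same indices for both columns of each pair
my_data$var_1[rows_12] <- NA
my_data$var_2[rows_12] <- NA
my_data$var_3[rows_34] <- NA
my_data$var_4[rows_34] <- NA

# Check: the NA positions now coincide within each pair
identical(is.na(my_data$var_1), is.na(my_data$var_2))  # TRUE
identical(is.na(my_data$var_3), is.na(my_data$var_4))  # TRUE
```

Because sample() draws without replacement by default, each pair ends up with exactly 25 or 20 distinct NA rows.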
Answer:
Like this?
library(data.table)
setDT(my_data)
my_data[sample(.N, 25), c('var_1', 'var_2'):=NA]  # same 25 random rows in both columns
my_data[sample(.N, 20), c('var_3', 'var_4'):=NA]  # same 20 random rows in both columns
my_data[1:10]
## id var_1 var_2 var_3 var_4
## 1: 1 37.35462 37.963332 NA NA
## 2: 2 118.36433 104.211587 268.88733 -4.729815
## 3: 3 16.43714 8.907835 258.65884 297.133739
## 4: 4 259.52808 115.802877 66.90922 61.636789
## 5: 5 NA NA -128.52355 265.414530
## 6: 6 17.95316 276.728727 349.76616 251.221269
## 7: 7 148.74291 171.670748 NA NA
## 8: 8 173.83247 191.017423 154.13273 156.722091
## 9: 9 157.57814 138.418536 98.66005 -2.454848
## 10: 10 69.46116 268.217608 151.01084 132.300650
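If data.table is not available, base R can do the same multi-column assignment: the [<- method for data frames accepts one row index together with several column names, so each pair is blanked with a single sample() call. This is a sketch assuming a plain data.frame built as in the question; the seed is arbitrary.

```r
# Example data as in the question (arbitrary seed)
set.seed(1)
my_data <- data.frame(id    = 1:100,
                      var_1 = rnorm(100, 100, 100),
                      var_2 = rnorm(100, 100, 100),
                      var_3 = rnorm(100, 100, 100),
                      var_4 = rnorm(100, 100, 100))

# One sample() per pair, assigned to both columns at once
# (base-R counterpart of my_data[sample(.N, k), c(...) := NA])
my_data[sample(nrow(my_data), 25), c("var_1", "var_2")] <- NA
my_data[sample(nrow(my_data), 20), c("var_3", "var_4")] <- NA

# NA count per column: 25, 25, 20, 20
colSums(is.na(my_data[c("var_1", "var_2", "var_3", "var_4")]))
```

The scalar NA is recycled over the selected rows and coerced to the columns' numeric type, so var_1 to var_4 stay numeric.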
Answer:
You may use Map, which is especially helpful as the number of column groups you want to replace grows.
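set.seed(42)
# For each column group x, set the same y random rows to NA and keep only those columns
# (\(x, y) and |> need R >= 4.1; the result is written back by position into columns 2:5)
my_data[-1] <- Map(\(x, y) {my_data[sample.int(nrow(my_data), y), x] <- NA; my_data[x]},
                   list(c('var_1', 'var_2'), c('var_3', 'var_4')), c(25, 20)) |>
  as.data.frame()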
head(my_data)
# id var_1 var_2 var_3 var_4
# 1 1 237.09584 220.0965 -100.09292 99.53792
# 2 2 43.53018 204.4751 NA NA
# 3 3 NA NA NA NA
# 4 4 163.28626 284.8482 NA NA
# 5 5 NA NA -37.68616 85.35274
# 6 6 89.38755 110.5514 NA NA
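The Map version writes its result back by position (my_data[-1]), so it assumes the groups list the non-id columns in their original order, and it needs R 4.1 or later for the \(x, y) lambda and the |> pipe. A plain loop over the groups avoids both assumptions. The helper na_by_group() below is a hypothetical name for this sketch, not a function from base R or any package.

```r
# Hypothetical helper: for each group of columns, draw ONE set of rows and
# set every column in that group to NA on those rows. Columns are addressed
# by name, so their position in df does not matter.
na_by_group <- function(df, groups, sizes) {
  stopifnot(length(groups) == length(sizes))
  for (g in seq_along(groups)) {
    rows <- sample(nrow(df), sizes[g])   # one draw per group
    df[rows, groups[[g]]] <- NA          # applied to all columns in the group
  }
  df
}

# Example data as in the question (arbitrary seed)
set.seed(42)
my_data <- data.frame(id    = 1:100,
                      var_1 = rnorm(100, 100, 100),
                      var_2 = rnorm(100, 100, 100),
                      var_3 = rnorm(100, 100, 100),
                      var_4 = rnorm(100, 100, 100))

out <- na_by_group(my_data,
                   groups = list(c("var_1", "var_2"), c("var_3", "var_4")),
                   sizes  = c(25, 20))

colSums(is.na(out))  # id 0, var_1 25, var_2 25, var_3 20, var_4 20
```

Adding another group only means extending the groups list and the sizes vector by one entry each.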