I'm using this code to generate a random number of NAs within a dataframe. Here's an example:
set.seed(1)
df <- mtcars[1:10,]
df <- as.data.frame(lapply(df, function(cc) cc[ sample(c(TRUE, NA), prob = c(0.7, 0.3), size = length(cc), replace = TRUE) ]))
> df
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 NA 110 NA 2.620 NA 0 1 4 4
2 21.0 6 160.0 110 3.90 NA 17.02 NA NA 4 4
3 22.8 4 108.0 93 NA 2.320 18.61 1 1 4 1
4 NA 6 258.0 110 3.08 3.215 19.44 1 0 NA NA
5 18.7 NA 360.0 NA 3.15 3.440 17.02 0 NA NA 2
6 NA 6 225.0 105 NA 3.460 20.22 NA 0 NA 1
7 NA NA 360.0 NA 3.21 3.570 15.84 NA NA 3 4
8 24.4 NA 146.7 62 3.69 3.190 NA 1 0 4 2
9 22.8 4 NA NA NA 3.150 22.90 NA 0 NA NA
10 19.2 NA 167.6 123 3.92 3.440 NA NA 0 4 4
It's useful, but the number of NAs varies from column to column. I would like to have an exact number of NAs per column. Is there a way to create exactly 3 random NAs per column? Many thanks
CodePudding user response:
We may sample the row_number() to replace values in each column with an exact number of NAs
library(dplyr)
df1 <- df %>%
  mutate(across(everything(),
    ~ replace(.x, sample(row_number(), 3), NA)))
-output
df1
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 NA 160.0 NA 3.90 NA NA 0 1 NA 4
Mazda RX4 Wag 21.0 NA NA 110 3.90 2.875 17.02 0 NA 4 4
Datsun 710 22.8 4 NA NA 3.85 2.320 18.61 1 1 NA 1
Hornet 4 Drive NA 6 258.0 110 3.08 3.215 19.44 1 NA NA 1
Hornet Sportabout 18.7 NA 360.0 NA NA 3.440 NA NA 0 3 2
Valiant 18.1 6 225.0 105 NA 3.460 20.22 1 0 3 NA
Duster 360 NA 8 NA 245 3.21 NA 15.84 0 0 3 NA
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 NA 4 140.8 95 NA 3.150 22.90 NA NA 4 2
Merc 280 19.2 6 167.6 123 3.92 NA NA NA 0 4 NA
In base R, we do the same step by looping over the columns with lapply
df[] <- lapply(df, \(x) replace(x, sample(seq_along(x), 3), NA))
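To confirm that each column ends up with exactly 3 NAs, the base R step can be wrapped in a small function and checked with colSums(is.na(...)). This is only a sketch: add_exact_nas and df_na are hypothetical names, and it assumes the input has no NAs to begin with (positions that are already NA could be sampled again, so the total would no longer be guaranteed). It uses function(x) instead of \(x) so it also runs on R versions older than 4.1.

```r
# Hypothetical helper (not part of any package): set exactly n randomly
# chosen entries per column to NA. Assumes the input contains no NAs yet.
add_exact_nas <- function(data, n = 3) {
  stopifnot(n <= nrow(data))
  # Assigning into data[] keeps the data.frame class and the row names
  data[] <- lapply(data, function(x) replace(x, sample(seq_along(x), n), NA))
  data
}

set.seed(1)
df <- mtcars[1:10, ]
df_na <- add_exact_nas(df, n = 3)

# Count the NAs in each column; every entry should equal n
na_counts <- colSums(is.na(df_na))
print(na_counts)
```

Because sample() draws positions without replacement by default, the n positions within a column are always distinct, which is what makes the count exact rather than approximate.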