I have a dataframe
df= data.frame(a=c(56,23,15,10),
b=c(43,NA,90.7,30.5),
c=c(12,7,10,2),
d=c(1,2,3,4),
e=c(NA,45,2,NA))
I want to select two random non-NA row values from each row and convert the rest to NA
Required Output- will differ because of randomness
df= data.frame(
a=c(56,NA,15,NA),
b=c(43,NA,NA,NA),
c=c(NA,7,NA,2),
d=c(NA,NA,3,4),
e=c(NA,45,NA,NA))
Code Used
I know to select random non-NA value from specific rows
set.seed(2)
sample(which(!is.na(df[1,])),2)
But no idea how to apply it all dataframe and get the required output
CodePudding user response:
You may write a function to keep n
random values in a row.
keep_n_value <- function(x, n) {
x1 <- which(!is.na(x))
x[-sample(x1, n)] <- NA
x
}
Apply the function by row using base R -
set.seed(123)
df[] <- t(apply(df, 1, keep_n_value, 2))
df
# a b c d e
#1 NA NA 12 1 NA
#2 NA NA 7 2 NA
#3 NA 90.7 10 NA NA
#4 NA 30.5 NA 4 NA
Or if you prefer tidyverse
-
purrr::pmap_df(df, ~keep_n_value(c(...), 2))
CodePudding user response:
Base R:
You could try column wise apply (sapply
) and randomly replace two non-NA values to be NA, like:
as.data.frame(sapply(df, function(x) replace(x, sample(which(!is.na(x)), 2), NA)))
Example Output:
a b c d e
1 56 NA 12 NA NA
2 23 NA NA 2 NA
3 NA NA 10 3 NA
4 NA 30.5 NA NA NA
CodePudding user response:
One option using dplyr
and purrr
could be:
df %>%
mutate(pmap_dfr(across(everything()), ~ `[<-`(c(...), !seq_along(c(...)) %in% sample(which(!is.na(c(...))), 2), NA)))
a b c d e
1 56 43.0 NA NA NA
2 23 NA 7 NA NA
3 15 NA NA NA 2
4 NA 30.5 2 NA NA