Convert part of the values in rows in a data frame

Time:02-11

I have a data frame which looks like this:

df

colA colB
0     0
1     1
0     1
0     1
0     1
1     0
0     0
1     1
0     1
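
For anyone who wants to run the code in this thread, the data frame above can be recreated roughly like this (a minimal sketch; the values are copied from the table, and storing them as numeric is an assumption):

```r
# Recreate the example data frame shown above
df <- data.frame(
  colA = c(0, 1, 0, 0, 0, 1, 0, 1, 0),
  colB = c(0, 1, 1, 1, 1, 0, 0, 1, 1)
)

# Count the values that are candidates for conversion
sum(df$colA == 0)  # number of 0s in colA
sum(df$colB == 1)  # number of 1s in colB
```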

I would like to convert a certain proportion of the 0s in colA to NA, and a certain proportion of the 1s in colB to NA.

If I do this:

df["colA"][df["colA"] == 0] <- NA

all the 0s in colA are converted to NA, but I only want half of them to be converted.

Similarly, for colB I want only 1/3 of the 1s to be converted, whereas this converts all of them:

df["colB"][df["colB"] == 1] <- NA

Expected output:

   colA colB
    0     0
    1     1
    NA    1
    0     1
    NA    1
    1     0
    0     0
    1     NA
    NA    NA

CodePudding user response:

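One way is to find the positions of the matching values with which() and then sample() from those positions:

# positions of the 0s in colA
tmp <- which(df$colA == 0)
# set a random half of them (rounded) to NA
df$colA[sample(tmp, round(length(tmp) / 2))] <- NA

Similarly for colB:

# positions of the 1s in colB
tmp <- which(df$colB == 1)
# set a random third of them (rounded) to NA
df$colB[sample(tmp, round(length(tmp) / 3))] <- NA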
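
If this is needed for several columns, the same idea could be wrapped in a small function. The sketch below is only one possible way to do it: the name set_prop_na and its arguments are made up for illustration, and the data frame is rebuilt so the snippet runs on its own. It samples positions with sample.int() so that a single matching row is not misread by sample() as the range 1:idx.

```r
# Hypothetical helper: set a random `prop` share of the entries of `x`
# that equal `value` to NA.
set_prop_na <- function(x, value, prop) {
  idx <- which(x == value)           # positions of matching entries (existing NAs are skipped)
  n <- round(length(idx) * prop)     # how many of them to blank out
  # Sample positions within idx rather than the values of idx themselves
  pick <- idx[sample.int(length(idx), n)]
  x[pick] <- NA
  x
}

# Example data from the question
df <- data.frame(
  colA = c(0, 1, 0, 0, 0, 1, 0, 1, 0),
  colB = c(0, 1, 1, 1, 1, 0, 0, 1, 1)
)

set.seed(42)  # only so the random choice is repeatable
df$colA <- set_prop_na(df$colA, 0, 1/2)
df$colB <- set_prop_na(df$colB, 1, 1/3)

colSums(is.na(df))  # number of NAs introduced per column
```

Which rows end up as NA depends on the seed; only the counts are fixed by the proportions.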

CodePudding user response:

You can use prodNA from the missForest package, which sets a given fraction (noNA) of the values to NA at random:

set.seed(1)
library(missForest)
df[df$colA == 0, "colA"] <- prodNA(df[df$colA == 0, "colA", drop = FALSE], noNA = 0.5)
df[df$colB == 1, "colB"] <- prodNA(df[df$colB == 1, "colB", drop = FALSE], noNA = 1/3)
df

  colA colB
1   NA    0
2    1   NA
3    0   NA
4   NA    1
5   NA    1
6    1    0
7    0    0
8    1    1
9    0    1

CodePudding user response:

I'll contribute a tidyverse approach here.

library(tidyverse)

df %>% mutate(id_colA = ifelse(colA == 1,  NA, 1:n()),
              colA = ifelse(id_colA %in% sample(na.omit(id_colA), sum(!is.na(id_colA))/2), NA, colA),
              id_colB = ifelse(colB == 0, NA, 1:n()),
              colB = ifelse(id_colB %in% sample(na.omit(id_colB), sum(!is.na(id_colB))/3), NA, colB)) %>%
  select(-starts_with("id_"))