R - How do I change values in a column of a data frame to NA based on the value in the next column?

Time:02-01

I am trying to set values in a data frame to NA based on the corresponding value in the next column.

Here's an example where I'm trying to update the values in col1 and col3 by comparing them to col2 and col4 respectively. If the value in col2 is less than 30, the value in col1 should be NA (likewise for col3 if the value in col4 is less than 30). Here's the console output of the example:

threshold <- 30L

df <- data.frame(col1 = floor(runif(4, 0, 100)),
                 col2 = floor(runif(4, 0, 100)),
                 col3 = floor(runif(4, 0, 100)),
                 col4 = floor(runif(4, 0, 100)))

#   col1 col2 col3 col4
# 1   84   71   18   52
# 2   42   89   25   19
# 3   93   17   28   59
# 4    6   21   88   35

df[, c(2, 4)] < threshold
#       col2  col4
# [1,] FALSE FALSE
# [2,] FALSE  TRUE
# [3,]  TRUE FALSE
# [4,]  TRUE FALSE

df_new <- data.frame(col1 = c(84, 42, NA, NA),
                     col2 = df$col2,
                     col3 = c(18, NA, 28, 88),
                     col4 = df$col4)

#   col1 col2 col3 col4
# 1   84   71   18   52
# 2   42   89   NA   19
# 3   NA   17   28   59
# 4   NA   21   88   35 

My real dataset has thousands of rows and hundreds of columns, so doing this manually is not possible. I need to loop over odd/even pairs of columns (col1 and col2, col3 and col4, etc.), as in the example. How do I do this?

CodePudding user response:

You can also use a loop if odd columns should be changed based on even columns:

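# Visit only the odd columns; stopping at ncol(df) - 1 ensures column i + 1 exists
for (i in seq(1, ncol(df) - 1, by = 2)) {
  # Blank out the odd column wherever its even partner is below 30
  df[, i] <- ifelse(df[, i + 1] < 30, NA, df[, i])
}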
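To check the loop against the expected result from the question, here is a minimal sketch that wraps it in a helper and runs it on the question's example. The function name `na_by_next_col` is hypothetical, and the data are typed in by hand from the printed output in the question, since the original used unseeded random numbers.

```r
# Hypothetical helper: set each odd column to NA wherever the
# following even column is below `threshold`.
na_by_next_col <- function(df, threshold = 30) {
  # Nothing to compare against if there is no second column.
  if (ncol(df) < 2) return(df)
  # by = 2 visits only odd positions; stopping at ncol(df) - 1
  # guarantees that column i + 1 exists.
  for (i in seq(1, ncol(df) - 1, by = 2)) {
    df[[i]][df[[i + 1]] < threshold] <- NA
  }
  df
}

# Example values copied from the question's console output.
df <- data.frame(col1 = c(84, 42, 93, 6),
                 col2 = c(71, 89, 17, 21),
                 col3 = c(18, 25, 28, 88),
                 col4 = c(52, 19, 59, 35))

df_new <- na_by_next_col(df, threshold = 30)
print(df_new)
```

With these values, col1 becomes NA in rows 3 and 4 and col3 becomes NA in row 2, which matches the df_new shown in the question.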

CodePudding user response:

You can try this vectorized approach across an unknown number of columns:

df[seq(1, ncol(df), 2)][df[seq(2, ncol(df), 2)] < threshold] <- NA

Outcome (note that I expanded your data to 8 columns to test):

#  col1 col2 col3 col4 col5 col6 col7 col8
#1   28   94   55   67   24   88   NA   28
#2   NA    4   45   57    4   69   NA   14
#3   40   52   NA   10   32   64   54   96
#4   88   89   45   89   95   99   59   90

Data

threshold <- 30L
set.seed(123)
df <- data.frame(col1 = floor(runif(4, 0, 100)),
                 col2 = floor(runif(4, 0, 100)),
                 col3 = floor(runif(4, 0, 100)),
                 col4 = floor(runif(4, 0, 100)),
                 col5 = floor(runif(4, 0, 100)),
                 col6 = floor(runif(4, 0, 100)),
                 col7 = floor(runif(4, 0, 100)),
                 col8 = floor(runif(4, 0, 100)))
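The one-liner above can be easier to follow when split into named steps. The sketch below does that on the same seeded data; the names `odd`, `even`, and `mask` are introduced here for illustration only, and it assumes the data frame has an even number of columns so that every odd column has a partner.

```r
threshold <- 30L
set.seed(123)

# Same 8-column test data, built in one call. matrix() fills column by
# column, so the values line up with the per-column runif() calls above.
df <- as.data.frame(matrix(floor(runif(32, 0, 100)), nrow = 4))
names(df) <- paste0("col", 1:8)

odd  <- seq(1, ncol(df), by = 2)   # columns to overwrite
even <- seq(2, ncol(df), by = 2)   # columns to test against the threshold

# Logical matrix: TRUE where the even partner is below the threshold.
mask <- as.matrix(df[even]) < threshold

# Index the odd columns with the mask and assign NA to the TRUE cells.
df[odd][mask] <- NA
print(df)
```

Because `mask` has the same shape as `df[odd]`, each TRUE cell lines up with the odd column to its left in the original data frame, which is what makes the single assignment work without a loop.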