R: reset values in data frame to zero based on vector with column indices

Time:11-04

I have a data frame with integers (strictly speaking, the example below is a numeric matrix, since cbind() returns a matrix), like so:

# generate data frame
df = cbind(c(0,102,0,40,0,0), c(22,0,0,0,12,4), c(23,101,55,0,0,0),
           c(0,0,0,414,0,0), c(0,0,61,0,0,112), c(0,0,0,0,20,0))
colnames(df) = c('A', 'T', 'C', 'G', 'N', 'Del')
rownames(df) = c('Pos1', 'Pos2', 'Pos3', 'Pos4', 'Pos5', 'Pos6')
df
       A  T   C   G   N Del
Pos1   0 22  23   0   0   0
Pos2 102  0 101   0   0   0
Pos3   0  0  55   0  61   0
Pos4  40  0   0 414   0   0
Pos5   0 12   0   0   0  20
Pos6   0  4   0   0 112   0

I also have a vector of integers, one per row of df, which correspond to column indices of df:

# generate vector
cols = c(2,3,5,4,6,5)

Now I want to go through df row by row and reset to zero the cell in the column whose index is given by the corresponding element of the vector. For example, in the first row I want to reset column 2 to zero, in the second row column 3, and so on.
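To make the row-by-row pairing concrete, here is a small sketch (assuming the df and cols above; `targets` is just an illustrative name) that lists, for each row, the column that should be reset and the value currently stored there:

```r
# Example inputs from the question
df <- cbind(c(0,102,0,40,0,0), c(22,0,0,0,12,4), c(23,101,55,0,0,0),
            c(0,0,0,414,0,0), c(0,0,61,0,0,112), c(0,0,0,0,20,0))
colnames(df) <- c('A', 'T', 'C', 'G', 'N', 'Del')
rownames(df) <- c('Pos1', 'Pos2', 'Pos3', 'Pos4', 'Pos5', 'Pos6')
cols <- c(2, 3, 5, 4, 6, 5)

# Row i is paired with column cols[i]
targets <- data.frame(
  row    = rownames(df),
  column = colnames(df)[cols],
  # value currently stored in that cell, looked up one row at a time
  value  = mapply(function(i, j) df[i, j], seq_len(nrow(df)), cols),
  stringsAsFactors = FALSE
)
targets
```

Each row of `targets` names exactly one cell, so the task is "zero these six cells", not "zero whole columns".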

I solved this with the following piece of code:

for (i in seq_len(nrow(df))) {
    j = cols[i]    # column to reset in row i (avoids shadowing the ncol() function)
    df[i, j] = 0
}
df
   
       A  T  C G N Del
Pos1   0  0 23 0 0   0
Pos2 102  0  0 0 0   0
Pos3   0  0 55 0 0   0
Pos4  40  0  0 0 0   0
Pos5   0 12  0 0 0   0
Pos6   0  4  0 0 0   0

As you can see, my code behaves as intended. However, it turns out to be very inefficient on large datasets. I therefore wondered whether there is an alternative that will be considerably faster than using a for-loop.

Note that it may look as if I am resetting the maximum value in each row, but this is not the case: in some rows (Pos1 and Pos2) it is the smaller of the two nonzero values that gets reset. So I cannot simply reset the minimum or maximum of each row to zero.
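A quick way to check this note is to compare cols with the position of each row's maximum. This is a sketch assuming the example df and cols above; `row_max_col` and `mismatch` are illustrative names:

```r
# Example inputs from the question
df <- cbind(c(0,102,0,40,0,0), c(22,0,0,0,12,4), c(23,101,55,0,0,0),
            c(0,0,0,414,0,0), c(0,0,61,0,0,112), c(0,0,0,0,20,0))
colnames(df) <- c('A', 'T', 'C', 'G', 'N', 'Del')
rownames(df) <- c('Pos1', 'Pos2', 'Pos3', 'Pos4', 'Pos5', 'Pos6')
cols <- c(2, 3, 5, 4, 6, 5)

# Column index of the largest value in each row (which.max returns the first one on ties)
row_max_col <- apply(df, 1, which.max)

# Rows whose target column is not the row maximum
mismatch <- rownames(df)[row_max_col != cols]
mismatch
# [1] "Pos1" "Pos2"
```

Because at least one row disagrees, a max-based shortcut would zero the wrong cells; the index vector has to be used directly.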

Answer:

You can use cbind to build a two-column matrix of (row, column) positions and use it to index df, replacing all of those cells with 0 in a single assignment:

rows <- seq_len(nrow(df))
df[cbind(rows, cols)] <- 0

Result

df
#       A  T  C G N Del
#Pos1   0  0 23 0 0   0
#Pos2 102  0  0 0 0   0
#Pos3   0  0 55 0 0   0
#Pos4  40  0  0 0 0   0
#Pos5   0 12  0 0 0   0
#Pos6   0  4  0 0 0   0
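To see why this works: indexing with a two-column numeric matrix treats each row of that matrix as one (row, column) pair, so `df[idx]` addresses exactly one cell per row of df. The sketch below (assuming the example df and cols from the question; `idx`, `loop_result`, `vec_result` and `frame_result` are illustrative names) prints the index matrix and checks that the single assignment matches the question's loop, both for the matrix and for a data.frame copy:

```r
# Example inputs from the question
df <- cbind(c(0,102,0,40,0,0), c(22,0,0,0,12,4), c(23,101,55,0,0,0),
            c(0,0,0,414,0,0), c(0,0,61,0,0,112), c(0,0,0,0,20,0))
colnames(df) <- c('A', 'T', 'C', 'G', 'N', 'Del')
rownames(df) <- c('Pos1', 'Pos2', 'Pos3', 'Pos4', 'Pos5', 'Pos6')
cols <- c(2, 3, 5, 4, 6, 5)

rows <- seq_len(nrow(df))
idx  <- cbind(rows, cols)   # 6 x 2 index matrix: row i holds (i, cols[i])
idx

# Loop version, as in the question
loop_result <- df
for (i in rows) loop_result[i, cols[i]] <- 0

# Vectorized version: one assignment through the index matrix
vec_result <- df
vec_result[idx] <- 0
identical(loop_result, vec_result)

# A two-column index matrix is also accepted when assigning into a data.frame
frame_result <- as.data.frame(df)
frame_result[idx] <- 0
all.equal(as.matrix(frame_result), vec_result)
```

Both comparisons should print TRUE; the data.frame case is handled by R's `[<-.data.frame` method, which converts the index matrix internally.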

Answer:

One solution involving dplyr could be:

library(dplyr)

df <- as.data.frame(df)
df %>%
 mutate(across(everything(),
               # zero the rows whose target index equals this column's position
               ~ replace(.x, cols == match(cur_column(), names(df)), 0)))

       A  T  C G N Del
Pos1   0  0 23 0 0   0
Pos2 102  0  0 0 0   0
Pos3   0  0 55 0 0   0
Pos4  40  0  0 0 0   0
Pos5   0 12  0 0 0   0
Pos6   0  4  0 0 0   0
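Whether the vectorized version is fast enough depends on your data, so a rough timing comparison on your own machine may help. This is a sketch only: the sizes `n_rows` and `n_cols` and the random data are hypothetical placeholders, it works on a matrix (a data.frame may behave differently), and no timings are quoted here because they vary by machine.

```r
set.seed(1)
n_rows <- 1e5   # hypothetical size; adjust to match your data
n_cols <- 6
big <- matrix(sample(0:500, n_rows * n_cols, replace = TRUE), nrow = n_rows)
big_cols <- sample.int(n_cols, n_rows, replace = TRUE)  # one target column per row

# For-loop approach from the question
loop_time <- system.time({
  res_loop <- big
  for (i in seq_len(n_rows)) res_loop[i, big_cols[i]] <- 0
})

# Matrix-indexing approach from the first answer
index_time <- system.time({
  res_index <- big
  res_index[cbind(seq_len(n_rows), big_cols)] <- 0
})

# Both approaches must produce the same matrix
stopifnot(identical(res_loop, res_index))

# Elapsed seconds for each approach
print(rbind(loop = loop_time, index = index_time)[, "elapsed"])
```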