Match Column to Column Names, add value to row/column of Matches

Time:07-06

My first question, so be sure to teach me a lesson

Given Data Frame

df<- structure(list(ID = c("ID001", "ID001", "ID003", "ID004", "ID003", 
                      "ID004"), ID001 = c(1L, 0L, 1L, 0L, 1L, 1L), ID002 = c(0L, 
                        0L, 0L, 0L, 0L, 0L), ID003 = c(1L, 0L, 1L, 1L, 0L, 0L), ID004 = c(1L, 
                           1L, 0L, 0L, 1L, 1L)), class = "data.frame", row.names = c(NA, -6L)) 



   ID     ID001 ID002 ID003 ID004
1 ID001     1     0     1     1
2 ID001     0     0     0     1
3 ID003     1     0     1     0
4 ID004     0     0     1     0
5 ID003     1     0     0     1
6 ID004     1     0     0     1

I have an inefficient for loop that updates entries where the value in the 'ID' column matches a column name: for each row, 1 is added to the value in the matching column.

for(rows in 1:nrow(df)) {
  df[rows, match(df[rows,'ID'], names(df))] <- df[rows, match(df[rows,'ID'], names(df))] + 1
}

df

     ID ID001 ID002 ID003 ID004
1 ID001     2     0     1     1
2 ID001     1     0     0     1
3 ID003     1     0     2     0
4 ID004     0     0     1     1
5 ID003     1     0     1     1
6 ID004     1     0     0     2

This is the desired output. But I need to run this on millions of rows and it's slow. I'm guessing this can be improved in more than one way, maybe with apply or similar, but I haven't attempted that and am hoping to see how it's done.

CodePudding user response:

Here is a dplyr way:

library(dplyr)

df %>% 
  mutate(across(-ID, ~ifelse(ID == cur_column(), . + 1, .)))
     ID ID001 ID002 ID003 ID004
1 ID001     2     0     1     1
2 ID001     1     0     0     1
3 ID003     1     0     2     0
4 ID004     0     0     1     1
5 ID003     1     0     1     1
6 ID004     1     0     0     2
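
The same idea (compare ID against each column name, add 1 where they are equal) can also be sketched in base R with outer(). This is only an illustration, not part of the answer above; the name dat is a made-up copy of the question's data so the snippet runs on its own.

```r
# Copy of the question's data under a hypothetical name
dat <- data.frame(
  ID    = c("ID001", "ID001", "ID003", "ID004", "ID003", "ID004"),
  ID001 = c(1L, 0L, 1L, 0L, 1L, 1L),
  ID002 = c(0L, 0L, 0L, 0L, 0L, 0L),
  ID003 = c(1L, 0L, 1L, 1L, 0L, 0L),
  ID004 = c(1L, 1L, 0L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where the row's ID equals the column name
hit <- outer(dat$ID, names(dat)[-1], "==")

# TRUE counts as 1 and FALSE as 0, so adding the matrix bumps only the matches
m <- as.matrix(dat[-1]) + hit

# Write the updated counters back into the data frame
dat[-1] <- as.data.frame(m)
```

Note that this builds a full nrow x ncol logical matrix, so with millions of rows and many columns it may use noticeably more memory than indexing only the matching cells.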

CodePudding user response:

Create a two-column matrix of row/column positions and then do the assignment in one step:

i1 <- cbind(seq_len(nrow(df)), match(df$ID, names(df)[-1]))
df[-1][i1] <- df[-1][i1] + 1

-output

> df
     ID ID001 ID002 ID003 ID004
1 ID001     2     0     1     1
2 ID001     1     0     0     1
3 ID003     1     0     2     0
4 ID004     0     0     1     1
5 ID003     1     0     1     1
6 ID004     1     0     0     2
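
One caveat with the index-matrix approach: if an ID has no matching column, match() returns NA for that row, and an NA in the index matrix can make the assignment fail. A minimal sketch that drops such rows from the index first, working on a plain matrix; the data frame df2 and the ID "ID999" are made up for illustration.

```r
# Hypothetical data: "ID999" has no matching column
df2 <- data.frame(
  ID    = c("ID001", "ID999", "ID002"),
  ID001 = c(1L, 0L, 1L),
  ID002 = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

m   <- as.matrix(df2[-1])            # integer matrix of the counter columns
j   <- match(df2$ID, colnames(m))    # column position per row, NA if no match
ok  <- !is.na(j)                     # rows whose ID matches some column
idx <- cbind(which(ok), j[ok])       # two-column (row, column) index matrix
m[idx] <- m[idx] + 1L                # one vectorised update of the matched cells
df2[-1] <- as.data.frame(m)          # write the result back
```

Row 2 is left unchanged because its ID matches no column.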