Home > Software engineering >  How to update multiple records in a data frame with new values from a separate data frame by index i
How to update multiple records in a data frame with new values from a separate data frame by index i

Time:06-29

I am trying to figure out how to update various records in df1 column A with new values from df2 column A. df1 has 2141 column A observations with a unique ID in theIndex column. df2 has 268 updated column A values with their associated unique ID in the Index column. I tried using merge(), or even a simple for loop with an if statement with no luck, such as:

for (i in 1:nrow(df1)){
    if (df1$Index[i] == df2$Index[i]){
        df1$A[i] <- df2$A[1]
    }
} 

A simplified example of my two data frames and what result I'm trying to achieve:

df1:
   Index  A
1      1 NA
2      2 NA
3      3 NA
4      4 NA
5      5 NA
6      6 NA
7      7 NA
8      8 NA
9      9 NA
10    10 NA
df2:
  Index  A
1     2 85
2     3 46
3     6 79
4     7 64
5    10 40
Updated df1:
   Index  A
1      1 NA
2      2 85
3      3 46
4      4 NA
5      5 NA
6      6 79
7      7 64
8      8 NA
9      9 NA
10    10 40

I have to believe this is simple, but I can't figure out how to update my main data frame. My search for ideas online keeps going back to the merge() function or similar join functions. Thank you for any help or guidance.

CodePudding user response:

Try this

df <- data.frame(Index = 1:10 , A = NA)
for(i in 1:nrow(df)){
    x <- which(i == df2$Index)
    y <- which(i == df1$Index)
    if(length(x) > 0) df$A[i] <- df2$A[x]
    else if(length(y) > 0) df$A[i] <- df1$A[y]
    else df$A[i] <- NA
    
}
  • output
   Index  A
1      1 NA
2      2 85
3      3 46
4      4 NA
5      5 NA
6      6 79
7      7 64
8      8 NA
9      9 NA
10    10 40

CodePudding user response:

In dplyr you can do:

df2 %>% 
  full_join(df1) %>% 
  group_by(Index) %>% 
  filter(ifelse((length(row_number()) > 1) & is.na(A), 1, 2) == 2) %>% 
  ungroup() %>% 
  arrange(-desc(Index))

Assuming that there is only a maximum of one unique Index value per df

# A tibble: 10 × 2
   Index     A
   <int> <int>
 1     1    NA
 2     2    85
 3     3    46
 4     4    NA
 5     5    NA
 6     6    79
 7     7    64
 8     8    NA
 9     9    NA
10    10    40
  • Related