Home > Software engineering >  r - replacing multiple column values in selected rows based on another df
r - replacing multiple column values in selected rows based on another df

Time:05-18

I am looking for an iterative way (e.g. for loop, purrr) to replace the values of some cols for a selected row of a dataframe with the values of similar cols from another dataframe.

# Here is a toy df
df <- dplyr::tibble( 
  "id" = c("id_a", "id_b", "id_c"), 
  "subj_1" = c("subj_1_a", "subj_1_b",  "subj_1_c"), 
  "subj_2" = c("subj_2_a", "subj_2_b","subj_2_c"), 
  "subj_3" =  c("subj_3_a", "subj_3_b","subj_3_c") 
  )

#   id    subj_1   subj_2   subj_3  
#   <chr> <chr>    <chr>    <chr>   
# 1 id_a  subj_1_a subj_2_a subj_3_a
# 2 id_b  subj_1_b subj_2_b subj_3_b
# 3 id_c  subj_1_c subj_2_c subj_3_c

Here is a datafame with the values I would like to replace in columns:

df_replace <- dplyr::tibble(
  "id" = "id_a",  
  "subj_1" = "subj_1_a_REP",
  "subj_2" = "subj_2__REP",
  "subj_3" = "subj_2__REP"
)

Here is the result I seek (obtained one-by-one):

df2 <- df
df2$subj_1[df2$id == "id_a"] <- df_replace$subj_1
df2$subj_2[df2$id == "id_a"] <- df_replace$subj_2
df2$subj_3[df2$id == "id_a"] <- df_replace$subj_3

#   id    subj_1       subj_2      subj_3     
#   <chr> <chr>        <chr>       <chr>      
# 1 id_a  subj_1_a_REP subj_2__REP subj_2__REP
# 2 id_b  subj_1_b     subj_2_b    subj_3_b   
# 3 id_c  subj_1_c     subj_2_c    subj_3_c 

How can I obtain the same iterating with loop or purrr?

My attempt that is not working - I believe it some indexing issues

col_subj_names <- names(df_replace)[-1] # without id

# pick the id 
i <- "id_a"

# for loop (NOT WORKING!)
for (j in seq_along(col_subj_names)) {
  col <-  col_subj_names[j]
    df %>%
       dplyr::filter(id == i) %>%
       dplyr::mutate(., col = df_replace$col)
    df3 <- df
}

I cannot understand the Warning message either:

Unknown or uninitialized column: col.

CodePudding user response:

You don't need a loop here at all. You can replace the entire row in one go.

df[df$id %in% df_replace$id,] <- df_replace

df
#> # A tibble: 3 x 4
#>   id    subj_1       subj_2      subj_3     
#>   <chr> <chr>        <chr>       <chr>      
#> 1 id_a  subj_1_a_REP subj_2__REP subj_2__REP
#> 2 id_b  subj_1_b     subj_2_b    subj_3_b   
#> 3 id_c  subj_1_c     subj_2_c    subj_3_c 

Note that this will still work even if you have multiple id_a rows - they will all be replaced appropriately with the single line of code.

CodePudding user response:

Try rows_update(df_replace):

library(tidyverse)

df_replace <- tibble(
  "id" = "id_a",  
  "subj_1" = "subj_1_a_REP",
  "subj_2" = "subj_2__REP",
  "subj_3" = "subj_2__REP"
)

tibble( 
  "id" = c("id_a", "id_b", "id_c"), 
  "subj_1" = c("subj_1_a", "subj_1_b",  "subj_1_c"), 
  "subj_2" = c("subj_2_a", "subj_2_b","subj_2_c"), 
  "subj_3" =  c("subj_3_a", "subj_3_b","subj_3_c") 
) |> 
  rows_update(df_replace)
#> Matching, by = "id"
#> # A tibble: 3 × 4
#>   id    subj_1       subj_2      subj_3     
#>   <chr> <chr>        <chr>       <chr>      
#> 1 id_a  subj_1_a_REP subj_2__REP subj_2__REP
#> 2 id_b  subj_1_b     subj_2_b    subj_3_b   
#> 3 id_c  subj_1_c     subj_2_c    subj_3_c

Created on 2022-05-17 by the reprex package (v2.0.1)

  • Related