How to update values of certain columns of a dataframe with values from another dataframe in r-CodePudding

I am struggling to write an R code for the following problem: df1 and df2 are two dataframes.

> df1 <- read.csv(file = 'Indx.csv')
> df1
  St_Name   I1   I2   I3   I4
1      TN 0.10 0.15 0.20 0.25
2      AZ 0.30 0.35 0.40 0.45
3      TX 0.50 0.55 0.60 0.65
4      KS 0.70 0.75 0.80 0.85
5      KY 0.90 0.95 0.11 0.12
6      MN 0.13 0.14 0.16 0.17
> df2 <- as.data.frame(fromJSON(file = "NewIndx.json"))
> df2
  St_Name  I1  I3
1      KS 100 200

# The output should be

> df1
  St_Name   I1   I2   I3   I4
1      TN 0.10 0.15 0.20 0.25
2      AZ 0.30 0.35 0.40 0.45
3      TX 0.50 0.55 0.60 0.65
4      KS  100 0.75  200 0.85
5      KY 0.90 0.95 0.11 0.12
6      MN 0.13 0.14 0.16 0.17
>

what is the optimal code to achieve this?

CodePudding user response：

This is not optimal, but it's one way to get what you want.

If you have data.table package installed and don't mind installing a light package:

install.packages("kim")
library(kim)
df3 <- merge_data_tables(df2, df1, "St_Name")
df3 <- order_rows_specifically_in_dt(df3, "St_Name", df1[, St_Name])
data.table::setcolorder(df3, names(df1))
df1 <- df3
df1

CodePudding user response：

We could use this slightly modified function coalesce_join provided by Edward Visel:

library(tidyverse)
# the function:
coalesce_join <- function(x, y, 
                          by = NULL, suffix = c(".y", ".x"), 
                          join = dplyr::full_join, ...) {
  joined <- join(y, x, by = by, suffix = suffix, ...)
  # names of desired output
  cols <- union(names(y), names(x))
  
  to_coalesce <- names(joined)[!names(joined) %in% cols]
  suffix_used <- suffix[ifelse(endsWith(to_coalesce, suffix[1]), 1, 2)]
  # remove suffixes and deduplicate
  to_coalesce <- unique(substr(
    to_coalesce, 
    1, 
    nchar(to_coalesce) - nchar(suffix_used)
  ))
  
  coalesced <- purrr::map_dfc(to_coalesce, ~dplyr::coalesce(
    joined[[paste0(.x, suffix[1])]], 
    joined[[paste0(.x, suffix[2])]]
  ))
  names(coalesced) <- to_coalesce
  
  dplyr::bind_cols(joined, coalesced)[cols]
}

# apply 
coalesce_join(df1, df2, by = 'St_Name')

  St_Name     I1     I3   I2   I4
1      KS 100.00 200.00 0.75 0.85
2      TN   0.10   0.20 0.15 0.25
3      AZ   0.30   0.40 0.35 0.45
4      TX   0.50   0.60 0.55 0.65
5      KY   0.90   0.11 0.95 0.12
6      MN   0.13   0.16 0.14 0.17

CodePudding user response：

Kindly let me know if this is what you were anticipating.

library(tidyr)

id<- "St_Name"
df_1<- melt(df_1, id.vars = id, measure.vars = setdiff(colnames(df_1),id))
df_2 <- melt(df_2, id.vars = id, measure.vars = setdiff(colnames(df_2),id))
result <- merge(df_1,df_2, by=c("St_Name","variable"),no.dups = TRUE,all.x = TRUE)
result$value.x[which(!is.na(result$value.y))]<- result$value.y[which(!is.na(result$value.y))]
result <- result[,-4] 
result <-spread(result, variable, value.x)