Home > Software design >  How to update values of certain columns of a dataframe with values from another dataframe in r
How to update values of certain columns of a dataframe with values from another dataframe in r

Time:11-01

I am struggling to write an R code for the following problem: df1 and df2 are two dataframes.

> df1 <- read.csv(file = 'Indx.csv')
> df1
  St_Name   I1   I2   I3   I4
1      TN 0.10 0.15 0.20 0.25
2      AZ 0.30 0.35 0.40 0.45
3      TX 0.50 0.55 0.60 0.65
4      KS 0.70 0.75 0.80 0.85
5      KY 0.90 0.95 0.11 0.12
6      MN 0.13 0.14 0.16 0.17
> df2 <- as.data.frame(fromJSON(file = "NewIndx.json"))
> df2
  St_Name  I1  I3
1      KS 100 200

# The output should be

> df1
  St_Name   I1   I2   I3   I4
1      TN 0.10 0.15 0.20 0.25
2      AZ 0.30 0.35 0.40 0.45
3      TX 0.50 0.55 0.60 0.65
4      KS  100 0.75  200 0.85
5      KY 0.90 0.95 0.11 0.12
6      MN 0.13 0.14 0.16 0.17
>

what is the optimal code to achieve this?

CodePudding user response:

This is not optimal, but it's one way to get what you want.

If you have data.table package installed and don't mind installing a light package:

install.packages("kim")
library(kim)
df3 <- merge_data_tables(df2, df1, "St_Name")
df3 <- order_rows_specifically_in_dt(df3, "St_Name", df1[, St_Name])
data.table::setcolorder(df3, names(df1))
df1 <- df3
df1

CodePudding user response:

We could use this slightly modified function coalesce_join provided by Edward Visel:

library(tidyverse)
# the function:
coalesce_join <- function(x, y, 
                          by = NULL, suffix = c(".y", ".x"), 
                          join = dplyr::full_join, ...) {
  joined <- join(y, x, by = by, suffix = suffix, ...)
  # names of desired output
  cols <- union(names(y), names(x))
  
  to_coalesce <- names(joined)[!names(joined) %in% cols]
  suffix_used <- suffix[ifelse(endsWith(to_coalesce, suffix[1]), 1, 2)]
  # remove suffixes and deduplicate
  to_coalesce <- unique(substr(
    to_coalesce, 
    1, 
    nchar(to_coalesce) - nchar(suffix_used)
  ))
  
  coalesced <- purrr::map_dfc(to_coalesce, ~dplyr::coalesce(
    joined[[paste0(.x, suffix[1])]], 
    joined[[paste0(.x, suffix[2])]]
  ))
  names(coalesced) <- to_coalesce
  
  dplyr::bind_cols(joined, coalesced)[cols]
}

# apply 
coalesce_join(df1, df2, by = 'St_Name')
  St_Name     I1     I3   I2   I4
1      KS 100.00 200.00 0.75 0.85
2      TN   0.10   0.20 0.15 0.25
3      AZ   0.30   0.40 0.35 0.45
4      TX   0.50   0.60 0.55 0.65
5      KY   0.90   0.11 0.95 0.12
6      MN   0.13   0.16 0.14 0.17

CodePudding user response:

Kindly let me know if this is what you were anticipating.

library(tidyr)

id<- "St_Name"
df_1<- melt(df_1, id.vars = id, measure.vars = setdiff(colnames(df_1),id))
df_2 <- melt(df_2, id.vars = id, measure.vars = setdiff(colnames(df_2),id))
result <- merge(df_1,df_2, by=c("St_Name","variable"),no.dups = TRUE,all.x = TRUE)
result$value.x[which(!is.na(result$value.y))]<- result$value.y[which(!is.na(result$value.y))]
result <- result[,-4] 
result <-spread(result, variable, value.x)
  • Related