I am struggling to write an R code for the following problem: df1 and df2 are two dataframes.
> df1 <- read.csv(file = 'Indx.csv')
> df1
St_Name I1 I2 I3 I4
1 TN 0.10 0.15 0.20 0.25
2 AZ 0.30 0.35 0.40 0.45
3 TX 0.50 0.55 0.60 0.65
4 KS 0.70 0.75 0.80 0.85
5 KY 0.90 0.95 0.11 0.12
6 MN 0.13 0.14 0.16 0.17
> df2 <- as.data.frame(fromJSON(file = "NewIndx.json"))
> df2
St_Name I1 I3
1 KS 100 200
# The output should be
> df1
St_Name I1 I2 I3 I4
1 TN 0.10 0.15 0.20 0.25
2 AZ 0.30 0.35 0.40 0.45
3 TX 0.50 0.55 0.60 0.65
4 KS 100 0.75 200 0.85
5 KY 0.90 0.95 0.11 0.12
6 MN 0.13 0.14 0.16 0.17
>
what is the optimal code to achieve this?
CodePudding user response:
This is not optimal, but it's one way to get what you want.
If you have data.table
package installed and don't mind installing a light package:
install.packages("kim")
library(kim)
df3 <- merge_data_tables(df2, df1, "St_Name")
df3 <- order_rows_specifically_in_dt(df3, "St_Name", df1[, St_Name])
data.table::setcolorder(df3, names(df1))
df1 <- df3
df1
CodePudding user response:
We could use this slightly modified function coalesce_join
provided by Edward Visel:
library(tidyverse)
# the function:
coalesce_join <- function(x, y,
by = NULL, suffix = c(".y", ".x"),
join = dplyr::full_join, ...) {
joined <- join(y, x, by = by, suffix = suffix, ...)
# names of desired output
cols <- union(names(y), names(x))
to_coalesce <- names(joined)[!names(joined) %in% cols]
suffix_used <- suffix[ifelse(endsWith(to_coalesce, suffix[1]), 1, 2)]
# remove suffixes and deduplicate
to_coalesce <- unique(substr(
to_coalesce,
1,
nchar(to_coalesce) - nchar(suffix_used)
))
coalesced <- purrr::map_dfc(to_coalesce, ~dplyr::coalesce(
joined[[paste0(.x, suffix[1])]],
joined[[paste0(.x, suffix[2])]]
))
names(coalesced) <- to_coalesce
dplyr::bind_cols(joined, coalesced)[cols]
}
# apply
coalesce_join(df1, df2, by = 'St_Name')
St_Name I1 I3 I2 I4
1 KS 100.00 200.00 0.75 0.85
2 TN 0.10 0.20 0.15 0.25
3 AZ 0.30 0.40 0.35 0.45
4 TX 0.50 0.60 0.55 0.65
5 KY 0.90 0.11 0.95 0.12
6 MN 0.13 0.16 0.14 0.17
CodePudding user response:
Kindly let me know if this is what you were anticipating.
library(tidyr)
id<- "St_Name"
df_1<- melt(df_1, id.vars = id, measure.vars = setdiff(colnames(df_1),id))
df_2 <- melt(df_2, id.vars = id, measure.vars = setdiff(colnames(df_2),id))
result <- merge(df_1,df_2, by=c("St_Name","variable"),no.dups = TRUE,all.x = TRUE)
result$value.x[which(!is.na(result$value.y))]<- result$value.y[which(!is.na(result$value.y))]
result <- result[,-4]
result <-spread(result, variable, value.x)