Home > Back-end >  Join two dataframe based on common column in R
Join two dataframe based on common column in R

Time:10-07

I would like to join two data frames (df1, df2) based on the common first column, however, I want the placement of each column from both data frames one after another as shown in the example:

INPUT df1
df1 <- data.frame(name = c("JMA", "PSA", "TAN", "POS"), HAN = c("0.2119", "-0.1726", "0.0002", "-1.036"), ENA = c("0.0736", "-1.6644", "0.0743", "-0.0605"), IUA = c("-0.2212", "-0.3352", "0.6301", "-0.2848"), MEA = c("0.3464", "0.4085", "0.5801", "0.3236"))

Input df2
df2 <- data.frame(name = c("JMA", "PSA", "TAN", "POS"), YHD = c("0.3874", "-0.209", "-0.0614", "-1.8067"), PUH = c("0.0784", "-0.572", "0.0819", "-0.0628"), OUR = c("0.2297", "0.279", "0.3106", "0.3368" ), NKA = c("-0.5446", "-2.324", "-0.572", "-0.3626"))


Output df
df <- data.frame(name = c("JMA", "PSA", "TAN", "POS"), HAN = c(0.2119, -0.1726, 2e-04, -1.036), YHD = c(0.3874, -0.209, -0.0614, -1.8067 ), ENA = c(0.0736, -1.6644, 0.0743, -0.0605), PUH = c(0.0784, -0.572, 0.0819, -0.0628), IUA = c(-0.2212, -0.3352, 0.6301, -0.2848 ), OUR = c(0.2297, 0.279, 0.3106, 0.3368), MEA = c(0.3464, 0.4085, 0.5801, 0.3236), NKA = c(-0.5446, -2.324, -0.572, -0.3626))

I know the join function but do not know how to place columns from both dataframe one by one for further analysis.

Please help. Thank you in advance. :)

CodePudding user response:

Your desired df has 5 rows vs your two starting dataframes have 4 rows. I assume that is just a mistake.

This code should work to merge two dataframes based on a common column

df <- merge(df1,df2, by = "name")
df
  name     HAN     ENA     IUA    MEA     YHD     PUH    OUR     NKA
1  JMA  0.2119  0.0736 -0.2212 0.3464  0.3874  0.0784 0.2297 -0.5446
2  POS  -1.036 -0.0605 -0.2848 0.3236 -1.8067 -0.0628 0.3368 -0.3626
3  PSA -0.1726 -1.6644 -0.3352 0.4085  -0.209  -0.572  0.279  -2.324
4  TAN  0.0002  0.0743  0.6301 0.5801 -0.0614  0.0819 0.3106  -0.572

CodePudding user response:

library(dplyr)
df3 <- full_join(df1, df2, by="name")
cn3 <- colnames(df3)
df3 <- df3[,order(coalesce(match(cn3, names(df1)), match(cn3, names(df2))))]
identical(df, df3)
# [1] TRUE
df3
#   name     HAN     YHD     ENA     PUH     IUA    OUR    MEA     NKA
# 1  JMA  0.2119  0.3874  0.0736  0.0784 -0.2212 0.2297 0.3464 -0.5446
# 2  PSA -0.1726 -0.2090 -1.6644 -0.5720 -0.3352 0.2790 0.4085 -2.3240
# 3  TAN  0.0002 -0.0614  0.0743  0.0819  0.6301 0.3106 0.5801 -0.5720
# 4  POS -1.0360 -1.8067 -0.0605 -0.0628 -0.2848 0.3368 0.3236 -0.3626
  • Related