Home > Software engineering >  Merging using index as key in R
Merging using index as key in R

Time:09-27

I have two data frames

df1: 
      01.2020 02.2020 03.2020
11190       4       1       2
12345       3       3       1
11323       1       2       2


df2
      08.2020 04.2020 09.2020
11190       1       2       2
12345       1       2       3
11324       1       2       2

Dummy Data -

df1 <- structure(list(`01.2020` = c(4L, 3L, 1L), `02.2020` = c(1L, 3L, 2L), `03.2020` = c(2L, 1L, 2L)), class = "data.frame", row.names = c("11190","12345", "11323"))                                                           

df2 <- structure(list(`08.2020` = c(1L, 1L, 1L), `04.2020` = c(2L, 2L,  2L), `09.2020` = c(2L, 3L, 2L)), class = "data.frame", row.names = c("11190", "12345", "11324"))

I want to "outer merge" these two dataframes by key = index

How can we do that? what should be there in the place of by=

merge(x = sheet1_UN, y = sheet2_UN, by = "" , all = TRUE)

I want my final dataframe to look something like this

      01.2020 02.2020 03.2020  08.2020 04.2020  09.2020
11190       4       1       2     1      1        2
12345       3       3       1     1      2        3
11323       1       2       2     -      -        -
11324       -       -       -     1      2        2

Thanks in advance.

CodePudding user response:

This should do:

df1 %>% rownames_to_column('id') %>% 
full_join(df2 %>% rownames_to_column('id'), by='id')

output:

     id 01.2020 02.2020 03.2020 08.2020 04.2020 09.2020
1 11190       4       1       2       1       2       2
2 12345       3       3       1       1       2       3
3 11323       1       2       2      NA      NA      NA
4 11324      NA      NA      NA       1       2       2

You might use replace_na('-') if you want no NA values, like this:

df1 %>% rownames_to_column('id') %>% 
  full_join(df2 %>% rownames_to_column('id'), by='id') %>%
  mutate(across(everything(), ~.x %>% as.character %>% replace_na('-')))

CodePudding user response:

another method

df3 <- merge(df1, df2, by = "row.names", all = TRUE)

output:

Row.names 01.2020 02.2020 03.2020 08.2020 04.2020 09.2020
1     11190       4       1       2       1       2       2
2     11323       1       2       2      NA      NA      NA
3     11324      NA      NA      NA       1       2       2
4     12345       3       3       1       1       2       3
  • Related