Home > Blockchain >  Concatenate two dataframes and replacing NA values in R and transform the result in a csv file
Concatenate two dataframes and replacing NA values in R and transform the result in a csv file

Time:10-20

Hello I have two daframes in R, and I would like to concatenate them. The structure of the dfs its like that:

x <- data.frame(
  ID = c(1, 2, 3),
  S1 = c(10, NA, NA),
  S2 = c(21, 22, 23)
)

y <- data.frame(
  ID = c(1, 2, 3, 4),
  S1 = c(11, 12, 13, 14),
  S2 = c(24, 25, 26, 27)
)

And I would like to have something like:

final <- data.frame(
ID = c(1, 2, 3, 4, 5, 6, 7),
S1 = c(10, 11, 12, 13, 14, NA, NA),
S2 = c(21, 22, 23, 24, 25, 26, 27)
)

I've tried to use natural_join but it's give me a erro:

>library("rquery")

> final <- natural_join(ipeadata_d, ipeadata_d.cont, by = "ID",jointype = "FULL")
Error in natural_join.relop(dnodea, dnodeb, jointype = jointype, by = by,  : 
  rquery::natural_join.relop all tables must have all join keys, the following keys are not in some tables: ID

I Also tried the rbind, but the dataframe keeps the NA.

I would like to concatenate the dataframe like in the "final" example and would like to transform final in a csv file. Thanks for your help.

CodePudding user response:

There are a few issues here:

  • First - your example isn't reproducible because those data.frames do not have the same number of elements in each vector. I assume your ID vector should be of equal length to S1 and S2
  • Second: it sounds like you can accomplish what you want in base R, without any special functions. You are just attempting to concatenate or "union" the 2 data.frames. R uses the command rbind to do this.

I am making an assumption here on what your desired output is.

Here is a working example using rbind:

x <- data.frame(
  ID = c(1, 2, 3),
  S1 = c(10, NA, NA),
  S2 = c(21, 22, 23)
)

y <- data.frame(
  ID = c(4, 5, 6, 7),
  S1 = c(11, 12, 13, 14),
  S2 = c(24, 25, 26, 27)
)

final <- rbind(x,y)

> rbind(x,y)
  ID S1 S2
1  1 10 21
2  2 NA 22
3  3 NA 23
4  4 11 24
5  5 12 25
6  6 13 26
7  7 14 27

For your reference, "merging" usually refers to combining 2 data.frames based on a shared column or key.

CodePudding user response:

You may combine the two datasets using bind_rows and sort the columns putting NA's at the last.

library(dplyr)

bind_rows(x, y) %>%
  mutate(ID = row_number(), 
         across(c(S1, S2), sort, na.last = TRUE))

#  ID S1 S2
#1  1 10 21
#2  2 11 22
#3  3 12 23
#4  4 13 24
#5  5 14 25
#6  6 NA 26
#7  7 NA 27
  • Related