Merging two dataframes in R and arranging rows based on certain conditions-CodePudding

I have two dataframes df1 and df2 which I have merged together into another dataframe df3

df1 <- data.frame(
  Name = c("A", "B", "C"),  
  Value = c(1, 2, 3),
  Method = c("Indirect"))

df2 <- data.frame(
  Name = c("A", "B"),  
  Value = c(4, 5),
  Method = c("Direct"))

df3 <- rbind(df1, df2)

So df3 looks something like this

Now I need to identify all the unique entries in the Name column (which is C in this case) and for each of the unique entries, a row is to be added which would have the same "Name" but "Value" would be 0 and the "Method" would be the opposite one. The output should look like this.

Finally the rows with similar "Name" are to be arranged one below the other.

I have a huge dataframe and I need to achieve the above mentioned outcome in the most efficient way in R. How do I proceed?

CodePudding user response：

One way

tmp=df3[!(df3$Name %in% df3$Name[duplicated(df3$Name)]),]
tmp$Value=0
tmp$Method=ifelse(tmp$Method=="Direct","Indirect","Direct")

  Name Value Method
3    C     0 Direct

you can now rbind this to your original data (and sort it).

CodePudding user response：

Please find another solution using data.table

Reprex

Code

library(data.table)
library(magrittr) # for the pipe!

setDT(df3)
df3 <- rbindlist(list(df3,
               df3[!(df3$Name %in% df3[duplicated(Name)]$Name)
                   ][, `:=` (Value = 0, Method = fifelse(Method == "Indirect", "Direct", "Indirect"))])) %>% 
  setorder(., Name)

Output

df3
#>    Name Value   Method
#> 1:    A     1 Indirect
#> 2:    A     4   Direct
#> 3:    B     2 Indirect
#> 4:    B     5   Direct
#> 5:    C     3 Indirect
#> 6:    C     0   Direct

^{Created on 2021-12-15 by the reprex package (v2.0.1)}

CodePudding user response：

I think that with 10,000 rows you will barely notice it:

library(dplyr)

df3 |>
    add_count(Name) |>
    filter(n == 1)  |>
    mutate(
        Value  = 0,
        Method = c(Indirect = 'Direct', Direct = 'Indirect')[Method],
        n      = NULL
    ) |>
    bind_rows(df3) |>
    arrange(Name, Value, Method)

#   Name Value   Method
# 1    A     1 Indirect
# 2    A     4   Direct
# 3    B     2 Indirect
# 4    B     5   Direct
# 5    C     0   Direct
# 6    C     3 Indirect