Home > Mobile >  How to compare values of two table and return highest value in r?
How to compare values of two table and return highest value in r?

Time:11-14

I have two data frames dat1 and dat2. I will like to compare the values of the similar coloumns in the two data frames and returned the highest value.

How can I compare value of two table and return highest value in r?

library(reshape2)
dat1 <- data.frame(sn=c("v1","v2"), A=c(1,0), B=c(0,1), C=c(0,0), D=c(1,0))
dat2 <- data.frame(sn=c("v1","v2", "v3"), A=c(1,0,1), C=c(1,0,1), B=c(0,0,1))
dat1:

sn A B C D
v1 1 0 0 1
v2 0 1 0 0

dat2:

sn A C B
v1 1 1 0
v2 0 0 0
v3 1 1 1

dt1 <- melt(dat1,"sn")
dt2 <- melt(dat2,"sn")

dt3 <- merge(dt1,dt2,by=c("sn","variable"))

dt3$value <- max(dt3$value.x, dt3$value.y)

#I got the following which is not correct.
dt3:

sn variable value.x value.y value
v1        A       1       1     1
v1        B       0       0     1
v1        C       0       1     1
v2        A       0       0     1
v2        B       1       0     1
v2        C       0       0     1


#I will like dt3 to return the following
dt3:

dt3
sn variable value.x value.y value
v1        A       1       1     1
v1        B       0       0     0
v1        C       0       1     1
v2        A       0       0     0
v2        B       1       0     1
v2        C       0       0     0

CodePudding user response:

Using data.table

library(data.table)
setDT(dat1)
setDT(dat2)


dt1 = melt(dat1, id.vars = "sn")
dt2 = melt(dat2, id.vars = "sn")
# Inner join
dt3 = merge(dt1, dt2, by = c("sn", "variable"))

dt3[, value := pmax(value.x, value.y)]
dt3
# Key: <sn, variable>
#        sn variable value.x value.y value
#    <char>   <fctr>   <num>   <num> <num>
# 1:     v1        A       1       1     1
# 2:     v1        B       0       0     0
# 3:     v1        C       0       1     1
# 4:     v2        A       0       0     0
# 5:     v2        B       1       0     1
# 6:     v2        C       0       0     0

CodePudding user response:

Heres' a tidyverse solution. Note that the max values in group v3 for variable d is -infinity because all values are NA.

bind_rows(dat1, dat2) %>% 
  pivot_longer(
    -sn,
    names_to = "variable"
  ) %>% 
  group_by(sn, variable) %>% 
  summarise(max_value = max(value, na.rm = TRUE))

# A tibble: 12 × 3
# Groups:   sn [3]
   sn    variable max_value
   <chr> <chr>        <dbl>
 1 v1    A                1
 2 v1    B                0
 3 v1    C                1
 4 v1    D                1
 5 v2    A                0
 6 v2    B                1
 7 v2    C                0
 8 v2    D                0
 9 v3    A                1
10 v3    B                1
11 v3    C                1
12 v3    D             -Inf
Warning message:
In max(value, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf
  • Related