I have two data frames dat1 and dat2. I will like to compare the values of the similar coloumns in the two data frames and returned the highest value.
How can I compare value of two table and return highest value in r?
library(reshape2)
dat1 <- data.frame(sn=c("v1","v2"), A=c(1,0), B=c(0,1), C=c(0,0), D=c(1,0))
dat2 <- data.frame(sn=c("v1","v2", "v3"), A=c(1,0,1), C=c(1,0,1), B=c(0,0,1))
dat1:
sn A B C D
v1 1 0 0 1
v2 0 1 0 0
dat2:
sn A C B
v1 1 1 0
v2 0 0 0
v3 1 1 1
dt1 <- melt(dat1,"sn")
dt2 <- melt(dat2,"sn")
dt3 <- merge(dt1,dt2,by=c("sn","variable"))
dt3$value <- max(dt3$value.x, dt3$value.y)
#I got the following which is not correct.
dt3:
sn variable value.x value.y value
v1 A 1 1 1
v1 B 0 0 1
v1 C 0 1 1
v2 A 0 0 1
v2 B 1 0 1
v2 C 0 0 1
#I will like dt3 to return the following
dt3:
dt3
sn variable value.x value.y value
v1 A 1 1 1
v1 B 0 0 0
v1 C 0 1 1
v2 A 0 0 0
v2 B 1 0 1
v2 C 0 0 0
CodePudding user response:
Using data.table
library(data.table)
setDT(dat1)
setDT(dat2)
dt1 = melt(dat1, id.vars = "sn")
dt2 = melt(dat2, id.vars = "sn")
# Inner join
dt3 = merge(dt1, dt2, by = c("sn", "variable"))
dt3[, value := pmax(value.x, value.y)]
dt3
# Key: <sn, variable>
# sn variable value.x value.y value
# <char> <fctr> <num> <num> <num>
# 1: v1 A 1 1 1
# 2: v1 B 0 0 0
# 3: v1 C 0 1 1
# 4: v2 A 0 0 0
# 5: v2 B 1 0 1
# 6: v2 C 0 0 0
CodePudding user response:
Heres' a tidyverse solution. Note that the max values in group v3
for variable d
is -infinity because all values are NA
.
bind_rows(dat1, dat2) %>%
pivot_longer(
-sn,
names_to = "variable"
) %>%
group_by(sn, variable) %>%
summarise(max_value = max(value, na.rm = TRUE))
# A tibble: 12 × 3
# Groups: sn [3]
sn variable max_value
<chr> <chr> <dbl>
1 v1 A 1
2 v1 B 0
3 v1 C 1
4 v1 D 1
5 v2 A 0
6 v2 B 1
7 v2 C 0
8 v2 D 0
9 v3 A 1
10 v3 B 1
11 v3 C 1
12 v3 D -Inf
Warning message:
In max(value, na.rm = TRUE) :
no non-missing arguments to max; returning -Inf