This is the second time I am asking a similar question, because I have not yet found the result I am looking for. I have the following data frame:
gene = c("a","b","c","d","e","f","g","h","i","j","k", "a","b","c","d","e","f","g","h","i","j","k", "a","b","c","d","e","f","g","h","i","j","k")
sample1 = c("a","a","a","a","a","a","a","a","a","a", "a","b","b","b","b","b","b","b","b","b","b","b","c","c","c","c","c","c","c","c","c","c","c")
expression1 = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24","25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "36")
df1 <- tibble::tibble(gene, sample1, expression1)
and I have a second data frame:
gene = c("a","b","c","d","e","f","g","h","i","j","k")
sample2 = c("g","g","g","g","g","g","g","g","g","g","g")
expression2 = c("14.7", "15", "17", "16", "18", "20", "21", "22", "23", "24", "25")
df2 <- tibble::tibble(gene, sample2, expression2)
   gene  sample2 expression2
   <chr> <chr>   <chr>
 1 a     g       14.7
 2 b     g       15
 3 c     g       17
 4 d     g       16
 5 e     g       18
 6 f     g       20
 7 g     g       21
 8 h     g       22
 9 i     g       23
10 j     g       24
11 k     g       25
The result I am looking for is a match between sample2 = g and sample1 = b, because those two samples are the most similar in gene expression. How should I approach this?
It would look something like this:
   gene  sample2 expression2 sample1 expression1
   <chr> <chr>   <chr>       <chr>   <chr>
 1 a     g       14.7        b       14
 2 b     g       15          b       15
 3 c     g       17          b       16
 4 d     g       16          b       17
 5 e     g       18          b       18
 6 f     g       20          b       19
 7 g     g       21          b       20
 8 h     g       22          b       21
 9 i     g       23          b       22
10 j     g       24          b       23
11 k     g       25          b       24
CodePudding user response:
What about this?
library(data.table)
setDT(df1)[, expression := as.numeric(expression1)]
setDT(df2)[, expression := as.numeric(expression2)]
df1[df2, on = .(gene, expression), roll = "nearest"][, expression := NULL][]
#     gene sample1 expression1 sample2 expression2
#  1:    a       b          14       g        14.7
#  2:    b       b          15       g          15
#  3:    c       b          16       g          17
#  4:    d       b          17       g          16
#  5:    e       b          18       g          18
#  6:    f       b          19       g          20
#  7:    g       b          20       g          21
#  8:    h       b          21       g          22
#  9:    i       b          22       g          23
# 10:    j       b          23       g          24
# 11:    k       b          24       g          25
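One thing to keep in mind: the rolling join picks the nearest expression value gene by gene, so nothing forces every row to come from the same sample. With this data they all happen to come from b. A rough base R sketch to check that, assuming the example data from the question (the names `m` and `nearest` are made up for illustration):

```r
# Rebuild the example data with numeric expression values.
genes <- letters[1:11]
df1 <- data.frame(
  gene = rep(genes, times = 3),
  sample1 = rep(c("a", "b", "c"), each = 11),
  expression1 = c(1:11, 14:24, 25:34, 36),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  gene = genes,
  sample2 = "g",
  expression2 = c(14.7, 15, 17, 16, 18, 20, 21, 22, 23, 24, 25),
  stringsAsFactors = FALSE
)

# Pair every df1 row with the df2 row for the same gene.
m <- merge(df1, df2, by = "gene")
m$diff <- abs(m$expression1 - m$expression2)

# For each gene, find the sample1 whose expression is closest.
nearest <- sapply(split(m, m$gene), function(d) d$sample1[which.min(d$diff)])

# Count how often each sample wins; a single entry means all genes agree.
table(nearest)
```

If the table shows more than one sample, the per-gene matches disagree and you would need a rule (for example a whole-sample distance) to pick one.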
CodePudding user response:
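library(dplyr)

df2 %>%
  right_join(df1, 'gene') %>%       # pair each df1 row with the df2 row for the same gene
  group_by(sample1) %>%
  type.convert(as.is = TRUE) %>%    # the expression columns are character; make them numeric
  mutate(nrm = norm(expression1 - expression2, "2")) %>%   # Euclidean distance per sample1
  ungroup() %>%
  filter(min(nrm) == nrm) %>%       # keep only the sample1 with the smallest distance
  select(-nrm)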
# A tibble: 11 × 5
   gene  sample2 expression2 sample1 expression1
   <chr> <chr>         <dbl> <chr>         <int>
 1 a     g              14.7 b                14
 2 b     g              15   b                15
 3 c     g              17   b                16
 4 d     g              16   b                17
 5 e     g              18   b                18
 6 f     g              20   b                19
 7 g     g              21   b                20
 8 h     g              22   b                21
 9 i     g              23   b                22
10 j     g              24   b                23
11 k     g              25   b                24
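If you want to see the distances themselves before filtering, the same idea can be sketched in base R. This assumes the example data from the question, rebuilt here with numeric columns; `dists` and `best` are just illustrative names:

```r
# Rebuild the example data with numeric expression values.
genes <- letters[1:11]
df1 <- data.frame(
  gene = rep(genes, times = 3),
  sample1 = rep(c("a", "b", "c"), each = 11),
  expression1 = c(1:11, 14:24, 25:34, 36),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  gene = genes,
  sample2 = "g",
  expression2 = c(14.7, 15, 17, 16, 18, 20, 21, 22, 23, 24, 25),
  stringsAsFactors = FALSE
)

# Pair every df1 row with the df2 row for the same gene.
m <- merge(df1, df2, by = "gene")

# Euclidean distance between each sample1 profile and the sample2 profile.
dists <- sapply(
  split(m, m$sample1),
  function(d) sqrt(sum((d$expression1 - d$expression2)^2))
)

# The sample1 with the smallest distance is the best match.
best <- names(which.min(dists))

# Keep only the rows of the best-matching sample, in the requested column order.
result <- m[m$sample1 == best,
            c("gene", "sample2", "expression2", "sample1", "expression1")]
result
```

Printing `dists` shows how far apart the candidates are, which helps judge whether the best match is clearly better than the runner-up.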