I have a rather simple question for which I cannot find the answer.
I have the following data frame. Columns Node.1 and Node.2 contain gene names. Each gene can appear in both columns, and multiple times, because each Node.1/Node.2 row represents a link between two genes, and the wTO column gives the strength of that link. I want to calculate the mean wTO value for each gene, which means I need a function that can find each gene across both columns (Node.1 and Node.2). I was thinking of using group_by or aggregate, but I am struggling to find the right syntax to "search" for each gene in both columns.
I would appreciate any help, Anna
'data.frame': 13799 obs. of 4 variables:
$ Node.1: Factor w/ 1220 levels "ENSG00000004399",..: 76 616 102 349 349 366 102 360 360 360 ...
$ Node.2: Factor w/ 1200 levels "ENSG00000004399",..: 363 113 382 1031 1034 1034 117 434 1103 516 ...
$ wTO : num 0.441 0.602 0.631 0.606 0.6 0.533 0.618 -0.326 -0.357 -0.354 ...
$ abswTO: num 0.441 0.602 0.631 0.606 0.6 0.533 0.618 0.326 0.357 0.354 ...
Node.1 Node.2 wTO abswTO
1 ENSG00000107404 ENSG00000224459 0.441 0.441
2 ENSG00000242590 ENSG00000116809 0.602 0.602
3 ENSG00000116809 ENSG00000226526 0.631 0.631
4 ENSG00000221978 ENSG00000272084 0.606 0.606
5 ENSG00000221978 ENSG00000272478 0.600 0.600
6 ENSG00000224870 ENSG00000272478 0.533 0.533
7 ENSG00000116809 ENSG00000121905 0.618 0.618
8 ENSG00000224387 ENSG00000229537 -0.326 0.326
9 ENSG00000224387 ENSG00000285778 -0.357 0.357
10 ENSG00000224387 ENSG00000234184 -0.354 0.354
11 ENSG00000230402 ENSG00000285525 0.409 0.409
12 ENSG00000224459 ENSG00000270066 0.401 0.401
13 ENSG00000234184 ENSG00000270066 -0.319 0.319
14 ENSG00000221978 ENSG00000237781 0.593 0.593
User response:
For simplicity, suppose the following is my data frame:
node.1 node.2 wto
1 A Z 0.20
2 B A 1.00
3 D F 3.00
4 F W 0.80
5 R A 0.90
6 C D 0.66
For gene A, I want the result to be mean = (0.2 + 1 + 0.9)/3 = 0.7.
User response:
How about something like this?
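Since the question mentions aggregate, here is a minimal base R sketch on the toy data above that needs no packages. It assumes the toy column names node.1, node.2 and wto (the real data uses Node.1, Node.2 and wTO), and the intermediate table name `long` is only illustrative. The idea is to list every link twice, once under each of its two genes, and then group by gene.

```r
# Toy data from the question (column names node.1, node.2, wto as in the example)
df <- data.frame(
  node.1 = c("A", "B", "D", "F", "R", "C"),
  node.2 = c("Z", "A", "F", "W", "A", "D"),
  wto    = c(0.2, 1, 3, 0.8, 0.9, 0.66),
  stringsAsFactors = FALSE
)

# Stack both node columns into one long table: each link appears twice,
# once under each of its genes, carrying the same wto value.
# as.character() guards against factor columns with different level sets
# (in the real data Node.1 has 1220 levels and Node.2 has 1200).
long <- data.frame(
  gene = c(as.character(df$node.1), as.character(df$node.2)),
  wto  = rep(df$wto, times = 2)
)

# Base R grouping: mean wto for each gene
res <- aggregate(wto ~ gene, data = long, FUN = mean)
print(res)
```

For gene A this averages rows 1, 2 and 5 of the toy data, i.e. (0.2 + 1 + 0.9)/3.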
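library(dplyr)
# Take each node column together with its wTO value and rename it to 'gene'
gene1 <- df[, c("Node.1", "wTO")]; colnames(gene1)[1] <- "gene"
gene2 <- df[, c("Node.2", "wTO")]; colnames(gene2)[1] <- "gene"
# Stack both halves so each gene appears once for every link it takes part in
df_long <- rbind(gene1, gene2)
df_long %>%
  group_by(gene) %>%
  summarise(mean_wTO = mean(wTO))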
User response:
df <- data.frame(
stringsAsFactors = FALSE,
node.1 = c("A", "B", "D", "F", "R", "C"),
node.2 = c("Z", "A", "F", "W", "A", "D"),
wto = c(0.2, 1, 3, 0.8, 0.9, 0.66)
)
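library(tidyverse)
# Reshape node.1/node.2 into a single 'gene' column (one row per gene per link),
# then average wto for each gene
df %>%
  pivot_longer(-wto, values_to = "gene") %>%
  group_by(gene) %>%
  summarise(wto = mean(wto))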
#> # A tibble: 8 x 2
#> gene wto
#> <chr> <dbl>
#> 1 A 0.7
#> 2 B 1
#> 3 C 0.66
#> 4 D 1.83
#> 5 F 1.9
#> 6 R 0.9
#> 7 W 0.8
#> 8 Z 0.2
Created on 2022-03-23 by the reprex package (v2.0.1)