R: group by gene name across 2 columns

Time:03-24

I have a rather simple question for which I cannot find the answer.

I have the following dataframe (shown below). Columns Node.1 and Node.2 contain gene names. Each gene can appear in both columns, and multiple times, because each row (Node.1-Node.2) represents a link between two genes; the wTO column gives the link strength.

I want to calculate the mean wTO value for each gene, so I need a function that can locate each gene across both columns (Node.1 and Node.2). I was thinking of using group_by or aggregate, but I am struggling to find the right syntax to "search" for each gene in both columns.

I would appreciate any help. Anna

 'data.frame':  13799 obs. of  4 variables:
 $ Node.1: Factor w/ 1220 levels "ENSG00000004399",..: 76 616 102 349 349 366 102 360 360 360 ...
 $ Node.2: Factor w/ 1200 levels "ENSG00000004399",..: 363 113 382 1031 1034 1034 117 434 1103 516 ...
 $ wTO   : num  0.441 0.602 0.631 0.606 0.6 0.533 0.618 -0.326 -0.357 -0.354 ...
 $ abswTO: num  0.441 0.602 0.631 0.606 0.6 0.533 0.618 0.326 0.357 0.354 ...

             Node.1          Node.2    wTO abswTO
1   ENSG00000107404 ENSG00000224459  0.441  0.441
2   ENSG00000242590 ENSG00000116809  0.602  0.602
3   ENSG00000116809 ENSG00000226526  0.631  0.631
4   ENSG00000221978 ENSG00000272084  0.606  0.606
5   ENSG00000221978 ENSG00000272478  0.600  0.600
6   ENSG00000224870 ENSG00000272478  0.533  0.533
7   ENSG00000116809 ENSG00000121905  0.618  0.618
8   ENSG00000224387 ENSG00000229537 -0.326  0.326
9   ENSG00000224387 ENSG00000285778 -0.357  0.357
10  ENSG00000224387 ENSG00000234184 -0.354  0.354
11  ENSG00000230402 ENSG00000285525  0.409  0.409
12  ENSG00000224459 ENSG00000270066  0.401  0.401
13  ENSG00000234184 ENSG00000270066 -0.319  0.319
14  ENSG00000221978 ENSG00000237781  0.593  0.593

CodePudding user response:

So, for simplicity, if the following is my dataframe:

    node.1 node.2  wto
1      A      Z 0.20
2      B      A 1.00
3      D      F 3.00
4      F      W 0.80
5      R      A 0.90
6      C      D 0.66

I want the result for gene A to be mean = (0.2 + 1 + 0.9) / 3 = 0.7

CodePudding user response:

How about something like this?

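library(dplyr)

# Pair each node column with wTO and give both the same column name
gene1 <- df[, c("Node.1", "wTO")]; colnames(gene1)[1] <- "gene"
gene2 <- df[, c("Node.2", "wTO")]; colnames(gene2)[1] <- "gene"

# Stack the two, so every link contributes once for each of its genes
df_long <- rbind(gene1, gene2)

# Mean wTO per gene
df_long %>%
  group_by(gene) %>%
  summarise(mean_wTO = mean(wTO))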
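
Since the question also mentions aggregate, the same stack-then-group idea can be sketched in base R without dplyr. This is only a sketch on the toy data from the follow-up post (columns node.1, node.2, wto); the names df_long and gene_means are chosen here for illustration, and the real data would need Node.1, Node.2 and wTO instead.

```r
# Toy data from the thread (column names node.1, node.2, wto).
df <- data.frame(
  node.1 = c("A", "B", "D", "F", "R", "C"),
  node.2 = c("Z", "A", "F", "W", "A", "D"),
  wto = c(0.2, 1, 3, 0.8, 0.9, 0.66),
  stringsAsFactors = FALSE
)

# Stack both node columns into a single 'gene' column. Each link's wto is
# repeated once per endpoint. as.character() guards against factor columns,
# which is what the real data frame in the question contains.
df_long <- data.frame(
  gene = c(as.character(df$node.1), as.character(df$node.2)),
  wto = c(df$wto, df$wto)
)

# Mean wto per gene, using the formula interface of aggregate().
gene_means <- aggregate(wto ~ gene, data = df_long, FUN = mean)
print(gene_means)
```

One thing to keep in mind with any stacking approach: a row whose two node columns hold the same gene would contribute its value twice to that gene's mean.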

CodePudding user response:

df <- data.frame(
  stringsAsFactors = FALSE,
  node.1 = c("A", "B", "D", "F", "R", "C"),
  node.2 = c("Z", "A", "F", "W", "A", "D"),
  wto = c(0.2, 1, 3, 0.8, 0.9, 0.66)
)

library(tidyverse)
df %>% pivot_longer(-wto, values_to = "gene") %>% 
  group_by(gene) %>% 
  summarise(wto = mean(wto))

#> # A tibble: 8 x 2
#>   gene    wto
#>   <chr> <dbl>
#> 1 A      0.7 
#> 2 B      1   
#> 3 C      0.66
#> 4 D      1.83
#> 5 F      1.9 
#> 6 R      0.9 
#> 7 W      0.8 
#> 8 Z      0.2

Created on 2022-03-23 by the reprex package (v2.0.1)
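
To spot-check a single gene against the table above, one option is to select the rows where that gene appears in either column and average their wto values directly. A minimal base R sketch on the same toy data; is_a and mean_a are names made up for this check.

```r
# Same toy data as in the reprex above.
df <- data.frame(
  stringsAsFactors = FALSE,
  node.1 = c("A", "B", "D", "F", "R", "C"),
  node.2 = c("Z", "A", "F", "W", "A", "D"),
  wto = c(0.2, 1, 3, 0.8, 0.9, 0.66)
)

# TRUE for every row where gene A is either end of the link.
is_a <- df$node.1 == "A" | df$node.2 == "A"

# Mean link strength over those rows: (0.2 + 1 + 0.9) / 3
mean_a <- mean(df$wto[is_a])
print(mean_a)
```

This matches the value the question asks for in the case of gene A, and it agrees with the grouped result as long as no row lists the same gene in both columns.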
