Segregate rows based on positive and negative values across the different columns

Time:07-28

The first column of my data frame holds the variable (gene) names, one per row, and the remaining columns hold their values, which can be positive or negative. I would like to filter the rows based on those positive and negative values.

A small subset of my data frame:

structure(list(gene = c("SCML4", "RASGRP1", "RP1-47M23.3", "TIGIT", 
"IL2RB", "IKZF3"), PC1 = c(0.0976999752508752, 0.0963683648774497, 
0.0958379291214584, 0.095581364305455, 0.0953187100695565, 0.0952640683198088
), PC2 = c(0.0415177491122262, 0.0149616407858333, 0.0592932173696311, 
0.0490135176285661, 0.0666662088855938, 0.0652039968982664), 
    PC3 = c(-0.0480347151614553, -0.05574053153725, -0.04805364872616, 
    -0.0486181477818392, -0.0437832673958965, -0.0450981246281503
    )), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

Data

gene           PC1    PC2     PC3
  <chr>        <dbl>  <dbl>   <dbl>
1 SCML4       0.0977 0.0415 -0.0480
2 RASGRP1     0.0964 0.0150 -0.0557
3 RP1-47M23.3 0.0958 0.0593 -0.0481
4 TIGIT       0.0956 0.0490 -0.0486
5 IL2RB       0.0953 0.0667 -0.0438
6 IKZF3       0.0953 0.0652 -0.0451

I found a way to filter these for a single PC:

top_genes <- df %>% 
  # select only the PCs we are interested in
  select(gene, PC3) %>%
  # convert to a "long" format
  pivot_longer(matches("PC"), names_to = "PC", values_to = "loading") %>% 
  # for each PC
  group_by(PC) %>% 
  # arrange by descending order of loading
  arrange(desc(abs(loading))) %>% 
  # take the 10 top rows
  slice(1:10) %>% 
  # pull the gene column as a vector
  pull(gene) %>% 
  # ensure only unique genes are retained
  unique()

top_genes

I would like to get the top genes for PC1, PC2, and PC3 individually, because that is how I want to segregate them. With the code above, I would have to run it three times, using

  select(gene, PC1) or select(gene, PC2) or select(gene, PC3).

How can I get the top genes for each PC column separately in one go, instead of selecting each PC one by one?


CodePudding user response:

If you remove the select line your code does most of what you seem to be describing:

df_select <- df %>% 
  pivot_longer(matches("PC"), names_to = "PC", values_to = "loading") %>% 
  group_by(PC) %>% 
  arrange(desc(abs(loading))) %>% 
  slice(1:10) %>%
  ungroup()

Result

# A tibble: 18 × 3
   gene        PC    loading
   <chr>       <chr>   <dbl>
 1 SCML4       PC1    0.0977
 2 RASGRP1     PC1    0.0964
 3 RP1-47M23.3 PC1    0.0958
 4 TIGIT       PC1    0.0956
 5 IL2RB       PC1    0.0953
 6 IKZF3       PC1    0.0953
 7 IL2RB       PC2    0.0667
 8 IKZF3       PC2    0.0652
 9 RP1-47M23.3 PC2    0.0593
10 TIGIT       PC2    0.0490
11 SCML4       PC2    0.0415
12 RASGRP1     PC2    0.0150
13 RASGRP1     PC3   -0.0557
14 TIGIT       PC3   -0.0486
15 RP1-47M23.3 PC3   -0.0481
16 SCML4       PC3   -0.0480
17 IKZF3       PC3   -0.0451
18 IL2RB       PC3   -0.0438
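With only six genes in the sample, slice(1:10) keeps every row, so the "top N" step has no visible effect. As a hedged sketch, the same idea can be written with dplyr's slice_max(), which ranks and trims within each group in one step. The cutoff n = 3 and the object name top3_per_pc are illustrative choices, not from the question, and the data are re-entered by hand with rounded values.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rounded (assumption: rounding keeps the order)
df <- tibble::tibble(
  gene = c("SCML4", "RASGRP1", "RP1-47M23.3", "TIGIT", "IL2RB", "IKZF3"),
  PC1  = c(0.09770, 0.09637, 0.09584, 0.09558, 0.09532, 0.09526),
  PC2  = c(0.0415, 0.0150, 0.0593, 0.0490, 0.0667, 0.0652),
  PC3  = c(-0.0480, -0.0557, -0.0481, -0.0486, -0.0438, -0.0451)
)

top3_per_pc <- df %>%
  # one row per gene/PC pair
  pivot_longer(starts_with("PC"), names_to = "PC", values_to = "loading") %>%
  group_by(PC) %>%
  # keep the 3 largest absolute loadings within each PC
  slice_max(abs(loading), n = 3, with_ties = FALSE) %>%
  ungroup()

top3_per_pc
```

Setting with_ties = FALSE keeps exactly n rows per PC, which matches the behaviour of slice(1:n).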

If you want to create a separate object for each PC, split the result:

grps <- df_select %>%
  group_split(PC)

then you could get PC1 with grps[[1]]:

# A tibble: 6 × 3
  gene        PC    loading
  <chr>       <chr>   <dbl>
1 SCML4       PC1    0.0977
2 RASGRP1     PC1    0.0964
3 RP1-47M23.3 PC1    0.0958
4 TIGIT       PC1    0.0956
5 IL2RB       PC1    0.0953
6 IKZF3       PC1    0.0953

or use grps[[1]]$gene to get

[1] "SCML4"       "RASGRP1"     "RP1-47M23.3" "TIGIT"       "IL2RB"       "IKZF3" 
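group_split() returns an unnamed list, so grps[[1]] relies on PC1 happening to come first. If access by name is preferred, one possible sketch uses base R's split() to build a named list of gene vectors. The name genes_by_pc is made up for this example, and the data are re-entered by hand with rounded values.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rounded (assumption: rounding keeps the order)
df <- tibble::tibble(
  gene = c("SCML4", "RASGRP1", "RP1-47M23.3", "TIGIT", "IL2RB", "IKZF3"),
  PC1  = c(0.09770, 0.09637, 0.09584, 0.09558, 0.09532, 0.09526),
  PC2  = c(0.0415, 0.0150, 0.0593, 0.0490, 0.0667, 0.0652),
  PC3  = c(-0.0480, -0.0557, -0.0481, -0.0486, -0.0438, -0.0451)
)

df_select <- df %>%
  pivot_longer(matches("PC"), names_to = "PC", values_to = "loading") %>%
  group_by(PC) %>%
  # sort by absolute loading within each PC
  arrange(desc(abs(loading)), .by_group = TRUE) %>%
  slice(1:10) %>%
  ungroup()

# named list: one character vector of genes per PC, in sorted order
genes_by_pc <- split(df_select$gene, df_select$PC)

genes_by_pc$PC2
```

Each element can then be pulled by name, for example genes_by_pc[["PC3"]], without depending on list position.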

CodePudding user response:

Update: changed modify line: thanks to @Jon Spring (stolen from him):-)

I am still not sure if this is what you are looking for.

But if you want to sort each column separately, so that you get the gene order for PC1, PC2, and PC3 (here by descending absolute value), then we could use group_split:

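library(dplyr)
library(tidyr)
library(purrr)  # provides modify()

df %>% 
  # long format: one row per gene/PC pair
  pivot_longer(-c(gene)) %>% 
  group_split(name) %>% 
  # sort each PC by descending absolute value
  modify(. %>% arrange(-abs(value))) %>% 
  bind_rows() %>% 
  select(-value) %>% 
  group_by(name) %>% 
  # rank of each gene within its PC
  mutate(row = row_number()) %>% 
  pivot_wider(names_from = name, values_from = gene)

Result

    row PC1         PC2         PC3        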
  <int> <chr>       <chr>       <chr>      
1     1 SCML4       IL2RB       RASGRP1    
2     2 RASGRP1     IKZF3       TIGIT      
3     3 RP1-47M23.3 RP1-47M23.3 RP1-47M23.3
4     4 TIGIT       TIGIT       SCML4      
5     5 IL2RB       SCML4       IKZF3      
6     6 IKZF3       RASGRP1     IL2RB 
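
If the aim is also to separate positive from negative loadings, as the title suggests, a minimal sketch is to label each loading by its sign in long format and filter on that label. The column name direction and the objects by_sign, positive, and negative are assumptions made for this example, and the data are re-entered by hand with rounded values.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rounded (assumption: rounding keeps the order)
df <- tibble::tibble(
  gene = c("SCML4", "RASGRP1", "RP1-47M23.3", "TIGIT", "IL2RB", "IKZF3"),
  PC1  = c(0.09770, 0.09637, 0.09584, 0.09558, 0.09532, 0.09526),
  PC2  = c(0.0415, 0.0150, 0.0593, 0.0490, 0.0667, 0.0652),
  PC3  = c(-0.0480, -0.0557, -0.0481, -0.0486, -0.0438, -0.0451)
)

by_sign <- df %>%
  pivot_longer(starts_with("PC"), names_to = "PC", values_to = "loading") %>%
  # label each loading by its sign (zero is treated as positive here)
  mutate(direction = if_else(loading >= 0, "positive", "negative")) %>%
  arrange(PC, desc(abs(loading)))

positive <- filter(by_sign, direction == "positive")
negative <- filter(by_sign, direction == "negative")

# how many genes fall on each side, per PC
count(by_sign, PC, direction)
```

In this six-gene subset every PC1 and PC2 loading is positive and every PC3 loading is negative, so the split only becomes informative on the full data.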