The first column of my data frame holds my variables (genes) as rows, and the remaining columns hold their values, which can be positive or negative. I would like to filter the genes based on these positive and negative values.
Here is a small subset of my data frame:
df <- structure(list(gene = c("SCML4", "RASGRP1", "RP1-47M23.3", "TIGIT",
"IL2RB", "IKZF3"), PC1 = c(0.0976999752508752, 0.0963683648774497,
0.0958379291214584, 0.095581364305455, 0.0953187100695565, 0.0952640683198088
), PC2 = c(0.0415177491122262, 0.0149616407858333, 0.0592932173696311,
0.0490135176285661, 0.0666662088855938, 0.0652039968982664),
PC3 = c(-0.0480347151614553, -0.05574053153725, -0.04805364872616,
-0.0486181477818392, -0.0437832673958965, -0.0450981246281503
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
Data
gene PC1 PC2 PC3
<chr> <dbl> <dbl> <dbl>
1 SCML4 0.0977 0.0415 -0.0480
2 RASGRP1 0.0964 0.0150 -0.0557
3 RP1-47M23.3 0.0958 0.0593 -0.0481
4 TIGIT 0.0956 0.0490 -0.0486
5 IL2RB 0.0953 0.0667 -0.0438
6 IKZF3 0.0953 0.0652 -0.0451
I found a way to get the top genes for a single PC:
library(dplyr)
library(tidyr)
top_genes <- df %>%
# select only the PCs we are interested in
select(gene, PC3) %>%
# convert to a "long" format
pivot_longer(matches("PC"), names_to = "PC", values_to = "loading") %>%
# for each PC
group_by(PC) %>%
# arrange by descending order of loading
arrange(desc(abs(loading))) %>%
# take the 10 top rows
slice(1:10) %>%
# pull the gene column as a vector
pull(gene) %>%
# ensure only unique genes are retained
unique()
top_genes
I want the top genes for PC1, PC2 and PC3 separately, but with this code I have to change the select() line each time: select(gene, PC1), select(gene, PC2) or select(gene, PC3).
How can I get the top genes for each PC column separately in one go, instead of running the pipeline once per PC?
Answer:
If you remove the select() line, your code does most of what you seem to be describing:
df_select <- df %>%
pivot_longer(matches("PC"), names_to = "PC", values_to = "loading") %>%
group_by(PC) %>%
arrange(desc(abs(loading))) %>%
slice(1:10) %>%
ungroup()
Result
# A tibble: 18 × 3
gene PC loading
<chr> <chr> <dbl>
1 SCML4 PC1 0.0977
2 RASGRP1 PC1 0.0964
3 RP1-47M23.3 PC1 0.0958
4 TIGIT PC1 0.0956
5 IL2RB PC1 0.0953
6 IKZF3 PC1 0.0953
7 IL2RB PC2 0.0667
8 IKZF3 PC2 0.0652
9 RP1-47M23.3 PC2 0.0593
10 TIGIT PC2 0.0490
11 SCML4 PC2 0.0415
12 RASGRP1 PC2 0.0150
13 RASGRP1 PC3 -0.0557
14 TIGIT PC3 -0.0486
15 RP1-47M23.3 PC3 -0.0481
16 SCML4 PC3 -0.0480
17 IKZF3 PC3 -0.0451
18 IL2RB PC3 -0.0438
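Since the question also mentions filtering by positive and negative values, here is a minimal sketch (an addition, not part of the answer above) that also splits each PC's top genes by the sign of the loading. It assumes dplyr 1.0 or later for `slice_max()`, treats a loading of exactly 0 as positive, and uses a hypothetical cut-off `n_top` and an illustrative column name `direction`. The data frame is rebuilt from the question's `dput()` so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Data from the question's dput()
df <- tibble::tibble(
  gene = c("SCML4", "RASGRP1", "RP1-47M23.3", "TIGIT", "IL2RB", "IKZF3"),
  PC1 = c(0.0976999752508752, 0.0963683648774497, 0.0958379291214584,
          0.095581364305455, 0.0953187100695565, 0.0952640683198088),
  PC2 = c(0.0415177491122262, 0.0149616407858333, 0.0592932173696311,
          0.0490135176285661, 0.0666662088855938, 0.0652039968982664),
  PC3 = c(-0.0480347151614553, -0.05574053153725, -0.04805364872616,
          -0.0486181477818392, -0.0437832673958965, -0.0450981246281503)
)

n_top <- 10  # hypothetical cut-off, mirroring slice(1:10) above

top_by_sign <- df %>%
  pivot_longer(matches("PC"), names_to = "PC", values_to = "loading") %>%
  # assumption: zero loadings are labelled "positive"
  mutate(direction = if_else(loading >= 0, "positive", "negative")) %>%
  group_by(PC, direction) %>%
  # keep the n_top largest absolute loadings within each PC and sign
  slice_max(abs(loading), n = n_top, with_ties = FALSE) %>%
  ungroup()
```

For example, `filter(top_by_sign, PC == "PC3", direction == "negative")` then gives the strongest negative PC3 genes. In this small subset every PC1 and PC2 loading is positive and every PC3 loading is negative, so each PC contributes only one direction.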
If you want separate objects per PC, split the result with group_split():
grps <- df_select %>%
  group_split(PC)
Then you can get PC1 with grps[[1]]:
# A tibble: 6 × 3
gene PC loading
<chr> <chr> <dbl>
1 SCML4 PC1 0.0977
2 RASGRP1 PC1 0.0964
3 RP1-47M23.3 PC1 0.0958
4 TIGIT PC1 0.0956
5 IL2RB PC1 0.0953
6 IKZF3 PC1 0.0953
or use grps[[1]]$gene to get just the gene names:
[1] "SCML4" "RASGRP1" "RP1-47M23.3" "TIGIT" "IL2RB" "IKZF3"
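If you would rather refer to each PC by name than by its position in `grps`, a small sketch using base R's `split()` on the same result returns a named list with one gene vector per PC. The object name `top_genes_by_pc` is illustrative, and the data frame is rebuilt from the question's `dput()` so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Data from the question's dput()
df <- tibble::tibble(
  gene = c("SCML4", "RASGRP1", "RP1-47M23.3", "TIGIT", "IL2RB", "IKZF3"),
  PC1 = c(0.0976999752508752, 0.0963683648774497, 0.0958379291214584,
          0.095581364305455, 0.0953187100695565, 0.0952640683198088),
  PC2 = c(0.0415177491122262, 0.0149616407858333, 0.0592932173696311,
          0.0490135176285661, 0.0666662088855938, 0.0652039968982664),
  PC3 = c(-0.0480347151614553, -0.05574053153725, -0.04805364872616,
          -0.0486181477818392, -0.0437832673958965, -0.0450981246281503)
)

# Same pipeline as in the answer: top 10 genes by |loading| per PC
df_select <- df %>%
  pivot_longer(matches("PC"), names_to = "PC", values_to = "loading") %>%
  group_by(PC) %>%
  arrange(desc(abs(loading))) %>%
  slice(1:10) %>%
  ungroup()

# split() keeps the within-PC order, so each vector is sorted by |loading|
top_genes_by_pc <- split(df_select$gene, df_select$PC)
```

`top_genes_by_pc$PC1` then holds the same genes as `grps[[1]]$gene`, and `top_genes_by_pc$PC3` the top PC3 genes, without having to remember which index belongs to which PC.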
Another answer:
Update: changed the modify() line, thanks to @Jon Spring (borrowed from him) :-)
I am still not sure this is what you are looking for, but if you want to sort each PC column separately, so that you get the gene order by descending absolute loading for PC1, PC2 and PC3, we could use group_split():
library(dplyr)
library(tidyr)
library(purrr)  # for modify()
df %>%
pivot_longer(-c(gene)) %>%
group_split(name) %>%
modify(. %>% arrange(-abs(value))) %>%
bind_rows() %>%
select(-value) %>%
group_by(name) %>%
mutate(row = row_number()) %>%
pivot_wider(names_from=name, values_from = gene)
row PC1 PC2 PC3
<int> <chr> <chr> <chr>
1 1 SCML4 IL2RB RASGRP1
2 2 RASGRP1 IKZF3 TIGIT
3 3 RP1-47M23.3 RP1-47M23.3 RP1-47M23.3
4 4 TIGIT TIGIT SCML4
5 5 IL2RB SCML4 IKZF3
6 6 IKZF3 RASGRP1 IL2RB
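The `modify()` step comes from purrr. As a sketch of a variant that avoids splitting and re-binding, `arrange()` with `.by_group = TRUE` sorts within each PC directly; on the question's data it gives the same ordering as the table above. The object name `gene_order` is illustrative, and the data frame is rebuilt from the question's `dput()` so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Data from the question's dput()
df <- tibble::tibble(
  gene = c("SCML4", "RASGRP1", "RP1-47M23.3", "TIGIT", "IL2RB", "IKZF3"),
  PC1 = c(0.0976999752508752, 0.0963683648774497, 0.0958379291214584,
          0.095581364305455, 0.0953187100695565, 0.0952640683198088),
  PC2 = c(0.0415177491122262, 0.0149616407858333, 0.0592932173696311,
          0.0490135176285661, 0.0666662088855938, 0.0652039968982664),
  PC3 = c(-0.0480347151614553, -0.05574053153725, -0.04805364872616,
          -0.0486181477818392, -0.0437832673958965, -0.0450981246281503)
)

gene_order <- df %>%
  pivot_longer(-gene) %>%                          # columns: gene, name, value
  group_by(name) %>%
  arrange(desc(abs(value)), .by_group = TRUE) %>%  # sort within each PC
  mutate(row = row_number()) %>%                   # rank within each PC
  ungroup() %>%
  select(-value) %>%
  pivot_wider(names_from = name, values_from = gene)
```

Each column of `gene_order` lists the genes of one PC from the largest to the smallest absolute loading, with `row` giving the rank.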