I have 5 columns of 5 different p-values. I am looking to mark in a new column when values in any of the 5 columns are present.
My data looks like this:
Gene pvalue_group1 pvalue_group2 pvalue_group3 pvalue_group4 pvalue_group5
Gene1 0.01 0.2 NA NA NA
Gene2 NA 0.001 NA NA NA
Gene3 0.05 0.7 0.09 0.2 0.001
I am looking to output a column that would look like this:
Gene pvalue_group1 pvalue_group2 pvalue_group3 pvalue_group4 pvalue_group5 Group
Gene1 0.01 0.2 NA NA NA Group1, Group2
Gene2 NA 0.001 NA NA NA Group2
Gene3 0.05 0.7 0.09 0.2 0.001 Group1, Group2, Group3, Group4, Group5
I've tried to create code based off similar questions, trying to use dplyr and mutate() but I haven't gotten very far, any help on what other functions I could try would be appreciated.
Example input data:
df <- structure(list(Gene = c("Gene1", "Gene2", "Gene3"), pvalue_group1 = c(0.01,
NA, 0.05), pvalue_group2 = c(0.2, 0.001, 0.7), pvalue_group3 = c(NA,
NA, 0.09), pvalue_group4 = c(NA, NA, 0.2), pvalue_group5 = c(NA,
NA, 0.001)), row.names = c(NA, -3L), class = c("data.table",
"data.frame"))
CodePudding user response:
You could try something like this:
library(tidyverse)
df %>%
pivot_longer(-1) %>%
filter(!is.na(value)) %>%
select(-value) %>%
group_by(Gene) %>%
summarise(Group = list(name)) %>%
select(Group) %>%
bind_cols(df) %>%
relocate(Group, .after = "pvalue_group5")
# # A tibble: 3 × 7
# Gene pvalue_group1 pvalue_group2 pvalue_group3 pvalue_group4 pvalue_group5 Group
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list>
# 1 Gene1 0.01 0.2 NA NA NA <chr [2]>
# 2 Gene2 NA 0.001 NA NA NA <chr [1]>
# 3 Gene3 0.05 0.7 0.09 0.2 0.001 <chr [5]>
Alternatively, you could replace the summarise
step with summarise(Group = paste(name, collapse = ", "))
for a string-like output:
df %>%
pivot_longer(-1) %>%
filter(!is.na(value)) %>%
select(-value) %>%
group_by(Gene) %>%
summarise(Group = paste(name, collapse = ", ")) %>%
select(Group) %>%
bind_cols(df) %>%
relocate(Group, .after = "pvalue_group5")
# # A tibble: 3 × 7
# Gene pvalue_group1 pvalue_group2 pvalue_group3 pvalue_group4 pvalue_group5 Group
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
# 1 Gene1 0.01 0.2 NA NA NA pvalue_group1, pvalue_group2
# 2 Gene2 NA 0.001 NA NA NA pvalue_group2
# 3 Gene3 0.05 0.7 0.09 0.2 0.001 pvalue_group1, pvalue_group2, pvalue_group3, pvalue_group4, pvalue_gro…
CodePudding user response:
Here is another option:
Main feature is to use cur_column()
. We check if across the columns the value is not NA, If so then put in column name .. and so on:
library(tidyverse)
df %>%
mutate(across(-1, ~case_when(!is.na(.) ~ cur_column()), .names = 'new_{col}')) %>%
unite(Group, starts_with('new'), na.rm = TRUE, sep = ' ') %>%
mutate(Group = str_replace_all(Group, 'pvalue_', ''))
Gene pvalue_group1 pvalue_group2 pvalue_group3 pvalue_group4 pvalue_group5 Group
1 Gene1 0.01 0.200 NA NA NA group1 group2
2 Gene2 NA 0.001 NA NA NA group2
3 Gene3 0.05 0.700 0.09 0.2 0.001 group1 group2 group3 group4 group5
CodePudding user response:
library(data.table)
setDT(df)
x <- names(df)[-1]
df[, test := apply( .SD, 1, \(i) paste(x[which(!is.na(i))], collapse=',') ), .SDcols=x]
CodePudding user response:
Another possible solution:
library(tidyverse)
df %>%
mutate(pmap_dfr(.[,-1], ~ list(Group = names(df[,-1])[!is.na(c(...))] %>%
str_remove_all("pvalue_") %>% str_c(collapse = ", "))))
#> Gene pvalue_group1 pvalue_group2 pvalue_group3 pvalue_group4 pvalue_group5
#> 1 Gene1 0.01 0.200 NA NA NA
#> 2 Gene2 NA 0.001 NA NA NA
#> 3 Gene3 0.05 0.700 0.09 0.2 0.001
#> Group
#> 1 group1, group2
#> 2 group2
#> 3 group1, group2, group3, group4, group5
Yet another possible solution:
library(tidyverse)
df %>%
rowwise() %>%
mutate(Group = str_c(names(df[-1])[!is.na(c_across(-Gene))], collapse=", ") %>%
str_remove_all("pvalue_")) %>%
ungroup
Output:
Gene pvalue_group1 pvalue_group2 pvalue_group3 pvalue_group4 pvalue_group5 Group
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Gene1 0.01 0.2 NA NA NA group1, group2
2 Gene2 NA 0.001 NA NA NA group2
3 Gene3 0.05 0.7 0.09 0.2 0.001 group1, group2, group3, group4, group5