I have an example stripped down dataset combined using this:
library(tidyverse)
composite <- inner_join(shared, taxonomy, by="otu") %>%
group_by(group, taxonomy) %>%
filter(count > 3)
The data looks like this:
group otu count taxonomy
<fct> <fct> <dbl> <chr>
1 17VSD otu001 4559 Escherichia-Shigella
2 17VSD otu002 870 Enterobacteriaceae_unclassified
3 17VSD otu020 63 Cupriavidus
4 17VSD otu072 24 Escherichia-Shigella
5 17VSD otu080 16 Escherichia-Shigella
6 17VSD otu205 4 Escherichia-Shigella
7 YG1 otu001 15 Escherichia-Shigella
8 YG1 otu002 15 Enterobacteriaceae_unclassified
9 YG1 otu004 504 Corynebacterium
10 YG1 otu006 500 Cutibacterium
In the group column there are 3 variables.
I'm having a lot of trouble with the syntax to get the next filter sequence. I want each group factor to have only the same factors under the otu column. In the data we can see row 9 - YG1 otu004 504 Corynebacterium. I would like to remove this row because group 17VSD does not have otu004. This will also have to be true for the other group (YG2). of course this will get very complicated with the full data set where there are 20 group factors and millions of otu factors.
I've tried expanding filter(count > 3 & ...)
but that doesn't seem to be the right direction. Otherwise just lots of searching through examples. My problem may also be because I'm not using the correct language to help solve the issue.
CodePudding user response:
As Ben had pointed out I can use n_distinct() to solve the issue. Thank you Ben!
to get what I needed for the dummy set:
composite <- inner_join(shared, taxonomy, by="otu") %>%
group_by(group, taxonomy) %>%
filter(count >3) %>%
group_by(otu) %>%
filter(n_distinct(group) == 3)
I also attempted Pablo Serrati's answer. It works, but its a little but complicated. Thank you Pablo
CodePudding user response:
I don't know if this is the simple way, but... you can reshape your data with pivot_
, then use this table to select columns with apply
, and make a new reshape.
# Example data
composite <- data.frame(group = rep(c("a", "b", "c"), c(9, 11, 5)),
otu = sample(paste0("otu00", 0:9),
size = 25, replace = T),
count = sample(1:20, size = 25, replace = T))
# Deleting possible duplicated group otu
composite <- composite[!duplicated(composite[c("group", "otu")]), ]
# reshape data
composite_reshape <- composite |>
arrange(group, otu) |>
pivot_wider(id_cols = "group", names_from = "otu", values_from = "count")
# Select complete otu columns
composite_reduce <- composite_reshape[!apply(composite_reshape , 2, anyNA)]
# New reshape
composite_final <- composite_reduce |>
pivot_longer(cols = -group )
composite_final