Filtering columns based on conditions related to IDs

Time:12-02

I have the following data.frame:

df <- data.frame(plot = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                 tree = c("1", "1", "1", "1", "2", "2",
                          "3", "4", "7", "7"),
                 trunk = c("1", "2", "3", "4", "1", "2",
                           "1", "1", "1", "2"),
                 name = c("A", "A", "A", "A", "A", "A",
                          "B", "C", "A", "A"),
                 time_1 = c("alive", "alive", "dead", "dead",
                            "alive", "alive",
                            "alive",
                            "alive",
                            "dead", "dead"),
                 time_2 = c("dead", "alive", "dead", "dead",
                            "dead", "dead",
                            "dead",
                            "dead",
                            "dead", "dead"))

To quickly explain the context: each plot contains a number of trees, and each tree can have a single trunk or multiple trunks. I'm trying to keep only the trees that have time_1 == "alive" and time_2 == "dead". A tree can have multiple "dead" trunks, but if a single trunk is alive, I consider the tree to be "alive".

So, the first thing I did was add some identifiers for each tree and trunk:

# Add an ID for each trunk in each plot
df$trunk_id <- paste(df$plot, df$tree, df$trunk, sep = "_")

# Add an ID for each tree in each plot
df$tree_id <- paste(df$plot, df$tree, sep = "_")

Then I filtered for the rows where time_1 == "alive" and time_2 == "dead":

library(dplyr)

df2 <- df %>% filter(time_1 == "alive" & time_2 == "dead")

However, I noticed that this does not return exactly what I want. For example, comparing df with df2, I know that I don't want the tree in plot == 1 with tree_id == "1_1", because one of its trunks (trunk 2) is still "alive" at time_2. Filtering row by row does not remove such cases.

What condition should I add so that the status is evaluated across all trunks of a tree, rather than trunk by trunk?

My ideal output would be these IDs, so that I can filter out what is irrelevant:

output <- c("1_2", "2_3", "2_4")

Answer:

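You can try grouping by tree and wrapping each condition in all(), so that the filter is evaluated over every trunk of a tree rather than row by row, i.e.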

library(dplyr)

df %>% 
 group_by(plot, tree) %>% 
 filter(all(time_1 == 'alive') & all(time_2 == 'dead'))

# A tibble: 4 × 6
# Groups:   plot, tree [3]
   plot tree  trunk name  time_1 time_2
  <dbl> <chr> <chr> <chr> <chr>  <chr> 
1     1 2     1     A     alive  dead  
2     1 2     2     A     alive  dead  
3     2 3     1     B     alive  dead  
4     2 4     1     C     alive  dead  
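
If the tree IDs themselves are what you are after, the same grouped filter can be extended to return them as a vector. The following is only a minimal sketch under a few assumptions: `tree_ids` is a name introduced here for illustration, the data is rebuilt so the snippet runs on its own, and `any()` is used for `time_1` on the assumption that the rule from the question ("a single living trunk makes the tree alive") should also apply at time_1. With the sample data, `all()` and `any()` on `time_1` select the same trees.

```r
library(dplyr)

# Sample data from the question (the name column is omitted, it is not needed here)
df <- data.frame(
  plot   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  tree   = c("1", "1", "1", "1", "2", "2", "3", "4", "7", "7"),
  trunk  = c("1", "2", "3", "4", "1", "2", "1", "1", "1", "2"),
  time_1 = c("alive", "alive", "dead", "dead", "alive", "alive",
             "alive", "alive", "dead", "dead"),
  time_2 = c("dead", "alive", "dead", "dead", "dead", "dead",
             "dead", "dead", "dead", "dead")
)

tree_ids <- df %>%
  mutate(tree_id = paste(plot, tree, sep = "_")) %>%
  group_by(tree_id) %>%
  # Assumption: a tree counts as alive at time_1 if ANY trunk is alive,
  # and as dead at time_2 only if ALL of its trunks are dead
  filter(any(time_1 == "alive") & all(time_2 == "dead")) %>%
  ungroup() %>%
  distinct(tree_id) %>%
  pull(tree_id)

tree_ids
# [1] "1_2" "2_3" "2_4"
```

The resulting vector can then be used to subset the original data, for example with `filter(tree_id %in% tree_ids)` once a `tree_id` column exists in `df`.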