I have a dataset which has tree measurements at time t1 and t2. These trees are identified by state, county, plot and tree number. There are some trees that have died and some have grown in the time interval between t1 and t2.
State County Plot Tree Meas_yr
1 9 1 1 t1
1 9 1 2 t1
1 9 1 3 t1
1 9 1 1 t2
1 9 1 2 t2
1 9 1 4 t2
1 9 1 5 t2
I am trying to separate out the trees the new grown trees from the trees that were present in both t1 and t2 and the trees that have died in t2. I am trying to create something like below.
State County Plot Tree Meas_yr tree_survival
1 9 1 1 t1 1
1 9 1 2 t1 1
1 9 1 3 t1 0
1 9 1 1 t2 1
1 9 1 2 t2 1
I would really appreciate the help. Thanks in advance.
CodePudding user response:
You may try this -
library(dplyr)
df %>%
group_by(State, County, Plot, Tree) %>%
dplyr::filter(n() > 1 | Meas_yr == "t1") %>%
mutate(tree_survival = as.integer(n() > 1)) %>%
ungroup()
# State County Plot Tree Meas_yr tree_survival
# <int> <int> <int> <int> <chr> <int>
#1 1 9 1 1 t1 1
#2 1 9 1 2 t1 1
#3 1 9 1 3 t1 0
#4 1 9 1 1 t2 1
#5 1 9 1 2 t2 1
This keeps all the rows for "t1" year and also the trees that are present in both the year. tree_survival
is assigned 1 if the tree is present in both the years or 0 otherwise.
data
df <- structure(list(State = c(1L, 1L, 1L, 1L, 1L, 1L, 1L), County = c(9L,
9L, 9L, 9L, 9L, 9L, 9L), Plot = c(1L, 1L, 1L, 1L, 1L, 1L, 1L),
Tree = c(1L, 2L, 3L, 1L, 2L, 4L, 5L), Meas_yr = c("t1", "t1",
"t1", "t2", "t2", "t2", "t2")), class = "data.frame", row.names = c(NA, -7L))
CodePudding user response:
library(tidyverse)
data <- tribble(
~State, ~County, ~Plot, ~Tree, ~Meas_yr,
1, 9, 1, 1, 't1',
1, 9, 1, 2, 't1',
1, 9, 1, 3, 't1',
1, 9, 1, 1, 't2',
1, 9, 1, 2, 't2',
1, 9, 1, 4, 't2',
1, 9, 1, 5, 't2'
)
new_df <- data %>%
group_by(Tree) %>%
mutate(tree_survival = n() - 1) %>%
ungroup() %>%
filter(!(Meas_yr == 't2' & tree_survival == 0))