I have a dataset which has tree measurements at time t1 and t2. These trees are identified by state, county, plot and tree number. There are some trees that have died in the time interval between t1 and t2.
State County Plot Tree Meas_yr
1 9 1 1 t1
1 9 1 2 t1
1 9 1 3 t1
1 9 1 1 t2
1 9 1 2 t2
I am trying to create a binary label which gives 1 to trees if they are present in both t1 and t2 and 0 to trees if they are present in t1 but not present in t2. I am hoping to create something like this.
State County Plot Tree Meas_yr tree_survival
1 9 1 1 t1 1
1 9 1 2 t1 1
1 9 1 3 t1 0
1 9 1 1 t2 1
1 9 1 2 t2 1
I would really appreciate the help. Thanks in advance.
CodePudding user response:
We could use
library(dplyr)
df1 %>%
group_by(State, County, Plot, Tree) %>%
mutate(available = as.integer('t2' %in% Meas_yr)) # 1 if the tree was measured at t2, otherwise 0
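To check this against the sample in the question, here is a minimal sketch that rebuilds the data by hand. The name df1, the integer column types, and the use of tree_survival as the output column name are assumptions taken from the question's layout, not something the original poster supplied as code.

```r
library(dplyr)

# Sample data from the question; the column types are an assumption
df1 <- data.frame(
  State   = c(1L, 1L, 1L, 1L, 1L),
  County  = c(9L, 9L, 9L, 9L, 9L),
  Plot    = c(1L, 1L, 1L, 1L, 1L),
  Tree    = c(1L, 2L, 3L, 1L, 2L),
  Meas_yr = c("t1", "t1", "t1", "t2", "t2")
)

result <- df1 %>%
  group_by(State, County, Plot, Tree) %>%
  # TRUE/FALSE per tree, converted to the 1/0 label the question asks for
  mutate(tree_survival = as.integer("t2" %in% Meas_yr)) %>%
  ungroup()

print(result)
```

The ungroup() at the end is optional; it only stops later steps from silently operating per tree.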
CodePudding user response:
df %>%
group_by(State, County, Plot, Tree) %>%
mutate(available = as.integer(!all(Meas_yr == 't1')))
# A tibble: 6 x 6
# Groups: State, County, Plot, Tree [4]
State County Plot Tree Meas_yr available
<int> <int> <int> <int> <chr> <int>
1 1 9 1 1 t1 1
2 1 9 1 2 t1 1
3 1 9 1 3 t1 0
4 1 9 1 1 t2 1
5 1 9 1 2 t2 1
6 1 9 1 4 t2 1
CodePudding user response:
If we want to check for the trees that survived between t1 and t2, the code below works.
df1 %>%
group_by(State, County, Plot, Tree) %>%
mutate(tree_survival = as.integer(all(c("t1", "t2") %in% Meas_yr))) # 1 only if the tree appears at both t1 and t2
But if you are only interested in trees that are alive at t2, then what akrun has works (i.e. ('t2' %in% Meas_yr)).
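The two definitions only disagree for a tree that is first recorded at t2. As a rough sketch of that difference, the example below adds a hypothetical tree 4 measured only at t2 to the question's sample; the column names in_both and in_t2 are made up for illustration.

```r
library(dplyr)

# Question's sample plus a hypothetical tree 4 that first appears at t2
df1 <- data.frame(
  State   = 1L,
  County  = 9L,
  Plot    = 1L,
  Tree    = c(1L, 2L, 3L, 1L, 2L, 4L),
  Meas_yr = c("t1", "t1", "t1", "t2", "t2", "t2")
)

compared <- df1 %>%
  group_by(State, County, Plot, Tree) %>%
  mutate(
    in_both = as.integer(all(c("t1", "t2") %in% Meas_yr)), # measured at t1 and t2
    in_t2   = as.integer("t2" %in% Meas_yr)                # measured at t2, regardless of t1
  ) %>%
  ungroup()

print(compared)
```

Tree 3 gets 0 under both definitions, while tree 4 gets 0 for in_both and 1 for in_t2, so the choice matters only if new trees can enter the plot between measurements.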