Add a binary column to indicate presence of an item in another variable

Time:09-16

I have a dataset which has tree measurements at time t1 and t2. These trees are identified by state, county, plot and tree number. There are some trees that have died in the time interval between t1 and t2.

State   County   Plot    Tree     Meas_yr
1       9        1       1        t1 
1       9        1       2        t1
1       9        1       3        t1
1       9        1       1        t2
1       9        1       2        t2

I am trying to create a binary label that is 1 for trees present at both t1 and t2, and 0 for trees present at t1 but not at t2. I am hoping to end up with something like this:

State   County   Plot    Tree     Meas_yr  tree_survival
1       9        1       1        t1       1
1       9        1       2        t1       1
1       9        1       3        t1       0
1       9        1       1        t2       1
1       9        1       2        t2       1

I would really appreciate the help. Thanks in advance.

CodePudding user response:

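We could group by the tree identifiers and test whether 't2' occurs within each group:

library(dplyr)
df1 %>%
    group_by(State, County, Plot, Tree) %>%
    # unary + converts the logical TRUE/FALSE to integer 1/0
    mutate(available = +('t2' %in% Meas_yr))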

CodePudding user response:

library(dplyr)
df %>%
  group_by(State, County, Plot, Tree) %>%
  # 0 only when every record for the tree is from t1; unary + gives 1/0
  mutate(available = +(!all(Meas_yr == 't1')))

# A tibble: 6 x 6
# Groups:   State, County, Plot, Tree [4]
  State County  Plot  Tree Meas_yr available
  <int>  <int> <int> <int> <chr>       <int>
1     1      9     1     1 t1              1
2     1      9     1     2 t1              1
3     1      9     1     3 t1              0
4     1      9     1     1 t2              1
5     1      9     1     2 t2              1
6     1      9     1     4 t2              1

CodePudding user response:

If we want to flag only the trees that were measured at both t1 and t2 (i.e. survived the interval), the code below works.

library(dplyr)
df1 %>%
  group_by(State, County, Plot, Tree) %>%
  # 1 only when the tree has records for both t1 and t2; unary + gives 1/0
  mutate(tree_survival = +all(c("t1", "t2") %in% Meas_yr))

But if you are only interested in whether a tree is alive at t2, then what akrun has works (i.e. ('t2' %in% Meas_yr)). The two checks give different results only for a tree that first appears at t2.
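
To see where the two checks diverge, here is a minimal sketch that rebuilds the question's sample data and adds one extra row, Tree 4 measured only at t2 (borrowed from the output in the second answer, not from the question itself). The column name alive_at_t2 is made up for illustration, and as.integer() is used as one way to get 1/0 instead of TRUE/FALSE.

```r
library(dplyr)

# Sample data from the question, plus an assumed Tree 4 recorded only at t2
df1 <- data.frame(
  State   = 1L,
  County  = 9L,
  Plot    = 1L,
  Tree    = c(1L, 2L, 3L, 1L, 2L, 4L),
  Meas_yr = c("t1", "t1", "t1", "t2", "t2", "t2")
)

res <- df1 %>%
  group_by(State, County, Plot, Tree) %>%
  mutate(
    # 1 if the tree has any record at t2
    alive_at_t2   = as.integer("t2" %in% Meas_yr),
    # 1 only if the tree has records at both t1 and t2
    tree_survival = as.integer(all(c("t1", "t2") %in% Meas_yr))
  ) %>%
  ungroup()

print(res)
```

Trees 1 and 2 get 1 in both columns and Tree 3 gets 0 in both. Tree 4 gets 1 for alive_at_t2 but 0 for tree_survival, so the choice only matters if the data can contain trees that were not measured at t1.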
