Let's say I have an example dataframe in the following format:
df <- data.frame(id = c(1,2,3,1,2,3,1,2,3),
                 year = c(3,3,3,2,2,2,1,1,1),
                 fte_wage = c(23,23,34,134,134,NA,45,NA,NA))
df <- df[!is.na(df$fte_wage), ]
I want to create a binary variable (say, a column named "obs") indicating whether or not the individual was observed in the previous year. I have tried the following:
library(dplyr)
df2 <-
  df %>%
  arrange(id, year) %>%
  group_by(id) %>%
  rowwise() %>%
  mutate(obs = ifelse((lag(year) %in% df[df$id == id,]$year & year > lag(year)), 1, 0))
This generates a column of only 0 values. If I remove the second condition, the code runs, but lag(year) then picks up values from other individuals as well.
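A likely cause of the all-zero column is rowwise(): it turns every row into its own one-row group, so lag(year) only ever sees a single value and returns NA, and the condition can never evaluate to TRUE. The sketch below uses made-up toy data (the names toy, grouped and by_row are illustrative only) to compare lag() under group_by() and under rowwise():

```r
library(dplyr)

# Toy data: one individual observed in three consecutive years
toy <- data.frame(id = c(1, 1, 1), year = c(1, 2, 3))

# group_by(): lag() sees the whole group, so only the first row has no predecessor
grouped <- toy %>%
  group_by(id) %>%
  mutate(prev = lag(year))

# rowwise(): every row is its own one-row group, so lag() always returns NA
by_row <- toy %>%
  rowwise() %>%
  mutate(prev = lag(year))

print(grouped)
print(by_row)
```

In grouped, prev is NA, 1, 2; in by_row it is NA on every row.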
My desired output would be a data frame in the following format:
| id | year | fte_wage | obs |
|---|---|---|---|
| 1 | 1 | 23 | 0 |
| 1 | 2 | 23 | 1 |
| 1 | 3 | 43 | 1 |
| 2 | 1 | 54 | 0 |
| 2 | 2 | 32 | 1 |
| 3 | 1 | 56 | 0 |
Answer:
You can just group_by(id) and then check whether row_number() > 1: within each id, every row after the first has an earlier observation, while the first (or only) row does not.
library(tidyverse)
df <- data.frame("id" = c(1,2,3,1,2,3,1,2,3),
                 "year" = c(3,3,3,2,2,2,1,1,1),
                 "fte_wage" = c(23,23,34,134,134,NA,45,NA,NA))
df %>%
  drop_na(fte_wage) %>%
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(obs = as.numeric(row_number() > 1))
#> # A tibble: 6 × 4
#> # Groups: id [3]
#> id year fte_wage obs
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 45 0
#> 2 1 2 134 1
#> 3 1 3 23 1
#> 4 2 2 134 0
#> 5 2 3 23 1
#> 6 3 3 34 0
Created on 2022-11-21 with reprex v2.0.2
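One caveat: row_number() > 1 flags any earlier observation of the same id, not specifically one from the previous year. The sample data has no gaps (id 1 is seen in years 1 to 3, id 2 in years 2 and 3), so the two coincide here. The sketch below uses a hypothetical panel in which id 1 skips year 2; the prev_year column, which compares lag(year) with year - 1, is an assumed way to require consecutive years and is not part of the answer above:

```r
library(dplyr)

# Hypothetical panel: id 1 skips year 2, id 2 is observed in consecutive years
panel <- data.frame(id = c(1, 1, 2, 2), year = c(1, 3, 1, 2))

flags <- panel %>%
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(
    # 1 for every row after the first one of each id
    any_earlier = as.numeric(row_number() > 1),
    # 1 only if the same id was observed exactly one year earlier;
    # coalesce() turns the NA from the first row's lag() into FALSE
    prev_year = as.numeric(coalesce(lag(year) == year - 1, FALSE))
  ) %>%
  ungroup()

print(flags)
```

For id 1 in year 3, any_earlier is 1 but prev_year is 0, because year 2 is missing.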
Answer:
This is one approach using dplyr without grouping.
library(dplyr)
df %>%
  na.omit() %>%
  arrange(id, year) %>%
  mutate(obs = (lag(id, default = FALSE) == id) * 1)
id year fte_wage obs
1 1 1 45 0
2 1 2 134 1
3 1 3 23 1
4 2 2 134 0
5 2 3 23 1
6 3 3 34 0
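This works without group_by() because arrange(id, year) places each id's rows next to each other, so comparing id with the row directly above is enough. The sketch below reuses the question's six non-missing rows (fte_wage dropped for brevity) with a hypothetical helper, flag_prev_row_same_id(), to show that the same comparison on the unsorted data flags nothing:

```r
library(dplyr)

# The question's non-missing rows, in their original (unsorted) order
obs_data <- data.frame(id = c(1, 2, 3, 1, 2, 1), year = c(3, 3, 3, 2, 2, 1))

# Hypothetical helper: 1 if the row directly above has the same id, else 0.
# default = 0 assumes 0 is never a real id, so the first row always gets 0.
flag_prev_row_same_id <- function(d) {
  mutate(d, obs = (lag(id, default = 0) == id) * 1)
}

unsorted <- flag_prev_row_same_id(obs_data)
sorted <- flag_prev_row_same_id(arrange(obs_data, id, year))

print(unsorted)
print(sorted)
```

On the unsorted rows every obs is 0, since consecutive rows never share an id; after sorting, obs matches the output above.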
Answer:
You could use diff() in the following way:
library(dplyr)
df %>%
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(obs = as.numeric(c(0, diff(year)) == 1))
Output:
# A tibble: 6 x 4
# Groups: id [3]
id year fte_wage obs
<dbl> <dbl> <dbl> <dbl>
1 1 1 45 0
2 1 2 134 1
3 1 3 23 1
4 2 2 134 0
5 2 3 23 1
6 3 3 34 0
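To see what c(0, diff(year)) == 1 does inside one group: diff() returns the gap between each year and the one before it, the leading 0 stands in for the first row, and only a gap of exactly one year counts. A minimal sketch on a hypothetical id observed in years 1, 3 and 4:

```r
# Hypothetical years for a single id, with year 2 missing
year <- c(1, 3, 4)

# diff() gives year[i] - year[i - 1]; prepend 0 so the first row has a value
steps <- c(0, diff(year))

# 1 only where the previous observation was exactly one year earlier
obs <- as.numeric(steps == 1)

print(steps)
print(obs)
```

Here steps is 0, 2, 1 and obs is 0, 0, 1, so the row after the gap is not flagged.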