I would like to count, for each year, how many individuals have their first observation in the dataframe in that year. Individuals are identified by the variable "id". I would like an answer using dplyr.
Data
year id
1 1984 1
2 1985 1
3 1986 1
4 1987 1
5 1988 1
6 1985 2
7 1986 2
8 1987 2
9 1988 2
10 1985 3
11 1986 3
12 1986 4
13 1987 4
14 1988 4
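To make the examples reproducible, here is a minimal sketch that rebuilds the printed data as a data frame. It assumes the name df, which is the name the answer code uses, and integer columns.

```r
# Rebuild the example data: id 1 is observed 1984-1988, id 2 1985-1988,
# id 3 1985-1986 and id 4 1986-1988 (14 rows in total)
df <- data.frame(
  year = c(1984:1988, 1985:1988, 1985:1986, 1986:1988),
  id   = rep(1:4, times = c(5, 4, 2, 3))
)
```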
Desired output
year2 entrance
1 1984 1
2 1985 2
3 1986 1
4 1987 0
5 1988 0
Nothing I have tried so far has worked...
Answer:
Using weights in count():
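library(dplyr)
df %>%
  # wt = 1 on the first row of each id (assumes rows are sorted by id, then year)
  mutate(wt = c(1, diff(id) != 0)) %>%
  count(id, year, wt = wt) %>%   # n = number of first observations per id/year
  group_by(year) %>%
  summarise(n = sum(n))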
# A tibble: 5 × 2
year n
<int> <dbl>
1 1984 1
2 1985 2
3 1986 1
4 1987 0
5 1988 0
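Because diff(id) only compares consecutive rows, the weights approach relies on the data already being sorted by id and year. A minimal sketch that drops that assumption sorts explicitly and swaps in base R's duplicated() to flag each id's first row; the column name first is only for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  year = c(1984:1988, 1985:1988, 1985:1986, 1986:1988),
  id   = rep(1:4, times = c(5, 4, 2, 3))
)

entrants <- df %>%
  arrange(id, year) %>%                 # make "first row per id" well defined
  mutate(first = !duplicated(id)) %>%   # TRUE only on the earliest row of each id
  group_by(year) %>%
  summarise(entrance = sum(first))      # years with rows but no new ids get 0

entrants
```

As with the weights version, only years that appear somewhere in df show up in the result, so a year with no rows at all would need tidyr::complete().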
Or keep each id's earliest row with slice_min() and fill in the missing years with tidyr::complete():
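df %>%
  group_by(id) %>%
  slice_min(year) %>%   # keep each id's earliest row
  ungroup() %>%
  count(year, name = "entrance") %>%
  # count() only returns years with at least one entrant; add the others with 0
  tidyr::complete(year = min(df$year):max(df$year), fill = list(entrance = 0))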
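The second approach prints no output above, so here is a self-contained run of it on the example data (rebuilt under the assumed name df) to check that it matches the desired output. Before complete(), count() only yields the years in which some id entered (1984, 1985 and 1986 here); complete() then adds 1987 and 1988 with entrance = 0, which is also what makes this version robust to years that have no rows at all.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  year = c(1984:1988, 1985:1988, 1985:1986, 1986:1988),
  id   = rep(1:4, times = c(5, 4, 2, 3))
)

result <- df %>%
  group_by(id) %>%
  slice_min(year, n = 1) %>%            # earliest row of each id
  ungroup() %>%
  count(year, name = "entrance") %>%    # entrants per year (only non-zero years)
  complete(year = min(df$year):max(df$year),
           fill = list(entrance = 0))   # add the remaining years with 0

result
```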