Count first entrance to a dataframe in R

Time:01-14

For each year, I would like to count how many individuals appear in a data frame for the first time. Individuals are identified by the variable "id". I would like an answer using dplyr.

Data

   year id
1  1984  1
2  1985  1
3  1986  1
4  1987  1
5  1988  1
6  1985  2
7  1986  2
8  1987  2
9  1988  2
10 1985  3
11 1986  3
12 1986  4
13 1987  4
14 1988  4

Desired output

  year2 entrance
1  1984        0
2  1985        2
3  1986        1
4  1987        0
5  1988        0

Nothing I have tried has worked so far.

Answer:

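Using weights in count():

library(dplyr)
df %>% 
  # wt is 1 on the first row of each id, 0 otherwise
  # (assumes rows are sorted by id, then year, as in the question)
  mutate(wt = c(1, diff(id) != 0)) %>% 
  # sum the weights per id and year, then per year
  count(id, year, wt = wt) %>% 
  group_by(year) %>% 
  summarise(n = sum(n))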

# A tibble: 5 × 2
   year     n
  <int> <dbl>
1  1984     1
2  1985     2
3  1986     1
4  1987     0
5  1988     0
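
If the rows are not guaranteed to be sorted by id and then year, which the diff(id) weight above depends on, the weight could instead be computed within each id. A minimal sketch, assuming each id has at most one row per year; the data frame below is rebuilt from the question's data and reversed purely for illustration, and the column name "entrance" is taken from the desired output:

```r
library(dplyr)

# Question's data, rebuilt and put in reverse row order for illustration
df <- data.frame(
  year = c(1984:1988, 1985:1988, 1985:1986, 1986:1988),
  id   = rep(1:4, times = c(5, 4, 2, 3))
)
df <- df[rev(seq_len(nrow(df))), ]

res <- df %>%
  group_by(id) %>%
  mutate(wt = as.integer(year == min(year))) %>%  # 1 on each id's first year, else 0
  ungroup() %>%
  count(year, wt = wt, name = "entrance")         # sum the weights per year

res$entrance
# 1 2 1 0 0  (years 1984 to 1988)
```

Because every observed year still has rows with weight 0, years without entrances show up as 0 without any extra completion step.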

Or with tidyr::complete(), which fills in the years that have no entrances:

df %>% 
  group_by(id) %>% 
  slice_min(year) %>% 
  ungroup() %>% 
  count(year, name = "entrance") %>% 
  tidyr::complete(year = min(df$year):max(df$year), fill = list(entrance = 0))
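
Note that slice_min() keeps tied rows by default (with_ties = TRUE), so an id with duplicated rows for its first year would be counted more than once. A minimal self-contained sketch of a variant that reduces each id to a single first year with summarise() before counting; the data frame is rebuilt from the question's data, and this is one possible formulation rather than the only one:

```r
library(dplyr)
library(tidyr)

# Question's data, rebuilt here so the sketch runs on its own
df <- data.frame(
  year = c(1984:1988, 1985:1988, 1985:1986, 1986:1988),
  id   = rep(1:4, times = c(5, 4, 2, 3))
)

entrances <- df %>%
  group_by(id) %>%
  summarise(year = min(year), .groups = "drop") %>%  # one row per id: its first year
  count(year, name = "entrance") %>%                 # ids entering in each year
  complete(year = min(df$year):max(df$year),         # add years with no entrances
           fill = list(entrance = 0L))

entrances$entrance
# 1 2 1 0 0  (years 1984 to 1988)
```

Since id 1 is first observed in 1984, that year gets a count of 1, matching the tibble output shown for the first approach.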