I'm a beginner to R, but I have the below dataframe with more observations in which I have at max each 'id' observation for three years 91, 99, 07.
I want to create a variable avg_ln_rd by 'id' that takes the average of 'ln_rd' and 'ln_rd' from year 91 if the first ln_rd observation is from 99 - and from year 99 if the first ln_rd observation is from 07.
id year ln_rd
<dbl> <dbl> <dbl>
1 1013 1991 3.51
2 1013 1999 5.64
3 1013 2007 4.26
4 1021 1991 0.899
5 1021 1999 0.791
6 1021 2007 0.704
7 1034 1991 2.58
8 1034 1999 3.72
9 1034 2007 4.95
10 1037 1991 0.262
I also already dropped any observations of 'id' that only exist for one of the three years.
My first thought was to create for each year a standalone variable for ln_rd but then i still would need to filter by id which i do not know how to do.
Then I tried using these standalone variables to form an if clause.
df$lagln_rd_99 <- ifelse(df$year == 1999, df$ln_rd_91, NA)
But again I do not know how to keep 'id' fixed.
Any help would be greatly appreciated.
EDIT:
I grouped by id using dplyr. Can I then just sort my df by id and create a new variable that is ln_rd but shifted by one row?
CodePudding user response:
Still a bit unclear what to do if all years are present in a group but this might help
library(dplyr)
df %>%
group_by(id) %>%
arrange(id, year) %>%
mutate(avg91 = mean(c(ln_rd[year == 1991], ln_rd[year == 1999])),
avg99 = mean(c(ln_rd[year == 1999], ln_rd[year == 2007])),
avg91 = ifelse(any(year == 1991), avg91, NA),
avg99 = ifelse(any(year == 2007), avg99, NA)) %>%
ungroup()
# A tibble: 15 × 5
year id ln_rd avg91 avg99
<int> <int> <dbl> <dbl> <dbl>
1 1991 3505 3.38 3.09 NA
2 1999 3505 2.80 3.09 NA
3 1991 4584 1.45 1.34 NA
4 1999 4584 1.22 1.34 NA
5 1991 5709 1.90 2.13 2.74
6 1999 5709 2.36 2.13 2.74
7 2007 5709 3.11 2.13 2.74
8 2007 9777 2.36 NA 2.36
9 1991 18729 4.82 5.07 5.42
10 1999 18729 5.32 5.07 5.42
11 2007 18729 5.53 5.07 5.42
12 1991 20054 0.588 0.307 NA
13 1999 20054 0.0266 0.307 NA
14 1999 62169 1.91 NA 1.68
15 2007 62169 1.45 NA 1.68