I have a large cross country panel dataset. I am trying to find the time range of each variable in the set across countries.
Let's say the dataset looks like this:
mydata <- structure(list(country = c("A", "A", "A", "A", "A", "B", "B",
"B", "B", "B", "C", "C", "C", "C", "C", "D", "D", "D", "D", "D"
), year = c(1980L, 1981L, 1982L, 1983L, 1984L, 1980L, 1981L,
1982L, 1983L, 1984L, 1980L, 1981L, 1982L, 1983L, 1984L, 1980L,
1981L, 1982L, 1983L, 1984L), ratio = c(NA, 0.295374242728576,
0.39086487214081, 0.305486429622397, NA, NA, NA, 0.219852746929973,
0.36993286316283, NA, NA, 0.275014761742204, 0.375281228218228,
0.351254417048767, NA, NA, NA, NA, 0.281099080992863, NA), ratio2 = c(0.493465579114854,
0.84949842300266, 0.880158872716129, 0.496730178780854, 0.621339708380401,
0.405148166045547, 0.785535878129303, 0.783708103187382, 0.414490845240653,
0.341050366312265, 0.559447590634227, 0.636841595172882, 0.443167371489108,
0.132577145472169, 0.450228306651115, 0.239580681361258, 0.870582599751651,
0.116267577186227, 0.596920453198254, 0.333549798093736)), row.names = c(NA,
-20L), class = "data.frame")
I want the final output to look as follows:
structure(list(variable = c("ratio", "ratio2"), firstyear = c("1981",
"1980"), lastyear = c("1983", "1984")), class = "data.frame", row.names = c(NA,
-2L))
CodePudding user response:
A possible solution:
library(tidyverse)
mydata %>%
pivot_longer(c(-country, -year)) %>%
group_by(variable = name) %>%
filter(!is.na(value)) %>%
summarise(firstyear = min(year), lastyear = max(year))
#> # A tibble: 2 × 3
#> variable firstyear lastyear
#> <chr> <int> <int>
#> 1 ratio 1981 1983
#> 2 ratio2 1980 1984