Home > Software engineering >  finding time range of variables in a panel dataset in R
finding time range of variables in a panel dataset in R

Time:06-02

I have a large cross country panel dataset. I am trying to find the time range of each variable in the set across countries.

Let's say the dataset looks like this:

mydata <- structure(list(country = c("A", "A", "A", "A", "A", "B", "B", 
"B", "B", "B", "C", "C", "C", "C", "C", "D", "D", "D", "D", "D"
), year = c(1980L, 1981L, 1982L, 1983L, 1984L, 1980L, 1981L, 
1982L, 1983L, 1984L, 1980L, 1981L, 1982L, 1983L, 1984L, 1980L, 
1981L, 1982L, 1983L, 1984L), ratio = c(NA, 0.295374242728576, 
0.39086487214081, 0.305486429622397, NA, NA, NA, 0.219852746929973, 
0.36993286316283, NA, NA, 0.275014761742204, 0.375281228218228, 
0.351254417048767, NA, NA, NA, NA, 0.281099080992863, NA), ratio2 = c(0.493465579114854, 
0.84949842300266, 0.880158872716129, 0.496730178780854, 0.621339708380401, 
0.405148166045547, 0.785535878129303, 0.783708103187382, 0.414490845240653, 
0.341050366312265, 0.559447590634227, 0.636841595172882, 0.443167371489108, 
0.132577145472169, 0.450228306651115, 0.239580681361258, 0.870582599751651, 
0.116267577186227, 0.596920453198254, 0.333549798093736)), row.names = c(NA, 
-20L), class = "data.frame")

I want the final output to look as follows:

structure(list(variable = c("ratio", "ratio2"), firstyear = c("1981", 
"1980"), lastyear = c("1983", "1984")), class = "data.frame", row.names = c(NA, 
-2L))

CodePudding user response:

A possible solution:

library(tidyverse)

mydata %>% 
  pivot_longer(c(-country, -year)) %>% 
  group_by(variable = name) %>% 
  filter(!is.na(value)) %>% 
  summarise(firstyear = min(year), lastyear = max(year))

#> # A tibble: 2 × 3
#>   variable firstyear lastyear
#>   <chr>        <int>    <int>
#> 1 ratio         1981     1983
#> 2 ratio2        1980     1984
  • Related