Home > Software engineering >  Repeat data by group per year
Repeat data by group per year

Time:01-03

I have a data frame with more than thousand rows like this.

org year country
a 2010 US
a 2012 UK
b 2014 Mexico
b 2014 CHile
b 2015 Brazil

And I would like to make my data like this. I want my data to be repeated after appearing.

org year country
a 2010 US
a 2011 US
a 2012 US
a 2013 US
...
...
a 2021 US
a 2012 UK
a 2013 UK
a 2014 UK
a 2015 UK
...
a 2021 UK
b 2014 Mexico
b 2015 Mexico
b 2016 Mexico
...
b 2021 Mexico
b 2014 CHile
b 2015 CHile
b 2016 CHile
...
b 2021 CHile
b 2015 Brazil
b 2016 Brazil
b 2017 Brazil
...
b 2021 Brazil

And I have tried the following code. It generated the whole year rather than years since the first appearance. Any suggestions would be thankful!

data <- data %>% 
  # expand all years by country
  group_by(org) %>% 
  expand(country, year = full_seq(year, 1)) %>% 
  ungroup() %>% 
  # join with original data to get X values
  left_join(data) %>% 
  # fill the missing country
  fill(country)

CodePudding user response:

How about something like this. Expand the year through map and then unnest:

library(tidyverse)

data <- read_table("org year country
a 2010 US
a 2012 UK
b 2014 Mexico
b 2014 CHile
b 2015 Brazil")

data |>
  mutate(year = map(year, ~seq(.x, 2021, 1))) |>
  unnest_longer(year)
#> # A tibble: 45 x 3
#>    org    year country
#>    <chr> <dbl> <chr>  
#>  1 a      2010 US     
#>  2 a      2011 US     
#>  3 a      2012 US     
#>  4 a      2013 US     
#>  5 a      2014 US     
#>  6 a      2015 US     
#>  7 a      2016 US     
#>  8 a      2017 US     
#>  9 a      2018 US     
#> 10 a      2019 US     
#> # ... with 35 more rows

CodePudding user response:

A bit counterintuitive, but you can “summarize” to more rows than the input, which is handy in this case:

library(dplyr)

max_yr <- 2021

data <- data %>% 
  group_by(org, country) %>%
  summarize(
    year = min(year):max_yr,
    .groups = "drop"
  )

print(data, n = 20)
# A tibble: 45 × 3
   org   country  year
   <chr> <chr>   <int>
 1 a     UK       2012
 2 a     UK       2013
 3 a     UK       2014
 4 a     UK       2015
 5 a     UK       2016
 6 a     UK       2017
 7 a     UK       2018
 8 a     UK       2019
 9 a     UK       2020
10 a     UK       2021
11 a     US       2010
12 a     US       2011
13 a     US       2012
14 a     US       2013
15 a     US       2014
16 a     US       2015
17 a     US       2016
18 a     US       2017
19 a     US       2018
20 a     US       2019
# … with 25 more rows
  • Related