I have a data frame with more than thousand rows like this.
org year country
a 2010 US
a 2012 UK
b 2014 Mexico
b 2014 CHile
b 2015 Brazil
And I would like to make my data like this. I want my data to be repeated after appearing.
org year country
a 2010 US
a 2011 US
a 2012 US
a 2013 US
...
...
a 2021 US
a 2012 UK
a 2013 UK
a 2014 UK
a 2015 UK
...
a 2021 UK
b 2014 Mexico
b 2015 Mexico
b 2016 Mexico
...
b 2021 Mexico
b 2014 CHile
b 2015 CHile
b 2016 CHile
...
b 2021 CHile
b 2015 Brazil
b 2016 Brazil
b 2017 Brazil
...
b 2021 Brazil
And I have tried the following code. It generated the whole year rather than years since the first appearance. Any suggestions would be thankful!
data <- data %>%
# expand all years by country
group_by(org) %>%
expand(country, year = full_seq(year, 1)) %>%
ungroup() %>%
# join with original data to get X values
left_join(data) %>%
# fill the missing country
fill(country)
CodePudding user response:
How about something like this. Expand the year through map
and then unnest
:
library(tidyverse)
data <- read_table("org year country
a 2010 US
a 2012 UK
b 2014 Mexico
b 2014 CHile
b 2015 Brazil")
data |>
mutate(year = map(year, ~seq(.x, 2021, 1))) |>
unnest_longer(year)
#> # A tibble: 45 x 3
#> org year country
#> <chr> <dbl> <chr>
#> 1 a 2010 US
#> 2 a 2011 US
#> 3 a 2012 US
#> 4 a 2013 US
#> 5 a 2014 US
#> 6 a 2015 US
#> 7 a 2016 US
#> 8 a 2017 US
#> 9 a 2018 US
#> 10 a 2019 US
#> # ... with 35 more rows
CodePudding user response:
A bit counterintuitive, but you can “summarize” to more rows than the input, which is handy in this case:
library(dplyr)
max_yr <- 2021
data <- data %>%
group_by(org, country) %>%
summarize(
year = min(year):max_yr,
.groups = "drop"
)
print(data, n = 20)
# A tibble: 45 × 3
org country year
<chr> <chr> <int>
1 a UK 2012
2 a UK 2013
3 a UK 2014
4 a UK 2015
5 a UK 2016
6 a UK 2017
7 a UK 2018
8 a UK 2019
9 a UK 2020
10 a UK 2021
11 a US 2010
12 a US 2011
13 a US 2012
14 a US 2013
15 a US 2014
16 a US 2015
17 a US 2016
18 a US 2017
19 a US 2018
20 a US 2019
# … with 25 more rows