Is there a straightforward way in dplyr to expand a dataframe by replicating a row based on a specific vector?
For example, I have following dataframe:
df <- tibble(Year=c(2019),
cat1=c("A","B"),
cat2=c("X","Y"),
value1=c(1,2),
value2=c(10,20))
selected_years <- c(2019:2021)
where I would like to replicate the row where cat1=="A"
for the years 2019-2021. The values of some columns (value1, cat) should be taken from the original year 2019, some other columns (value2) filled with NAs.
The final output should look like:
Year cat value1 value2
2019 A 1 10
2020 A 1 NA
2021 A 1 NA
2019 B 2 20
I tried with bind_rows()
...however, the result is not fully what I wanted (I only get the "A"-Part not the "B"-Part), and I am not sure if this is really the most intuitive/dplyr way to go, or if another approach (or even specific function) would be more reasonable:
df%>%
filter(cat1=="A",Year==2019)%>%
bind_rows(
data.frame(
Year=setdiff(selected_years,.$Year),
cat1=.$cat1,
value1=.$value1
)
)
)
Edit:
I also tried using expand
and right_join
, but I then my desired column values are not repeated:
df %>%
dplyr::right_join(df %>%
filter(cat1=="A",Year==2019)%>%
expand(Year=c(2019:2021)))
Maybe an approach involving case_when
?
CodePudding user response:
The part where you only want to keep specfic values and others not, makes this tricky. It is easy to expand the vector on all values using rowwise
and unnest
together with the condition in if_else
. In the last step we just reset the values to NA
which we don't want to replicate. If you have more than one value that you want to set NA
, we can use across
.
library(tidyverse)
df <- tibble(Year=c(2019),
cat1=c("A","B"),
cat2=c("X","Y"),
value1=c(1,2),
value2=c(10,20))
selected_years <- c(2019:2021)
df %>%
rowwise %>%
mutate(Year = if_else(cat1 == "A", list(selected_years), list(Year))) %>%
unnest(Year) %>%
mutate(value2 = if_else(Year != 2019, NA_real_, value2))
#> # A tibble: 4 x 5
#> Year cat1 cat2 value1 value2
#> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 2019 A X 1 10
#> 2 2020 A X 1 NA
#> 3 2021 A X 1 NA
#> 4 2019 B Y 2 20
Created on 2021-12-08 by the reprex package (v2.0.1)
Or we could create a df2
and full_join
it with df
:
library(dplyr)
df2 <- tibble(Year = selected_years,
cat1 = "A",
cat2 = "X",
value1 = 1)
df %>%
full_join(df2, by = c("Year", "cat1", "cat2", "value1"))
#> # A tibble: 4 x 5
#> Year cat1 cat2 value1 value2
#> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 2019 A X 1 10
#> 2 2019 B Y 2 20
#> 3 2020 A X 1 NA
#> 4 2021 A X 1 NA
Created on 2021-12-08 by the reprex package (v2.0.1)
CodePudding user response:
library(tidyverse)
tibble(selected_years) %>%
mutate(cat1 = "A") %>%
full_join(df, by = "cat1") %>%
mutate(selected_years = ifelse(is.na(selected_years), Year, selected_years)) %>%
group_by(cat1) %>%
mutate(value2 = ifelse(row_number() != 1, NA, value2)) %>%
ungroup() %>%
select(Year = selected_years, cat = cat1, value1, value2)
Year cat value1 value2
<dbl> <chr> <dbl> <dbl>
1 2019 A 1 10
2 2020 A 1 NA
3 2021 A 1 NA
4 2019 B 2 20
CodePudding user response:
A solution based on dplyr::bind_rows
:
library(tidyverse)
df <- tibble(Year=c(2019),
cat1=c("A","B"),
cat2=c("X","Y"),
value1=c(1,2),
value2=c(10,20))
selected_years <- c(2020:2021)
df %>%
bind_rows(data.frame(
Year=selected_years, filter(., cat1 == "A") %>% select(-Year, -value2))) %>%
arrange(cat1)
#> # A tibble: 4 × 5
#> Year cat1 cat2 value1 value2
#> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 2019 A X 1 10
#> 2 2020 A X 1 NA
#> 3 2021 A X 1 NA
#> 4 2019 B Y 2 20