Home > Mobile >  Replicate a row (specific columns) based on a vector
Replicate a row (specific columns) based on a vector

Time:12-08

Is there a straightforward way in dplyr to expand a dataframe by replicating a row based on a specific vector?

For example, I have following dataframe:

df <- tibble(Year=c(2019),
       cat1=c("A","B"),
       cat2=c("X","Y"),
       value1=c(1,2),
       value2=c(10,20))

selected_years <- c(2019:2021)

where I would like to replicate the row where cat1=="A" for the years 2019-2021. The values of some columns (value1, cat) should be taken from the original year 2019, some other columns (value2) filled with NAs.

The final output should look like:

Year cat   value1 value2
2019 A          1     10
2020 A          1     NA
2021 A          1     NA
2019 B          2     20

I tried with bind_rows()...however, the result is not fully what I wanted (I only get the "A"-Part not the "B"-Part), and I am not sure if this is really the most intuitive/dplyr way to go, or if another approach (or even specific function) would be more reasonable:

df%>%
  filter(cat1=="A",Year==2019)%>%
  bind_rows(
    data.frame(
    Year=setdiff(selected_years,.$Year),
    cat1=.$cat1,
    value1=.$value1
    )
  )
)

Edit: I also tried using expand and right_join, but I then my desired column values are not repeated:

df %>% 
  dplyr::right_join(df %>%
                      filter(cat1=="A",Year==2019)%>%
                      expand(Year=c(2019:2021)))

Maybe an approach involving case_when?

CodePudding user response:

The part where you only want to keep specfic values and others not, makes this tricky. It is easy to expand the vector on all values using rowwise and unnest together with the condition in if_else. In the last step we just reset the values to NA which we don't want to replicate. If you have more than one value that you want to set NA, we can use across.

library(tidyverse)

df <- tibble(Year=c(2019),
             cat1=c("A","B"),
             cat2=c("X","Y"),
             value1=c(1,2),
             value2=c(10,20))

selected_years <- c(2019:2021)


df %>% 
  rowwise %>% 
  mutate(Year = if_else(cat1 == "A", list(selected_years), list(Year))) %>% 
  unnest(Year) %>% 
  mutate(value2 = if_else(Year != 2019, NA_real_, value2))

#> # A tibble: 4 x 5
#>    Year cat1  cat2  value1 value2
#>   <dbl> <chr> <chr>  <dbl>  <dbl>
#> 1  2019 A     X          1     10
#> 2  2020 A     X          1     NA
#> 3  2021 A     X          1     NA
#> 4  2019 B     Y          2     20

Created on 2021-12-08 by the reprex package (v2.0.1)


Or we could create a df2 and full_join it with df:

library(dplyr)

df2 <- tibble(Year = selected_years,
             cat1 = "A",
             cat2 = "X",
             value1 = 1)

df %>% 
  full_join(df2, by = c("Year", "cat1", "cat2", "value1"))

#> # A tibble: 4 x 5
#>    Year cat1  cat2  value1 value2
#>   <dbl> <chr> <chr>  <dbl>  <dbl>
#> 1  2019 A     X          1     10
#> 2  2019 B     Y          2     20
#> 3  2020 A     X          1     NA
#> 4  2021 A     X          1     NA

Created on 2021-12-08 by the reprex package (v2.0.1)

CodePudding user response:

library(tidyverse)
tibble(selected_years) %>% 
  mutate(cat1 = "A") %>% 
  full_join(df, by = "cat1") %>% 
  mutate(selected_years = ifelse(is.na(selected_years), Year, selected_years)) %>% 
  group_by(cat1) %>% 
  mutate(value2 = ifelse(row_number() != 1, NA, value2)) %>%
  ungroup() %>% 
  select(Year = selected_years, cat = cat1, value1, value2)
   Year cat   value1 value2
  <dbl> <chr>  <dbl>  <dbl>
1  2019 A          1     10
2  2020 A          1     NA
3  2021 A          1     NA
4  2019 B          2     20

CodePudding user response:

A solution based on dplyr::bind_rows:

library(tidyverse)

df <- tibble(Year=c(2019),
             cat1=c("A","B"),
             cat2=c("X","Y"),
             value1=c(1,2),
             value2=c(10,20))

selected_years <- c(2020:2021)

df %>% 
  bind_rows(data.frame(
    Year=selected_years, filter(., cat1 == "A") %>% select(-Year, -value2))) %>% 
    arrange(cat1)

#> # A tibble: 4 × 5
#>    Year cat1  cat2  value1 value2
#>   <dbl> <chr> <chr>  <dbl>  <dbl>
#> 1  2019 A     X          1     10
#> 2  2020 A     X          1     NA
#> 3  2021 A     X          1     NA
#> 4  2019 B     Y          2     20
  • Related