Duplicate rows in R with a column that must vary when its rows are duplicated-CodePudding

I have a dataframe in R of this form currently

city  product   inv    pla 
city1 product1  0      2
city1 product2  1      2
city1 product3  2      2
city1 product4  0      1
city1 product5  1      1
city1 product6  1      1

I need to duplicate the rows when pla = 2, but put a value in the inv column according to the following:

When inv = 0, both rows must be equal to 0. When inv = 1, the first row must be equal to 1, and the second row must be equal to 0. When inv=2, both rows must equal 1.

How do I create a dataframe like this?:

city  product   inv    pla 
city1 product1  0      2
city1 product1  0      2
city1 product2  1      2
city1 product2  0      2
city1 product3  1      2
city1 product3  1      2
city1 product4  0      1
city1 product5  1      1
city1 product6  1      1

CodePudding user response：

Here I pull out the rows that need duplicating, modify them as necessary, and stick them back on the original data:

df %>%
  filter(pla == 2) %>%
  mutate(inv = case_when(inv == 1 ~ 0L, TRUE ~ inv)) %>%
  bind_rows(df) %>% 
  mutate(inv = case_when(inv == 2 & pla == 2 ~ 1L, TRUE ~ inv)) %>%
  arrange(city, product, inv)
#   city  product inv pla
# 1 city1 product1   0   2
# 2 city1 product1   0   2
# 3 city1 product2   0   2
# 4 city1 product2   1   2
# 5 city1 product3   1   2
# 6 city1 product3   1   2
# 7 city1 product4   0   1
# 8 city1 product5   1   1
# 9 city1 product6   1   1

CodePudding user response：

First create your "first and second row" values, putting them in inv and inv2, respectively; then pivot them into new rows using tidyr::pivot_longer().

library(dplyr)
library(tidyr)

mydata %>% 
  mutate(
    inv2 = case_when(
      pla == 1 ~ NA_real_,
      inv < 2 ~ 0,
      inv == 2 ~ 1
    ),
    inv = if_else(inv == 2, 1, inv)
  ) %>% 
  pivot_longer(
    c(inv, inv2), 
    names_to = NULL, 
    values_to = "inv", 
    values_drop_na = TRUE
  )

#> # A tibble: 9 x 4
#>   city  product    pla   inv
#>   <chr> <chr>    <dbl> <dbl>
#> 1 city1 product1     2     0
#> 2 city1 product1     2     0
#> 3 city1 product2     2     1
#> 4 city1 product2     2     0
#> 5 city1 product3     2     1
#> 6 city1 product3     2     1
#> 7 city1 product4     1     0
#> 8 city1 product5     1     1
#> 9 city1 product6     1     1

^{Created on 2022-03-16 by the reprex package (v2.0.1)}

CodePudding user response：

base R solution

data <- read.table(text = "city  product   inv    pla 
city1 product1  0      2
city1 product2  1      2
city1 product3  2      2
city1 product4  0      1
city1 product5  1      1
city1 product6  1      1", header = TRUE)

data_final <- data[rep(seq_len(nrow(data)), data$pla), ]
which_rep <- data_final$pla == 2
data_final$inv[which_rep] <- data_final$inv[which_rep] / 2
data_final$inv[which_rep & data_final$inv == 0.5] <- c(1, 0)
rownames(data_final) <- NULL
data_final

CodePudding user response：

Create a function f to make the vector of values you need for each group, and then use tidyr::separate_rows.

library(tidyverse)

f <- function(x, n){
  sapply(x, function(y) paste(c(rep(1, y), rep(0, length.out = n - y)), collapse = " "))
}

dat %>% 
  mutate(inv = ifelse(pla == 2, f(inv, 2), inv)) %>% 
  separate_rows(inv)

output

# A tibble: 9 x 4
  city  product  inv     pla
  <chr> <chr>    <chr> <int>
1 city1 product1 0         2
2 city1 product1 0         2
3 city1 product2 1         2
4 city1 product2 0         2
5 city1 product3 1         2
6 city1 product3 1         2
7 city1 product4 0         1
8 city1 product5 1         1
9 city1 product6 1         1