Home > Net >  Duplicate Year columns to QuarterYear in R
Duplicate Year columns to QuarterYear in R

Time:12-06

I know how to duplicate columns in R but wonder if there is a more elegant way for a specific task that I'm doing.

I have multiple dataset. Some have value by Year, some have values by QuarterYear. Given that I need to do calculation to all the dataset by QuarterYear, I need to change dataset that have values by Year only to repeat the value by QuarterYear.

How my df look:

  Item  Year2020  Year2021
    A      1         4
    B      2         5
    C      3         6

What I'm trying to achieve

  Item   1Q20 2Q20 3Q20 4Q20 1Q21 2Q21 3Q21 4Q21 
    A      1    1    1    1   4    4    4    4
    B      2    2    2    2   5    5    5    5
    C      3    3    3    3   6    6    6    6

How I achieve it:

#recreate df
df  <- data.frame(Item = c("A", "B", "C"),
       Year2020 = c(1, 2, 3),
       Year2021 = c(4, 5, 6))

#replicate Year column 3 times and change names

df <- df %>%
 cbind(replicate(3, df$`Year2020`)) %>%
 rename(`1Q20` = `Year2020`,
        `2Q20` = `1`,
        `3Q20` = `2`,
        `4Q20` = `3`) %>%
 cbind(replicate(3, df$`Year2021`)) %>%
 rename(`1Q21` = `Year2021`,
        `2Q21` = `1`,
        `3Q21` = `2`,
        `4Q21` = `3`)

#rearrange col
df <- df[,c(1:2,4:6,3,7:9)]

Thankfully I only have two year columns but I have to do it to a lot of dataset and wonder if there is more neat way to this.

CodePudding user response:

Do not know what is elegant for you, I like a pivot_* workflow:

library(tidyverse)
df %>% 
   pivot_longer(-Item) %>% 
   slice(rep(1:n(), each = 4L)) %>% 
   group_by(Item, name) %>% 
   mutate(name = paste0(1:n(), "Q", str_extract(name, "[0-9]{2}$"))) %>% 
   pivot_wider()

# # A tibble: 3 × 9
# # Groups:   Item [3]
#   Item  `1Q20` `2Q20` `3Q20` `4Q20` `1Q21` `2Q21` `3Q21` `4Q21`
#   <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 A          1      1      1      1      4      4      4      4
# 2 B          2      2      2      2      5      5      5      5
# 3 C          3      3      3      3      6      6      6      6

CodePudding user response:

Here's a programmatic base R option:

idx <- which(grepl("Year", colnames(df)))
notidx <- setdiff(seq(ncol(df)), idx)
years <- gsub("Year20", "", colnames(df)[idx])
Q <- paste0(1:4, "Q", rep(years, each = 4))

df <- df[sort(c(notidx, rep(idx, each = 4)))]
colnames(df)[-grepl("Year", colnames(df))] <- Q

#   Item 1Q20 2Q20 3Q20 4Q20 1Q21 2Q21 3Q21 4Q21
# 1    A    1    1    1    1    4    4    4    4
# 2    B    2    2    2    2    5    5    5    5
# 3    C    3    3    3    3    6    6    6    6

CodePudding user response:

df %>%
  mutate(`1Q20` = Year2020,
         `2Q20` = Year2020,
         `3Q20` = Year2020,
         `4Q20` = Year2020,
         `1Q21` = Year2021,
         `2Q21` = Year2021,
         `3Q21` = Year2021,
         `4Q21` = Year2021)

CodePudding user response:

cbind(df$Item,rep(df %>% dplyr::select(contains("Year")),4) %>% 
  as.data.frame() %>% 
  rename(Q1_20=ends_with("20"),
         Q1_21=ends_with("21"),
         Q2_20=ends_with("20.1"),
         Q2_21=ends_with("21.1")))

  • Related