Home > Back-end >  pivot_longer() and merge two datasets
pivot_longer() and merge two datasets

Time:11-08

I have two datasets. The first is

issue_1_t1 <- c(10, 20, 30, 40)
issue_2_t1 <- c(10, 20, 30, 10)
issue_1_t2 <- c(10, 20, 30, 40)
issue_2_t2 <- c(10, 20, 30, 10)
issue_1_t3 <- c(10, 20, 30, 40)
issue_2_t3 <- c(10, 20, 30, 10)
area <- c(area1, area2, area3, area4)
area2 <- c(area10, area20, area30, area40)
df <- data.frame(issue_1_t1, issue_2_t1, issue_1_t2, issue_2_t2, issue_1_t3, issue_2_t3)

I want to reconfigure these, such that they form the following:

area    area2   issue1   issue2
area1   area10  10       10
area2   area20  20       20
area3   area30  30       30
area4   area40  40       40
area1   area10  10       10
area2   area20  20       20
area3   area30  30       30
area4   area40  40       40
area1   area10  10       10
area2   area20  20       20
area3   area30  30       30
area4   area40  40       40

So far, I have only been able to divide the dataset into two datasets by time period and then stack them on top of each other. I wanted to know if there is a more efficient way that only requires one line of code.

CodePudding user response:

You may try

library(dplyr)
library(reshape2)
library(tidyr)

df %>%
  melt %>%
  mutate(time = str_sub(variable, -1),
         issue = paste0("issue", str_split(variable, "_", simplify = T)[,2])) %>%
  select(time, issue, value) %>%
  group_by(issue) %>%
  arrange(issue) %>%
  mutate(n = 1, n = cumsum(n)) %>%
  pivot_wider(values_from = value, names_from = issue) %>%
  mutate(area = rep(area, max(n)/length(area)), area2 = rep(area2, max(n)/length(area2))) %>%
  select(-time, -n)

   issue1 issue2 area  area2 
    <dbl>  <dbl> <chr> <chr> 
 1     10     10 area1 area10
 2     20     20 area2 area20
 3     30     30 area3 area30
 4     40     10 area4 area40
 5     10     10 area1 area10
 6     20     20 area2 area20
 7     30     30 area3 area30
 8     40     10 area4 area40
 9     10     10 area1 area10
10     20     20 area2 area20
11     30     30 area3 area30
12     40     10 area4 area40

CodePudding user response:

I think the data shared is incomplete with syntax errors. I don't understand what you mean by two datasets (when there is only one df) but I think what you have is something like this -

issue_1_t1 <- c(10, 20, 30, 40)
issue_2_t1 <- c(10, 20, 30, 10)
issue_1_t2 <- c(10, 20, 30, 40)
issue_2_t2 <- c(10, 20, 30, 10)
issue_1_t3 <- c(10, 20, 30, 40)
issue_2_t3 <- c(10, 20, 30, 10)
area <- c("area1", "area2", "area3", "area4")
area2 <- c("area10", "area20", "area30", "area40")
df <- data.frame(area, area2, issue_1_t1, issue_2_t1, issue_1_t2, 
                 issue_2_t2, issue_1_t3, issue_2_t3)
df

#   area  area2 issue_1_t1 issue_2_t1 issue_1_t2 issue_2_t2 issue_1_t3 issue_2_t3
#1 area1 area10         10         10         10         10         10         10
#2 area2 area20         20         20         20         20         20         20
#3 area3 area30         30         30         30         30         30         30
#4 area4 area40         40         10         40         10         40         10

You can use pivot_longer from tidyr to get it in required shape.

tidyr::pivot_longer(df, 
                    cols = starts_with('issue'), 
                    names_to = '.value', 
                    names_pattern = '(issue_\\d )')

#    area  area2  issue_1 issue_2
#   <chr> <chr>    <dbl>   <dbl>
# 1 area1 area10      10      10
# 2 area1 area10      10      10
# 3 area1 area10      10      10
# 4 area2 area20      20      20
# 5 area2 area20      20      20
# 6 area2 area20      20      20
# 7 area3 area30      30      30
# 8 area3 area30      30      30
# 9 area3 area30      30      30
#10 area4 area40      40      10
#11 area4 area40      40      10
#12 area4 area40      40      10
  • Related