I have two datasets. The first is
issue_1_t1 <- c(10, 20, 30, 40)
issue_2_t1 <- c(10, 20, 30, 10)
issue_1_t2 <- c(10, 20, 30, 40)
issue_2_t2 <- c(10, 20, 30, 10)
issue_1_t3 <- c(10, 20, 30, 40)
issue_2_t3 <- c(10, 20, 30, 10)
area <- c(area1, area2, area3, area4)
area2 <- c(area10, area20, area30, area40)
df <- data.frame(issue_1_t1, issue_2_t1, issue_1_t2, issue_2_t2, issue_1_t3, issue_2_t3)
I want to reconfigure these, such that they form the following:
area area2 issue1 issue2
area1 area10 10 10
area2 area20 20 20
area3 area30 30 30
area4 area40 40 40
area1 area10 10 10
area2 area20 20 20
area3 area30 30 30
area4 area40 40 40
area1 area10 10 10
area2 area20 20 20
area3 area30 30 30
area4 area40 40 40
So far, I have only been able to divide the dataset into two datasets by time period and then stack them on top of each other. I wanted to know if there is a more efficient way that only requires one line of code.
CodePudding user response:
You may try
library(dplyr)
library(reshape2)
library(tidyr)
df %>%
melt %>%
mutate(time = str_sub(variable, -1),
issue = paste0("issue", str_split(variable, "_", simplify = T)[,2])) %>%
select(time, issue, value) %>%
group_by(issue) %>%
arrange(issue) %>%
mutate(n = 1, n = cumsum(n)) %>%
pivot_wider(values_from = value, names_from = issue) %>%
mutate(area = rep(area, max(n)/length(area)), area2 = rep(area2, max(n)/length(area2))) %>%
select(-time, -n)
issue1 issue2 area area2
<dbl> <dbl> <chr> <chr>
1 10 10 area1 area10
2 20 20 area2 area20
3 30 30 area3 area30
4 40 10 area4 area40
5 10 10 area1 area10
6 20 20 area2 area20
7 30 30 area3 area30
8 40 10 area4 area40
9 10 10 area1 area10
10 20 20 area2 area20
11 30 30 area3 area30
12 40 10 area4 area40
CodePudding user response:
I think the data shared is incomplete with syntax errors. I don't understand what you mean by two datasets (when there is only one df
) but I think what you have is something like this -
issue_1_t1 <- c(10, 20, 30, 40)
issue_2_t1 <- c(10, 20, 30, 10)
issue_1_t2 <- c(10, 20, 30, 40)
issue_2_t2 <- c(10, 20, 30, 10)
issue_1_t3 <- c(10, 20, 30, 40)
issue_2_t3 <- c(10, 20, 30, 10)
area <- c("area1", "area2", "area3", "area4")
area2 <- c("area10", "area20", "area30", "area40")
df <- data.frame(area, area2, issue_1_t1, issue_2_t1, issue_1_t2,
issue_2_t2, issue_1_t3, issue_2_t3)
df
# area area2 issue_1_t1 issue_2_t1 issue_1_t2 issue_2_t2 issue_1_t3 issue_2_t3
#1 area1 area10 10 10 10 10 10 10
#2 area2 area20 20 20 20 20 20 20
#3 area3 area30 30 30 30 30 30 30
#4 area4 area40 40 10 40 10 40 10
You can use pivot_longer
from tidyr
to get it in required shape.
tidyr::pivot_longer(df,
cols = starts_with('issue'),
names_to = '.value',
names_pattern = '(issue_\\d )')
# area area2 issue_1 issue_2
# <chr> <chr> <dbl> <dbl>
# 1 area1 area10 10 10
# 2 area1 area10 10 10
# 3 area1 area10 10 10
# 4 area2 area20 20 20
# 5 area2 area20 20 20
# 6 area2 area20 20 20
# 7 area3 area30 30 30
# 8 area3 area30 30 30
# 9 area3 area30 30 30
#10 area4 area40 40 10
#11 area4 area40 40 10
#12 area4 area40 40 10