I have the following dataframe (dput is provided at the bottom of the question):
>df_input
# A tibble: 5 x 4
category range samples events
<chr> <chr> <dbl> <dbl>
1 GroupA Apr2002 4951 97796
2 GroupA May2002 9332 195726
3 GroupB Apr2001 4781 80767
4 GroupB Oct2001 5677 92890
5 GroupB OctToNov2001 10296 166037
I would like to create a new dataframe with rows that are a combination of each unique combination of both the category
and range
columns. For example, category = GroupA
and range = Apr2002
would have 3 rows in the output dataframe for each of the three category = Group B
rows.
The range
column in the input dataframe will always have unique values only.
I would also like to rename the combined output columns for events
, samples
and range
to include the Group
names (i.e. range_GroupA
, range_GroupB
, samples_GroupA
, events_GroupA
, samples_GroupB
, events_GroupB
)
I'm struggling with how to create my combined rows from the category
column. I'm also struggling to find the right search terms here to find similar questions/answers. The closest I've managed to find so far is Create new rows in data frame based on multiple values of column, but the combo in that question is a bit different that what I'm attempting.
The desired output dataframe is:
> df_output
# A tibble: 6 x 6
range_GroupA range_GroupB samples_GroupA events_GroupA samples_GroupB events_GroupB
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Apr2002 Apr2001 4951 97796 4781 80767
2 Apr2002 Oct2001 4951 97796 5677 92890
3 Apr2002 OctToNov2001 4951 97796 10296 166037
4 May2002 Apr2001 9332 195726 4781 80767
5 May2022 Oct2001 9332 195726 5677 92890
6 May2022 OctToNov2001 9332 195726 10296 166037
df_input dataframe:
df_input <- structure(list(category = c("GroupA", "GroupA", "GroupB", "GroupB",
"GroupB"), range = c("Apr2002", "May2002", "Apr2001", "Oct2001",
"OctToNov2001"), samples = c(4951, 9332, 4781, 5677, 10296),
events = c(97796, 195726, 80767, 92890, 166037)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
df_output dataframe
df_output <- structure(list(range_GroupA = c("Apr2002", "Apr2002", "Apr2002",
"May2002", "May2022", "May2022"), range_GroupB = c("Apr2001",
"Oct2001", "OctToNov2001", "Apr2001", "Oct2001", "OctToNov2001"
), samples_GroupA = c(4951, 4951, 4951, 9332, 9332, 9332), events_GroupA = c(97796,
97796, 97796, 195726, 195726, 195726), samples_GroupB = c(4781,
5677, 10296, 4781, 5677, 10296), events_GroupB = c(80767, 92890,
166037, 80767, 92890, 166037)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
CodePudding user response:
I think we can get your result with a filtered cartesian join:
library(dplyr)
left_join(
df_input %>% mutate(dummy = 1),
df_input %>% mutate(dummy = 1), by = "dummy") %>%
filter(category.x < category.y)
You'll recognize all the numbers you're looking for, but with different header names. We can rename them manually, but that's no fun. See below for renamed version.
# A tibble: 6 × 9
category.x range.x samples.x events.x dummy category.y range.y samples.y events.y
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
1 GroupA Apr2002 4951 97796 1 GroupB Apr2001 4781 80767
2 GroupA Apr2002 4951 97796 1 GroupB Oct2001 5677 92890
3 GroupA Apr2002 4951 97796 1 GroupB OctToNov2001 10296 166037
4 GroupA May2002 9332 195726 1 GroupB Apr2001 4781 80767
5 GroupA May2002 9332 195726 1 GroupB Oct2001 5677 92890
6 GroupA May2002 9332 195726 1 GroupB OctToNov2001 10296 166037
EDIT: This seems to do it with the renaming:
left_join(
df_input %>% rename_with(~paste0(.,"_GroupA")) %>% mutate(dummy = 1),
df_input %>% rename_with(~paste0(.,"_GroupB")) %>% mutate(dummy = 1),
by = "dummy") %>%
filter(category_GroupA < category_GroupB) %>%
select(-category_GroupA, -dummy, -category_GroupB) %>%
relocate(range_GroupB, .after = 1)
# A tibble: 6 × 6
range_GroupA range_GroupB samples_GroupA events_GroupA samples_GroupB events_GroupB
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Apr2002 Apr2001 4951 97796 4781 80767
2 Apr2002 Oct2001 4951 97796 5677 92890
3 Apr2002 OctToNov2001 4951 97796 10296 166037
4 May2002 Apr2001 9332 195726 4781 80767
5 May2002 Oct2001 9332 195726 5677 92890
6 May2002 OctToNov2001 9332 195726 10296 166037