I have 4 data frames that all look like this:
Product 2018 | Number | Minimum | Maximum |
---|---|---|---|
1 | 56 | 1 | 5 |
2 | 42 | 12 | 16 |
3 | 6523 | 23 | 56 |
4 | 123 | 23 | 102 |
5 | 56 | 23 | 64 |
6 | 245623 | 56 | 87 |
7 | 546 | 25 | 540 |
8 | 54566 | 253 | 560 |
Product 2019 | Number | Minimum | Maximum |
---|---|---|---|
1 | 56 | 32 | 53 |
2 | 642 | 423 | 620 |
3 | 56423 | 432 | 560 |
4 | 3 | 431 | 802 |
5 | 2 | 2 | 6 |
6 | 4523 | 43 | 68 |
7 | 555 | 23 | 54 |
8 | 55646 | 3 | 6 |
Product 2020 | Number | Minimum | Maximum |
---|---|---|---|
1 | 23 | 2 | 5 |
2 | 342 | 4 | 16 |
3 | 223 | 3 | 5 |
4 | 13 | 4 | 12 |
5 | 2 | 4 | 7 |
6 | 223 | 7 | 8 |
7 | 5 | 34 | 50 |
8 | 46 | 3 | 6 |
Product 2021 | Number | Minimum | Maximum |
---|---|---|---|
1 | 234 | 3 | 5 |
2 | 3242 | 4 | 16 |
3 | 2423 | 43 | 56 |
4 | 123 | 43 | 102 |
5 | 24 | 4 | 6 |
6 | 2423 | 4 | 18 |
7 | 565 | 234 | 540 |
8 | 5646 | 23 | 56 |
I want to join all the tables so I get a table that looks like this:
Products | Number 2021 | Min-Max 2021 | Number 2020 | Min-Max 2020 | Number 2019 | Min-Max 2019 | Number 2018 | Min-Max 2018 |
---|---|---|---|---|---|---|---|---|
1 | 234 | 3 to 5 | 23 | 2 to 5 | ... | ... | ... | ... |
2 | 3242 | 4 to 16 | 342 | 4 to 16 | ... | ... | ... | ... |
3 | 2423 | 43 to 56 | 223 | 3 to 5 | ... | ... | ... | ... |
4 | 123 | 43 to 102 | 13 | 4 to 12 | ... | ... | ... | ... |
5 | 24 | 4 to 6 | 2 | 4 to 7 | ... | ... | ... | ... |
6 | 2423 | 4 to 18 | 223 | 7 to 8 | ... | ... | ... | ... |
7 | 565 | 234 to 540 | 5 | 34 to 50 | ... | ... | ... | ... |
8 | 5646 | 23 to 56 | 46 | 3 to 6 | ... | ... | ... | ... |
The Product for all years are the same so I would like to have a data frame that contains the number for each year as a column and joins the column for minimum and maximum as one.
Any help is welcome!
CodePudding user response:
How about something like this. You are trying to join several dataframes by a single column, which is relatively straight forward using full_join
. The difficulty is that you are trying to extract information from the column names and combine several columns at the same time. I would map
out everying you want to do and then reduce
the list of dataframes at the end. Here is an example with two dataframes, but you could add as many as you want to the list at the begining.
library(tidyverse)
#test data
set.seed(23)
df1 <- tibble("Product 2018" = seq(1:8),
Number = sample(1:100, 8),
Minimum = sample(1:100, 8),
Maximum = map_dbl(Minimum, ~sample(.x:1000, 1)))
set.seed(46)
df2 <- tibble("Product 2019" = seq(1:8),
Number = sample(1:100, 8),
Minimum = sample(1:100, 8),
Maximum = map_dbl(Minimum, ~sample(.x:1000, 1)))
list(df1, df2) |>
map(\(x){
year <- str_extract(colnames(x)[1], "\\d ?$")
mutate(x, !!quo_name(paste0("Min-Max ", year)) := paste(Minimum, "to", Maximum))|>
rename(!!quo_name(paste0("Number ", year)) := Number)|>
rename_with(~gsub("\\s\\d ?$", "", .), 1) |>
select(-c(Minimum, Maximum))
}) |>
reduce(full_join, by = "Product")
#> # A tibble: 8 x 5
#> Product `Number 2018` `Min-Max 2018` `Number 2019` `Min-Max 2019`
#> <int> <int> <chr> <int> <chr>
#> 1 1 29 21 to 481 50 93 to 416
#> 2 2 28 17 to 314 78 7 to 313
#> 3 3 72 40 to 787 1 91 to 205
#> 4 4 43 36 to 557 47 55 to 542
#> 5 5 45 70 to 926 52 76 to 830
#> 6 6 34 96 to 645 70 20 to 922
#> 7 7 48 31 to 197 84 6 to 716
#> 8 8 17 86 to 951 99 75 to 768
CodePudding user response:
This is a similar answer, but includes bind_rows
to combine the data.frames, then pivot_wider
to end in a wide format.
The first steps strip the year from the Product XXXX column name, as this carries relevant information on year for that data.frame. If that column is renamed as Product
they are easily combined (with a separate column containing the Year
). If this step can be taken earlier in the data collection or processing timeline, it is helpful.
library(tidyverse)
list(df1, df2, df3, df4) %>%
map(~.x %>%
mutate(Year = gsub("Product", "", names(.x)[1])) %>%
rename(Product = !!names(.[1]))) %>%
bind_rows() %>%
mutate(Min_Max = paste(Minimum, Maximum, sep = " to ")) %>%
pivot_wider(id_cols = Product, names_from = Year, values_from = c(Number, Min_Max), names_vary = "slowest")
Output
Product Number_2018 Min_Max_2018 Number_2019 Min_Max_2019 Number_2020 Min_Max_2020 Number_2021 Min_Max_2021
<int> <int> <chr> <int> <chr> <int> <chr> <int> <chr>
1 1 56 1 to 5 56 32 to 53 23 2 to 5 234 3 to 5
2 2 42 12 to 16 642 423 to 620 342 4 to 16 3242 4 to 16
3 3 6523 23 to 56 56423 432 to 560 223 3 to 5 2423 43 to 56
4 4 123 23 to 102 3 431 to 802 13 4 to 12 123 43 to 102
5 5 56 23 to 64 2 2 to 6 2 4 to 7 24 4 to 6
6 6 245623 56 to 87 4523 43 to 68 223 7 to 8 2423 4 to 18
7 7 546 25 to 540 555 23 to 54 5 34 to 50 565 234 to 540
8 8 54566 253 to 560 55646 3 to 6 46 3 to 6 5646 23 to 56