I have a complicated problem related to the sorting of the data.
Introduction to the issue. Below are two tables that include data in annual granularity for the same units.
The first table refers to the load per year for each unit.
Table 1. Load per year
Year | Unit A | Unit B | Unit C | Unit D |
---|---|---|---|---|
2023 | 5000 | 4000 | 3000 | 2000 |
2024 | 5000 | 4000 | 3000 | 2000 |
2025 | 5000 | 4000 | 3000 | 0 |
2026 | 0 | 4000 | 3000 | 0 |
2027 | 0 | 0 | 3000 | 0 |
The second table refers to the marginal price per year for each unit.
Table 2. Marginal cost per year
Year | Unit A | Unit B | Unit C | Unit D |
---|---|---|---|---|
2023 | 120 | 240 | 170 | 150 |
2024 | 140 | 250 | 180 | 175 |
2025 | 160 | 270 | 190 | 0 |
2026 | 0 | 300 | 200 | 0 |
2027 | 0 | 0 | 215 | 0 |
I would like the units in the tables above to be sorted in such a way that the matching starts from the lowest marginal cost (table 2) and continues until the total load (table 1) reaches the assumed level in table 3 (column number 3 in table 3). Below I have presented the data in how I would like the solution to look after sorting.
The expected solution: Table 3. Sorted units
Year | Hour | Required load | Missing load (Required load - sum units load) | ||||
---|---|---|---|---|---|---|---|
2023 | 3 | 10000 | Unit A | Unit D | Unit C | 0 | |
2023 | 4 | 13500 | Unit A | Unit D | Unit C | Unit B | 1000 |
... | ... | ... | ... | ... | ... | ... | ... |
2025 | 5 | 8000 | Unit A | Unit C | 0 | ||
2025 | 6 | 18000 | Unit A | Unit C | Unit B | 6000 |
Even better and more accurate would be to present a marginal cost instead of unit names, because I am interested in which unit and at what marginal cost closes the pile (which is the most expensive unit) like:
Year | Hour | Required load | Missing load (Required load - sum units load) | ||||
---|---|---|---|---|---|---|---|
2023 | 3 | 10000 | 120 | 150 | 170 | 0 | |
... | ... | ... | ... | ... | ... | ... | ... |
2025 | 5 | 8000 | 160 | 190 | 0 |
I tried different ways but unfortunately I did not find the right way to sort this way.
Many thanks for any suggestions and solutions.
CodePudding user response:
library(tidyverse)
Tab1 <- tribble(
~Year, ~UnitA, ~UnitB, ~UnitC, ~UnitD,
2023, 5000, 4000, 3000, 2000,
2024, 5000, 4000, 3000, 2000,
2025, 5000, 4000, 3000, 0,
2026, 0, 4000, 3000, 0,
2027, 0, 0, 3000, 0)
Tab2 <- tribble(
~Year, ~UnitA, ~UnitB, ~UnitC, ~UnitD,
2023, 120, 240, 170, 150,
2024, 140, 250, 180, 175,
2025, 160, 270, 190, 0,
2026, 0, 300, 200, 0,
2027, 0, 0, 215, 0)
Tab3 <- tribble(
~Year, ~Hour, ~Req_load,
2023, 3, 10000,
2023, 4, 13500,
2025, 5, 8000,
2025, 6, 18000,
)
consolidatedTab <- Tab1 %>%
pivot_longer(cols = -c(Year), names_to = "Unit", names_prefix = "Unit", values_to = "load") %>%
left_join(Tab2 %>%
pivot_longer(cols = -c(Year), names_to = "Unit", names_prefix = "Unit", values_to = "marg")
) %>%
group_by(Year) %>%
arrange(marg, .by_group = TRUE) %>%
mutate(totLoad = cumsum(load))
SummaryTab <- Tab3 %>%
rowwise() %>%
mutate(missingLoad = ifelse(Req_load - max(consolidatedTab[consolidatedTab$Year == Year,]$totLoad) <= 0,
0,
Req_load - max(consolidatedTab[consolidatedTab$Year == Year,]$totLoad)
),
finalUnit = ifelse(missingLoad >0, "N/A", first(consolidatedTab[consolidatedTab$Year == Year & consolidatedTab$totLoad > Req_load,]$Unit)),
finalunitprice = ifelse(missingLoad >0, 0, first(consolidatedTab[consolidatedTab$Year == Year & consolidatedTab$totLoad > Req_load,]$marg))
)
My best attempt, apologies for lack of explanation but I have to run!
gives:
>head(consolidatedTab)
# A tibble: 6 × 5
# Groups: Year [2]
Year Unit load marg totLoad
<dbl> <chr> <dbl> <dbl> <dbl>
1 2023 A 5000 120 5000
2 2023 D 2000 150 7000
3 2023 C 3000 170 10000
4 2023 B 4000 240 14000
5 2024 A 5000 140 5000
6 2024 D 2000 175 7000
>
and
>
# A tibble: 4 × 6
# Rowwise:
Year Hour Req_load missingLoad finalUnit finalunitprice
<dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 2023 3 10000 0 B 240
2 2023 4 13500 0 B 240
3 2025 5 8000 0 B 270
4 2025 6 18000 6000 N/A 0