Sample Dataframe:
ID RNG_BEFORE_WDL RNG_AFTER_WDL
<int> <chr> <chr>
1 3490115 Less than 10K Less than 10K
2 3564671 Less than 1K 0
3 3914214 Less than 30K 0
4 3971472 More than 60K More than 60K
5 3971472 More than 60K More than 60K
6 4138130 Less than 1K 0
7 4143893 Less than 10K Less than 10K
8 4145280 Less than 10K Less than 100
9 4146908 Less than 30K Less than 30K
10 4146929 Less than 10K Less than 1K
11 4146958 Less than 30K Less than 10K
12 4147813 Less than 10K Less than 1K
13 4148128 Less than 1K 0
14 4148446 Less than 60K Less than 60K
15 4148446 Less than 60K Less than 60K
2 columns (RNG_BEFORE_WDL & RNG_AFTER_WDL) has different name but share unique value:
[1] "Less than 10K" "Less than 1K" "Less than 30K" "More than 60K"
[5] "Less than 60K" "Less than 100"
Expected outcome:
SV_RNG BF_WDL AF_WDL
<chr> <int> <int>
1 0 NA 926339
2 Less than 100 15165 78106
3 Less than 1K 289663 588669
4 Less than 10K 1225950 558488
5 Less than 30K 848153 469356
6 Less than 60K 464382 354119
7 More than 60K 4531080 4399316
Two Step Solution:
# CREATE SEPARATE ENTITY_SET
es_01 <- df |> distinct(ID, .keep_all = TRUE) |> group_by(RNG_BEFORE_WDL)|> summarise(BF_WDL = n()) |> rename(SV_RNG = RNG_BEFORE_WDL)
es_02 <- df |> distinct(ID, .keep_all = TRUE) |> group_by(RNG_AFTER_WDL) |> summarise(AF_WDL = n()) |> rename(SV_RNG = RNG_AFTER_WDL)
# PRE-DEFINE CUSTOM SORT ORDER
ls_odr <- list("0", "Less than 100", "Less than 1K", "Less than 10K", "Less than 30K", "Less than 60K", "More than 60K")
# SUMMARIZE ENTITY_SET
s_es00 <- es_01 |> full_join(es_02) |> arrange(`SV_RNG` |> factor(ls_odr))
How to do in a single step with R dplyr?
Thanks!
CodePudding user response:
If we need two datasets or keep it in a list, loop over the grouping column name as a string, and the the summarisation column
library(dplyr)
library(purrr)
lst1 <- map2(c("RNG_BEFORE_WDL", "RNG_AFTER_WD"), c("BF_WDL", "AF_WDL"),
~ df %>%
distinct(ID, .keep_all = TRUE) %>%
group_by(across(all_of(.x))) %>%
summarise(!! .y := n(), .groups = 'drop') %>%
rename(SV_RNG = 1)) %>%
reduce(full_join)