In the below reproducible code, the user inputs a series of Element
and Group
codes into vectors. A Group number not = 0 means that Element is to be split into a sub-group and is "paired with" the Elements next to it sharing the same Group code. ElementCnt
means the sequential number of times that Element appears in the vector.
I am trying to create a dplyr::mutate(...)
statement for GroupRank
in the reproducible code that returns the following GroupRank for each row: if Group = 0, then return that row's ElementCnt; if it is the first Element in a sub-group pair (again, sub-group flagged by Group not = 0), then take its ElementCnt - the number of pairings that occurred with a lower ElementCnt than it, as explained below under "Explain GroupRank"; and if it shares the same Group number as the Element immediately above it, then it shares the same GroupRank:
Element Group ElementCnt GroupRank Explain GroupRank
1 A 0 1 1 GroupRank = ElementCnt since Group = 0
2 A 0 2 2 GroupRank = ElementCnt since Group = 0
3 B 0 1 1 GroupRank = ElementCnt since Group = 0
4 B 11 2 2 Since GroupRank > 0 and this is the first Element in Group 11, assign GroupRank = ElementCnt; no need to reduce ElementCnt since there are no pairings that occur with ElementCnt < 2
5 B 11 3 2 Since Group = Group immediately above, assign same GroupRank
6 B 0 4 4 GroupRank = ElementCnt since Group = 0
7 B 0 5 5 GroupRank = ElementCnt since Group = 0
8 B 0 6 6 GroupRank = ElementCnt since Group = 0
9 C 0 1 1 GroupRank = ElementCnt since Group = 0
10 A 0 3 3 GroupRank = ElementCnt since Group = 0
11 A 2 4 3 GroupRank should be 3 because Group 11 sub-group pair above causes all ElementCnt > 2 to decrease by 1; so ElementCnt of 4 - 1 = 3
12 A 2 5 3 Since Group = Group immediately above, assign same GroupRank
13 B -3 7 5 GroupRank should be 5 because Group 11 sub-group pair above causes all ElementCnt > 2 to decrease by 1 and Group 2 sub-group pair above causes ElementCnt > 4 to decrease by 1 again; ElementCnt 7 - 1 - 1 = 5
14 B -3 8 5 Since Group = Group immediately above, assign same GroupRank
15 B -3 9 5 Since Group = Group immediately above, assign same GroupRank
Reproducible code (GroupRank is not working correctly):
library(dplyr)
myData <-
data.frame(
Element = c("A","A","B","B","B","B","B","B","C","A","A","A","B","B","B"),
Group = c(0,0,0,11,11,0,0,0,0,0,2,2,-3,-3,-3)
)
excelCopy <- myData %>%
group_by(Element) %>%
mutate(ElementCnt = row_number()) %>%
mutate(GroupRank = case_when(Group > 0 ~ match(Group, unique(Group)),TRUE ~ ElementCnt)) %>%
ungroup()
print.data.frame(excelCopy)
Code reflecting Marcus solution - with one modification to generalize to my 12 test scenarios (all modifications from Marcus are commented #):
excelCopy <- myData %>%
mutate(origOrder = row_number()) %>% # change to origOrder, clearer for me
group_by(Element) %>%
mutate(
ElementCnt = row_number()
) %>%
ungroup() %>%
mutate(
Group = factor(Group, unique(Group))
) %>%
arrange(Group) %>%
mutate(
groupCt = cumsum(Group != lag(Group, 1, Group[[1]])) - 1L
) %>%
group_by(Group) %>%
mutate(
GroupRank = ElementCnt - groupCt,
GroupRank = max(1,if_else( as.character(Group) == "0", ElementCnt, min(GroupRank))) # added a floor value of 1 to generalize to all my scenarios
) %>%
ungroup() %>%
arrange(origOrder) %>%
select(-groupCt)
CodePudding user response:
I'm not sure how well this generalizes, but it should match the desired output.
Since, Group is independent of Element you need to change the grouping depending on which statistic you are trying to add. ElementCt
should be calculated with group_by(Element)
. GroupRank
the data should be group_by(Group)
. Lastly, because Group
is not actually a count (skips from 11 to 2 to -3) but the ordering matters, it should be treated as factor
.
excelCopy <- myData %>%
# add to be able to verify against desired output
mutate(ogOrder = row_number()) %>%
group_by(Element) %>%
mutate(
ElementCnt = row_number()
) %>%
# done with the "Element" statistics
ungroup() %>%
# Ordering matters, but can't rely on the Group value for ordering
mutate(
Group = factor(Group, unique(Group))
) %>%
arrange(Group) %>%
# need to know cumulative sub-group count to use as an offset
# subtract 1 to 0 offset for first sub-group
mutate(
groupCt = cumsum(Group != lag(Group, 1, Group[[1]])) - 1L
) %>%
# perform "Group" statistics
group_by(Group) %>%
mutate(
# set the first row in each sub group to the correct GroupRank
GroupRank = ElementCnt - groupCt,
# Correct the non-subgroup entries and non-initial sub-group rows
GroupRank = if_else( as.character(Group) == "0", ElementCnt, min(GroupRank))
) %>%
ungroup() %>%
# add to be able to verify against desired output
arrange(ogOrder) %>%
select(-groupCt)