I have two data frames, myDF
and index
, and I efficiently match between the two as shown below where I create a new column in data frame myDF
called alloc_1
which is based on row matches it finds in data frame Index
(where row number in data frame index
matches eleCnt
in data frame myDF
):
Preferably using my existing mutate()
as a starting point where I especially like index$index[eleCnt]
for identifying the index row number, how can I further restrict matching from data frame index
to only those rows where cumGrp
= 1? Resulting in the below pretend output:
eleCnt concat_1 alloc_1 explanation
1 1 1.0 10.0
2 2 2.0 NA since index row 2 cumGrp <> 1, don't match it
3 1 1.0 10.0
4 2 2.1 NA since index row 2 cumGrp <> 1, don't match it
5 3 2.2 NA
Below is code for myDF
, index
, and the dplyr mutate()
:
myDF <- data.frame(
eleCnt = c(1,2,1,2,3),
concat_1 = c(1,2,1,2.1,2.2)
)
index <- data.frame(
index = c(10,2.1),
cumGrp = c(1,2)
)
library(dplyr)
myDF %>% mutate(alloc_1 = index$index[eleCnt])
CodePudding user response:
myDF %>%
mutate(alloc_1 = index$index[index$cumGrp == 1][eleCnt])
# eleCnt concat_1 alloc_1
# 1 1 1.0 10
# 2 2 2.0 NA
# 3 1 1.0 10
# 4 2 2.1 NA
# 5 3 2.2 NA
This is close to a "join" operation, with just a little nudge (adding eleCnt
to index
). A join operation can be faster, perhaps easier to understand/debug (once you really "grok" join/merge, see How to join (merge) data frames (inner, outer, left, right), What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?), and allows for other methods/join efficiencies as well.
index$eleCnt <- seq_len(nrow(index)) # or whatever they should be
### simple join, not filtered on cumGrp
myDF %>%
left_join(index, by = "eleCnt")
# eleCnt concat_1 index cumGrp
# 1 1 1.0 10.0 1
# 2 2 2.0 2.1 2
# 3 1 1.0 10.0 1
# 4 2 2.1 2.1 2
# 5 3 2.2 NA NA
### improved
index %>%
filter(cumGrp == 1) %>%
left_join(myDF, ., by = "eleCnt")
# eleCnt concat_1 index cumGrp
# 1 1 1.0 10 1
# 2 2 2.0 NA NA
# 3 1 1.0 10 1
# 4 2 2.1 NA NA
# 5 3 2.2 NA NA
In the join/merge case, note that all other columns are brought in as well. This can be good (so that you don't have to repeat index$index[eleCnt]
for each column you want to import), and can easily be controlled by filter
ing or de-select
ing before/after the join.