I have a table that contains letters and numbers:
xx <- tibble (letter = c (rep ("a", 3), rep ("b", 3), rep ("c", 3)),
number = c (1, 2, 3, 1, 2, 3, 4, 5, 6))
I would like to first group data by "letter" and then check if there are any two groups that have identical values in number column. These would be groups with letters "a" and "b" in the "letter" column.
The result would look like this
xx <- tibble (letter = c (rep ("a", 3), rep ("b", 3), rep ("c", 3)),
number = c (1, 2, 3, 1, 2, 3, 4, 5, 6),
duplicated = c (rep (TRUE, 6), rep (FALSE, 3)) )
is there a way to do this elegantly in dplyr?
CodePudding user response:
You can use cur_group()
to subset xx
and test for containment:
library(dplyr)
xx %>%
group_by(letter) %>%
mutate(duplicated = number %in% xx$number[!xx$letter %in% cur_group()]) %>%
ungroup()
cur_group()
represents the value of the current grouping variable (ega
,b
,c
).- We use the value of
cur_group()
to subset the vectorxx$number
for those that are not part of the current group (!xx$letter %in% cur_group()
). - Lastly, we check if the current group
number
is in that subset (number %in% ...
).
Output
letter number duplicated
<chr> <dbl> <lgl>
1 a 1 TRUE
2 a 2 TRUE
3 a 3 TRUE
4 b 1 TRUE
5 b 2 TRUE
6 b 3 TRUE
7 c 4 FALSE
8 c 5 FALSE
9 c 6 FALSE
CodePudding user response:
You may try
xx %>%
distinct() %>%
group_by(number) %>%
mutate(n = n()) %>%
mutate(duplicated = ifelse(n>1, TRUE, FALSE)) %>%
select(-n)
letter number duplicated
<chr> <dbl> <lgl>
1 a 1 TRUE
2 a 2 TRUE
3 a 3 TRUE
4 b 1 TRUE
5 b 2 TRUE
6 b 3 TRUE
7 c 4 FALSE
8 c 5 FALSE
9 c 6 FALSE
distinct
is for if there are some duplicated inside of gruoup,
xx <- tibble (letter = c (rep ("a", 4), rep ("b", 3), rep ("c", 3)),
number = c (1,1, 2, 3, 1, 2, 3, 4, 4, 6))
letter number
<chr> <dbl>
1 a 1
2 a 1
3 a 2
4 a 3
5 b 1
6 b 2
7 b 3
8 c 4
9 c 4
10 c 6
xxx <- xx %>%
distinct() %>%
group_by(number) %>%
mutate(n = n()) %>%
mutate(duplicated = ifelse(n>1, TRUE, FALSE)) %>%
select(-n)
xx %>%
full_join(xxx, by = c("letter", "number"))
letter number duplicated
<chr> <dbl> <lgl>
1 a 1 TRUE
2 a 1 TRUE
3 a 2 TRUE
4 a 3 TRUE
5 b 1 TRUE
6 b 2 TRUE
7 b 3 TRUE
8 c 4 FALSE
9 c 4 FALSE
10 c 6 FALSE
Thanks to @LMc
xxx <- xx %>%
distinct() %>%
add_count(number) %>%
mutate(duplicated = n> 1) %>%
select(-n)
xxx
xx %>%
full_join(xxx, by = c("letter", "number"))