I am new on this site, I have tried to find the answer in other threads but haven't been lucky yet.
I have a dataframe in R:
Name | ID |
---|---|
Tom | 1 |
Tom | 1 |
Tina | 1 |
Tom | 2 |
Tina | 3 |
Tina | 3 |
What I would like to achieve, is to change the value of "Name" only if they share the same "ID", AND if both names appear within that group. If there is only one "Name" appearing within and "ID" group, "Name" should remain.
The result should thus look something like this:
Name | ID |
---|---|
Both | 1 |
Both | 1 |
Both | 1 |
Tom | 2 |
Tina | 3 |
Tina | 3 |
I guess there this should be feasible with dplyr? I have wrapped my head around it but unfortunately couldn't find a way.
Many thanks in advance!
CodePudding user response:
df %>%
group_by(ID) %>%
mutate(Name = ifelse(n_distinct(Name) >= 2, "Both", Name))
(Where df
is:
df <- read.table(text =
'Name ID
Tom 1
Tom 1
Tina 1
Tom 2
Tina 3
Tina 3', header = TRUE)
CodePudding user response:
This is pretty straightforward. You just need to break it down into small steps. Here's a tidyverse solution.
library(tidyverse)
d <- tibble(
Name=c("Tom", "Tom", "Tina", "Tom", "Tina", "Tina"),
ID=c(1, 1, 1, 2, 3, 3)
)
First, work out how many names appear for each ID.
d %>%
group_by(ID) %>%
summarize(NameCount=length(unique(Name)))
# A tibble: 3 x 2
ID NameCount
<dbl> <int>
1 1 2
2 2 1
3 3 1
Add this information to the source tibble
d %>%
left_join(
d %>%
group_by(ID) %>%
summarize(NameCount=length(unique(Name))),
by="ID"
)
# A tibble: 6 x 3
Name ID NameCount
<chr> <dbl> <int>
1 Both 1 2
2 Both 1 2
3 Both 1 2
4 Tom 2 1
5 Tina 3 1
6 Tina 3 1
Perform the necessary edits and tidy up.
d %>%
left_join(
d %>%
group_by(ID) %>%
summarize(NameCount=length(unique(Name))),
by="ID"
) %>%
mutate(Name=ifelse(NameCount > 1, "Both", Name)) %>%
select(-NameCount)
# A tibble: 6 x 2
Name ID
<chr> <dbl>
1 Both 1
2 Both 1
3 Both 1
4 Tom 2
5 Tina 3
6 Tina 3
CodePudding user response:
Assuming you named your data frame df
, this could work:
library(dplyr)
df %>%
group_by(ID) %>%
mutate(
name_count = n_distinct(name),
name = if_else(name_count > 1, 'Both', name)
) %>%
# removing new column and ungrouping for completeness
select(-name_count) %>%
ungroup()