Home > Software design >  Change value in one column if distinct in that column but duplicate in another
Change value in one column if distinct in that column but duplicate in another

Time:07-08

I am new on this site, I have tried to find the answer in other threads but haven't been lucky yet.

I have a dataframe in R:

Name ID
Tom 1
Tom 1
Tina 1
Tom 2
Tina 3
Tina 3

What I would like to achieve, is to change the value of "Name" only if they share the same "ID", AND if both names appear within that group. If there is only one "Name" appearing within and "ID" group, "Name" should remain.

The result should thus look something like this:

Name ID
Both 1
Both 1
Both 1
Tom 2
Tina 3
Tina 3

I guess there this should be feasible with dplyr? I have wrapped my head around it but unfortunately couldn't find a way.

Many thanks in advance!

CodePudding user response:

df %>% 
  group_by(ID) %>% 
  mutate(Name = ifelse(n_distinct(Name) >= 2, "Both", Name))

(Where df is:

df <- read.table(text = 
'Name   ID
Tom 1
Tom 1
Tina    1
Tom 2
Tina    3
Tina    3', header = TRUE)

CodePudding user response:

This is pretty straightforward. You just need to break it down into small steps. Here's a tidyverse solution.

library(tidyverse)

d <- tibble(
       Name=c("Tom", "Tom", "Tina", "Tom", "Tina", "Tina"),
       ID=c(1, 1, 1, 2, 3, 3)
     )

First, work out how many names appear for each ID.

d %>% 
  group_by(ID) %>% 
  summarize(NameCount=length(unique(Name)))
# A tibble: 3 x 2
     ID NameCount
  <dbl>     <int>
1     1         2
2     2         1
3     3         1

Add this information to the source tibble

d %>% 
  left_join(
    d %>% 
      group_by(ID) %>% 
      summarize(NameCount=length(unique(Name))),
    by="ID"
  )
# A tibble: 6 x 3
  Name     ID NameCount
  <chr> <dbl>     <int>
1 Both      1         2
2 Both      1         2
3 Both      1         2
4 Tom       2         1
5 Tina      3         1
6 Tina      3         1

Perform the necessary edits and tidy up.

d %>% 
  left_join(
    d %>% 
      group_by(ID) %>% 
      summarize(NameCount=length(unique(Name))),
    by="ID"
  ) %>% 
  mutate(Name=ifelse(NameCount > 1, "Both", Name)) %>% 
  select(-NameCount)
# A tibble: 6 x 2
  Name     ID
  <chr> <dbl>
1 Both      1
2 Both      1
3 Both      1
4 Tom       2
5 Tina      3
6 Tina      3

CodePudding user response:

Assuming you named your data frame df, this could work:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(
    name_count = n_distinct(name), 
    name = if_else(name_count > 1, 'Both', name)
  ) %>%
  # removing new column and ungrouping for completeness
  select(-name_count) %>%
  ungroup()
  • Related