Filter for unique rows by ID-CodePudding

I have a dataset that looks like this:

 ID           countries        income
1                x                 HIC
1                x                 HIC
1                y                 LIC
1                z                 UIC
2                a                 HIC
3                y                 LIC
3                z                 UIC

I want to filter to keep unique income rows PER ID such that I would eliminate only the repeat income values within each ID, but not across all IDs; thus the data would look like:

    ID           countries        income
    1                x                 HIC
    1                y                 LIC
    1                z                 UIC
    2                a                 HIC
    3                y                 LIC
    3                z                 UIC

the current command I have now is this, but this removes all repeats of income values. What can I tweak so that it's only unique by ID?

dat2 <- dat1 %>% distinct(income, .keep_all = TRUE)

2nd question, if I wanted to further merge the rows such that the above data would look like this, how could I condense the countries and income rows by ID? Could also create a new variable to achieve this?

    ID           countries        income
    1                x,y,z         HIC,LIC, UIC
    2                a             HIC
    3                y,z           LIC,UIC

Thank you in advance!

CodePudding user response：

Another dplyr option using toString with unique like this:

df <- read.table(text = " ID           countries        income
1                x                 HIC
1                x                 HIC
1                y                 LIC
1                z                 UIC
2                a                 HIC
3                y                 LIC
3                z                 UIC", header = TRUE)

library(dplyr)
df %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ toString(unique(.))), .groups = 'drop') 
#> # A tibble: 3 × 3
#>      ID countries income       
#>   <int> <chr>     <chr>        
#> 1     1 x, y, z   HIC, LIC, UIC
#> 2     2 a         HIC          
#> 3     3 y, z      LIC, UIC

^{Created on 2022-07-26 by the reprex package (v2.0.1)}

CodePudding user response：

You may try

library(dplyr)
df %>%
  group_by(ID) %>%
  distinct(income, .keep_all = TRUE) %>%
  summarise(across(everything(), ~paste0(.x, collapse = ",")))


     ID countries income     
  <int> <chr>     <chr>      
1     1 x,y,z     HIC,LIC,UIC
2     2 a         HIC        
3     3 y,z       LIC,UIC