I have a dataset that looks like this:
ID countries income
1 x HIC
1 x HIC
1 y LIC
1 z UIC
2 a HIC
3 y LIC
3 z UIC
I want to filter to keep unique income rows PER ID such that I would eliminate only the repeat income values within each ID, but not across all IDs; thus the data would look like:
ID countries income
1 x HIC
1 y LIC
1 z UIC
2 a HIC
3 y LIC
3 z UIC
the current command I have now is this, but this removes all repeats of income values. What can I tweak so that it's only unique by ID?
dat2 <- dat1 %>% distinct(income, .keep_all = TRUE)
2nd question, if I wanted to further merge the rows such that the above data would look like this, how could I condense the countries and income rows by ID? Could also create a new variable to achieve this?
ID countries income
1 x,y,z HIC,LIC, UIC
2 a HIC
3 y,z LIC,UIC
Thank you in advance!
CodePudding user response:
Another dplyr
option using toString
with unique
like this:
df <- read.table(text = " ID countries income
1 x HIC
1 x HIC
1 y LIC
1 z UIC
2 a HIC
3 y LIC
3 z UIC", header = TRUE)
library(dplyr)
df %>%
group_by(ID) %>%
summarise(across(everything(), ~ toString(unique(.))), .groups = 'drop')
#> # A tibble: 3 × 3
#> ID countries income
#> <int> <chr> <chr>
#> 1 1 x, y, z HIC, LIC, UIC
#> 2 2 a HIC
#> 3 3 y, z LIC, UIC
Created on 2022-07-26 by the reprex package (v2.0.1)
CodePudding user response:
You may try
library(dplyr)
df %>%
group_by(ID) %>%
distinct(income, .keep_all = TRUE) %>%
summarise(across(everything(), ~paste0(.x, collapse = ",")))
ID countries income
<int> <chr> <chr>
1 1 x,y,z HIC,LIC,UIC
2 2 a HIC
3 3 y,z LIC,UIC