How to identify whether each data frame in a list has unique IDs

I have a list of data frames. Is there a smart way to tell whether each data frame in lst has unique IDs, and to create a summary table that flags, for each data frame, whether its IDs are unique?

Sample data:

lst<-list(structure(list(ID = c("Tom", "Jerry", "Mary"), Score = c(85, 
85, 96)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
"data.frame")), structure(list(ID = c("Tom", "Jerry", "Mary", 
"Jerry"), Score = c(75, 65, 88, 98)), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame")), structure(list(ID = c("Tom", "Jerry", 
"Tom"), Score = c(97, 65, 96)), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame")))
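For a single data frame, the question reduces to whether any ID repeats. Here is a minimal sketch of three equivalent base R checks, run on a copy of the second sample data frame (the name df2 is only for illustration):

```r
# Hypothetical single data frame, copied from the second element of lst
df2 <- data.frame(ID = c("Tom", "Jerry", "Mary", "Jerry"),
                  Score = c(75, 65, 88, 98))

# Three equivalent ways to ask "is every ID unique?"
check_count  <- length(unique(df2$ID)) == nrow(df2)  # distinct IDs vs. number of rows
check_dup    <- !any(duplicated(df2$ID))             # no ID repeats an earlier one
check_anydup <- anyDuplicated(df2$ID) == 0L          # index of first duplicate, 0 if none

c(check_count, check_dup, check_anydup)
```

All three are FALSE here because "Jerry" appears twice.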

Answer 1:

We could loop over the list and, for each data frame, check whether the number of distinct IDs (n_distinct(ID)) equals the number of rows (n()):

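library(dplyr)
library(stringr)
library(purrr)

# Name the list elements df1, df2, ... so that .id can use them as the Data column
map_dfr(setNames(lst, str_c("df", seq_along(lst))),
        ~ .x %>%
          # 1 + TRUE = 2 selects "Y"; 1 + FALSE = 1 selects "N"
          summarise(UniqueID = c("N", "Y")[1 + (n_distinct(ID) == n())]),
        .id = "Data")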

Output:

# A tibble: 3 × 2
  Data  UniqueID
  <chr> <chr>   
1 df1   Y       
2 df2   N       
3 df3   N
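map_dfr() still works, but purrr 1.0.0 marks it as superseded. One possible alternative builds the summary directly from the logical vector returned by map_lgl(). This is a sketch, with plain data frames standing in for the tibbles in lst; summary_tbl is an illustrative name:

```r
library(dplyr)
library(purrr)
library(tibble)

# Stand-in for the question's lst (plain data frames; tibbles behave the same)
lst <- list(
  data.frame(ID = c("Tom", "Jerry", "Mary"), Score = c(85, 85, 96)),
  data.frame(ID = c("Tom", "Jerry", "Mary", "Jerry"), Score = c(75, 65, 88, 98)),
  data.frame(ID = c("Tom", "Jerry", "Tom"), Score = c(97, 65, 96))
)

# map_lgl() returns one TRUE/FALSE per data frame; if_else() maps it to "Y"/"N"
summary_tbl <- tibble(
  Data     = paste0("df", seq_along(lst)),
  UniqueID = if_else(map_lgl(lst, ~ n_distinct(.x$ID) == nrow(.x)), "Y", "N")
)

summary_tbl
```

Because map_lgl() requires exactly one logical value per element, it stops with an error if the check ever returns something else, rather than producing a malformed column.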

Answer 2:

In base R:

# seq_along() is safer than seq() on a list; \(x) is shorthand for function(x) (R >= 4.1)
data.frame(Data = paste0("df", seq_along(lst)),
           UniqueID = ifelse(sapply(lst, \(x) length(unique(x$ID)) == nrow(x)), "Y", "N"))

  Data UniqueID
1  df1        Y
2  df2        N
3  df3        N
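A variant of the base R answer, assuming the same lst structure: it uses vapply() with an explicit logical(1) template instead of sapply(), and anyDuplicated() instead of comparing counts. anyDuplicated() returns the index of the first duplicated ID, or 0 when there is none. The names is_unique and result are illustrative:

```r
# Stand-in for the question's lst, built with plain data frames
lst <- list(
  data.frame(ID = c("Tom", "Jerry", "Mary"), Score = c(85, 85, 96)),
  data.frame(ID = c("Tom", "Jerry", "Mary", "Jerry"), Score = c(75, 65, 88, 98)),
  data.frame(ID = c("Tom", "Jerry", "Tom"), Score = c(97, 65, 96))
)

# vapply() insists on one logical per element; if the function returned
# anything else, it would stop with an error instead of returning a list
is_unique <- vapply(lst, function(x) anyDuplicated(x$ID) == 0L, logical(1))

result <- data.frame(Data = paste0("df", seq_along(lst)),
                     UniqueID = ifelse(is_unique, "Y", "N"),
                     stringsAsFactors = FALSE)
result
```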