For each item of a list only keep the unique strings

Time:11-05

I have example data as follows:

df_list = list()

df_list[[1]] <- c("A", NA, "A", "Ab", "Ac", NA, NA, "AA")
df_list[[2]] <- c(NA, "A", NA, NA, "AA", NA)
df_list[[3]] <- c("AA", "Ac", "Ad", NA, NA, NA, "Af", NA)
df_list[[4]] <- c(NA, NA, "AA", "Ac", "Ad", "AA", NA)
df_list[[5]] <- c(NA, "Ae", NA, "Ad", "Af", NA, "AA", NA)

names(df_list)[1] <- "nr1"
names(df_list)[2] <- "nr2"
names(df_list)[3] <- "nr3"
names(df_list)[4] <- "nr4"
names(df_list)[5] <- "nr5"

As a data frame, this would look something like the following (note the different lengths):

# A tibble: 8 x 5
  nr1   nr2   nr3   nr4   nr5  
  <chr> <chr> <chr> <chr> <chr>
1 A     NA    AA    NA    NA   
2 NA    A     Ac    NA    Ae   
3 A     NA    Ad    AA    NA   
4 Ab    NA    NA    Ac    Ad   
5 Ac    AA    NA    Ad    Af   
6 NA    NA    NA    AA    NA   
7 NA          Af    NA    AA   
8 AA          NA          NA 

For each list item, I would like to keep only the unique strings.

I have been racking my brain over this, but I am not sure how to do it.

Desired output (in df form):

# A tibble: 4 x 5
  nr1   nr2   nr3   nr4   nr5  
  <chr> <chr> <chr> <chr> <chr>
1 A     A     AA    AA    Ae   
2 Ab    AA    Ac    Ac    Ad   
3 Ac          Ad    Ad    Af   
4 AA          Af          AA                 

In list form:

df_list = list()
df_list[[1]] <- c("A", "Ab", "Ac", "AA")
df_list[[2]] <- c("A","AA")
df_list[[3]] <- c("AA", "Ac", "Ad","Af")
df_list[[4]] <- c("AA", "Ac", "Ad")
df_list[[5]] <- c("Ae", "Ad", "Af", "AA")
names(df_list)[1] <- "nr1"
names(df_list)[2] <- "nr2"
names(df_list)[3] <- "nr3"
names(df_list)[4] <- "nr4"
names(df_list)[5] <- "nr5"

CodePudding user response:

Here is a purrr solution

library(purrr)
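df_list %>%
    # drop NA entries, then duplicates, within each list element
    map(~ unique(.x[!is.na(.x)])) %>%
    # pad each element with "" up to the longest length, then column-bind
    map_dfc(., function(w) replace(character(max(lengths(.))), seq_along(w), w))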
## A tibble: 4 x 5
#  nr1   nr2   nr3   nr4   nr5
#  <chr> <chr> <chr> <chr> <chr>
#1 A     "A"   AA    "AA"  Ae
#2 Ab    "AA"  Ac    "Ac"  Ad
#3 Ac    ""    Ad    "Ad"  Af
#4 AA    ""    Af    ""    AA

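The idea is to first remove all NA and duplicate entries from each list element, then pad each element with empty strings up to the length of the longest one and column-bind the padded elements into a tibble. You could shorten this into one map_dfc call, but keeping the two steps separate is easier to read.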
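The same two steps can also be sketched in base R, with each intermediate result kept in its own variable. This is only an illustration, not the answerer's code: the names `cleaned`, `n_max`, `padded`, and `result` are made up here, and the sketch pads with `NA` instead of empty strings so that padding stays distinguishable from a real `""` value.

```r
# Example data from the question, written as a named list
df_list <- list(
  nr1 = c("A", NA, "A", "Ab", "Ac", NA, NA, "AA"),
  nr2 = c(NA, "A", NA, NA, "AA", NA),
  nr3 = c("AA", "Ac", "Ad", NA, NA, NA, "Af", NA),
  nr4 = c(NA, NA, "AA", "Ac", "Ad", "AA", NA),
  nr5 = c(NA, "Ae", NA, "Ad", "Af", NA, "AA", NA)
)

# Step 1: drop NA entries, then duplicates, within each element
cleaned <- lapply(df_list, function(x) unique(x[!is.na(x)]))

# Step 2: pad every element with NA up to the longest cleaned length
n_max  <- max(lengths(cleaned))
padded <- lapply(cleaned, function(x) c(x, rep(NA_character_, n_max - length(x))))

# All elements now have equal length, so they can form the columns of a data frame
result <- as.data.frame(padded, stringsAsFactors = FALSE)
```

If only the list form is needed, `cleaned` is already the answer and step 2 can be skipped.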

CodePudding user response:

A base R solution:

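# drop NA and duplicates from each element
tmp <- lapply(df_list, function(x) unique(na.omit(x)))
# length of the longest cleaned element
asd <- max(lengths(tmp))
# extend every element to that length (padding with NA) and column-bind
do.call(cbind, lapply(tmp, function(x) { length(x) <- asd; x }))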

     nr1  nr2  nr3  nr4  nr5 
[1,] "A"  "A"  "AA" "AA" "Ae"
[2,] "Ab" "AA" "Ac" "Ac" "Ad"
[3,] "Ac" NA   "Ad" "Ad" "Af"
[4,] "AA" NA   "Af" NA   "AA"

CodePudding user response:

We can use purrr::map, discard, is.na, and unique.

answer

library(purrr)

df_list %>%
  map(discard, is.na) %>%
  map(unique)

output

$nr1
[1] "A"  "Ab" "Ac" "AA"

$nr2
[1] "A"  "AA"

$nr3
[1] "AA" "Ac" "Ad" "Af"

$nr4
[1] "AA" "Ac" "Ad"

$nr5
[1] "Ae" "Ad" "Af" "AA"

CodePudding user response:

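library(dplyr)  # provides %>% and as_tibble()

# lapply() always returns a list; sapply() would simplify to a matrix if all lengths were equal
l <- lapply(df_list, function(x) unique(na.omit(x)))

sapply(l, function(x) {length(x) <- max(lengths(l)); x}) %>% 
  as_tibble()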

# A tibble: 4 x 5
  nr1   nr2   nr3   nr4   nr5  
  <chr> <chr> <chr> <chr> <chr>
1 A     A     AA    AA    Ae   
2 Ab    AA    Ac    Ac    Ad   
3 Ac    NA    Ad    Ad    Af   
4 AA    NA    Af    NA    AA 