I have example data as follows:
df_list = list()
df_list[[1]] <- c("A", NA, "A", "Ab", "Ac", NA, NA, "AA")
df_list[[2]] <- c(NA, "A", NA, NA, "AA", NA)
df_list[[3]] <- c("AA", "Ac", "Ad", NA, NA, NA, "Af", NA)
df_list[[4]] <- c(NA, NA, "AA", "Ac", "Ad", "AA", NA)
df_list[[5]] <- c(NA, "Ae", NA, "Ad", "Af", NA, "AA", NA)
names(df_list)[1] <- "nr1"
names(df_list)[2] <- "nr2"
names(df_list)[3] <- "nr3"
names(df_list)[4] <- "nr4"
names(df_list)[5] <- "nr5"
As a data frame, this would look something like the following (note the different lengths):
# A tibble: 8 x 5
  nr1   nr2   nr3   nr4   nr5
  <chr> <chr> <chr> <chr> <chr>
1 A     NA    AA    NA    NA
2 NA    A     Ac    NA    Ae
3 A     NA    Ad    AA    NA
4 Ab    NA    NA    Ac    Ad
5 Ac    AA    NA    Ad    Af
6 NA    NA    NA    AA    NA
7 NA          Af    NA    AA
8 AA          NA          NA
For each list item, I would like to drop the NAs and keep only the unique strings.
I have been racking my brain over this, but I am not sure how to do it.
Desired output (in df form):
# A tibble: 4 x 5
  nr1   nr2   nr3   nr4   nr5
  <chr> <chr> <chr> <chr> <chr>
1 A     A     AA    AA    Ae
2 Ab    AA    Ac    Ac    Ad
3 Ac          Ad    Ad    Af
4 AA          Af          AA
In list form:
df_list = list()
df_list[[1]] <- c("A", "Ab", "Ac", "AA")
df_list[[2]] <- c("A","AA")
df_list[[3]] <- c("AA", "Ac", "Ad","Af")
df_list[[4]] <- c("AA", "Ac", "Ad")
df_list[[5]] <- c("Ae", "Ad", "Af", "AA")
names(df_list)[1] <- "nr1"
names(df_list)[2] <- "nr2"
names(df_list)[3] <- "nr3"
names(df_list)[4] <- "nr4"
names(df_list)[5] <- "nr5"
Answer:
Here is a purrr solution:
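library(purrr)
df_list %>%
  map(~ unique(.x[!is.na(.x)])) %>%  # drop NAs, then duplicates
  # pad each vector with "" to the longest length and column-bind the
  # results into a tibble (map_dfc() needs dplyr to be installed)
  map_dfc(., function(w) replace(character(max(lengths(.))), seq_along(w), w))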
## A tibble: 4 x 5
# nr1 nr2 nr3 nr4 nr5
# <chr> <chr> <chr> <chr> <chr>
#1 A "A" AA "AA" Ae
#2 Ab "AA" Ac "Ac" Ad
#3 Ac "" Ad "Ad" Af
#4 AA "" Af "" AA
The idea is to first remove all NA and duplicate entries, then pad each list element with empty strings to a common length and column-bind the results into a tibble. You could shorten this into a single map_dfc call, but this version is easier to read.
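For comparison, here is a minimal base-R sketch of the same remove-then-pad idea, with no purrr or dplyr dependency. The names clean, n_max, padded and out are illustrative, and the result is a plain data.frame rather than a tibble.

```r
# Example data from the question
df_list <- list(
  nr1 = c("A", NA, "A", "Ab", "Ac", NA, NA, "AA"),
  nr2 = c(NA, "A", NA, NA, "AA", NA),
  nr3 = c("AA", "Ac", "Ad", NA, NA, NA, "Af", NA),
  nr4 = c(NA, NA, "AA", "Ac", "Ad", "AA", NA),
  nr5 = c(NA, "Ae", NA, "Ad", "Af", NA, "AA", NA)
)

# Step 1: drop NAs, then duplicates (the first occurrence is kept)
clean <- lapply(df_list, function(x) unique(x[!is.na(x)]))

# Step 2: pad every vector with "" up to the longest length
n_max <- max(lengths(clean))
padded <- lapply(clean, function(w) replace(character(n_max), seq_along(w), w))

# Step 3: one column per list element
out <- as.data.frame(padded, stringsAsFactors = FALSE)
print(out)
```

Padding with "" mirrors the purrr answer; use NA_character_ instead of character(n_max) if you prefer NA in the empty cells.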
Answer:
A base R solution:
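tmp <- lapply(df_list, function(x) unique(na.omit(x)))  # drop NAs, then duplicates
max_len <- max(lengths(tmp))                            # longest cleaned vector
# growing a vector with length<- pads it with NA; cbind() then builds a
# character matrix that takes its column names from the list names
do.call(cbind, lapply(tmp, function(x) { length(x) <- max_len; x }))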
nr1 nr2 nr3 nr4 nr5
[1,] "A" "A" "AA" "AA" "Ae"
[2,] "Ab" "AA" "Ac" "Ac" "Ad"
[3,] "Ac" NA "Ad" "Ad" "Af"
[4,] "AA" NA "Af" NA "AA"
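The padding in this answer relies on assigning to length(): growing a vector this way fills the new positions with NA, after which cbind() can combine vectors of equal length. A tiny illustration (x, y and m are just example names):

```r
x <- c("A", "AA")
length(x) <- 4   # grow to 4 elements; the new positions are filled with NA
x                # now c("A", "AA", NA, NA)

y <- c("AA", "Ac", "Ad")
length(y) <- 4   # one NA appended

# equal lengths, so cbind() gives a 4 x 2 character matrix
m <- cbind(nr2 = x, nr4 = y)
m
```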
Answer:
We can use purrr::map, discard, is.na, and unique.
answer
library(purrr)
df_list %>%
  map(discard, is.na) %>%  # drop the NA entries from each element
  map(unique)              # then keep one copy of each string
output
$nr1
[1] "A" "Ab" "Ac" "AA"
$nr2
[1] "A" "AA"
$nr3
[1] "AA" "Ac" "Ad" "Af"
$nr4
[1] "AA" "Ac" "Ad"
$nr5
[1] "Ae" "Ad" "Af" "AA"
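If you would rather not depend on purrr, base R's Filter() combined with Negate() can play the role of discard(). A minimal sketch (the name res is illustrative); like the purrr version, it returns a list rather than a padded table:

```r
df_list <- list(
  nr1 = c("A", NA, "A", "Ab", "Ac", NA, NA, "AA"),
  nr2 = c(NA, "A", NA, NA, "AA", NA),
  nr3 = c("AA", "Ac", "Ad", NA, NA, NA, "Af", NA),
  nr4 = c(NA, NA, "AA", "Ac", "Ad", "AA", NA),
  nr5 = c(NA, "Ae", NA, "Ad", "Af", NA, "AA", NA)
)

# Filter(Negate(is.na), x) keeps the elements for which is.na() is FALSE,
# i.e. it drops the NAs, much like purrr::discard(x, is.na)
res <- lapply(df_list, function(x) unique(Filter(Negate(is.na), x)))
str(res)
```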
Answer:
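library(dplyr)  # for %>% and as_tibble()
# lapply() rather than sapply() so the result is always a list
l <- lapply(df_list, function(x) unique(na.omit(x)))
# pad with NA to the longest length; sapply() then returns a matrix
sapply(l, function(x) {length(x) <- max(lengths(l)); x}) %>%
  as_tibble()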
# A tibble: 4 x 5
nr1 nr2 nr3 nr4 nr5
<chr> <chr> <chr> <chr> <chr>
1 A A AA AA Ae
2 Ab AA Ac Ac Ad
3 Ac NA Ad Ad Af
4 AA NA Af NA AA
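A caveat about using sapply() for the cleaning step: sapply() simplifies its result whenever every element comes out with the same length, so that step can return a matrix instead of a list, and the padding step, which expects a list of vectors, would then no longer do what is intended. lapply() always returns a list. The hypothetical input same_len below shows the difference:

```r
# Hypothetical input where both cleaned vectors end up with length 2
same_len <- list(a = c("x", NA, "y"), b = c("y", "z", "z"))

s <- sapply(same_len, function(x) unique(na.omit(x)))
l <- lapply(same_len, function(x) unique(na.omit(x)))

is.matrix(s)   # TRUE: simplified to a 2 x 2 matrix, one column per element
is.list(l)     # TRUE: still a named list of vectors
```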