How do I convert a hierarchical dataframe to a list in R?

If I had a hierarchical dataframe like this:

level_1<-c("a","a","a","b","c","c")
level_2<-c("flower","flower","tree","mushroom","dog","cat")
level_3<-c("rose","sunflower","pine",NA,"spaniel",NA)
level_4<-c("pink",NA,NA,NA,NA,NA)
df<-data.frame(level_1,level_2,level_3,level_4)

How do I convert this to a list ordered according to the hierarchy, like this:

> list
 [1] "a"         "flower"    "rose"      "pink"      "sunflower" "tree"      "pine"      "b"         "mushroom"  "c"        
[11] "dog"       "spaniel"   "c"         "cat"

So for each value in level 1, it lists all level 2 values, each expanded across the remaining levels. Hopefully that makes sense?

Thanks in advance!

CodePudding user response：

In the question, "c" appears twice in the desired output, but "a" and "b" each appear only once. We assume that this is an error and that each value should appear only once.

uniq <- function(x) unique(na.omit(c(t(x))))
unname(unlist(by(df, df$level_1, uniq)))
##  [1] "a"         "flower"    "rose"      "pink"      "sunflower" "tree"     
##  [7] "pine"      "b"         "mushroom"  "c"         "dog"       "spaniel"  
## [13] "cat"

It could also be expressed using pipes:

uniq <- \(x) x |> t() |> c() |> na.omit() |> unique()
by(df, df$level_1, uniq) |> unlist() |> unname()
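
A possible reason to deduplicate within each level_1 group rather than over the whole data frame: if the same label occurs under two different parents, a single global unique() keeps only its first occurrence. The sketch below uses a made-up data frame, df2 (not from the question), to show the difference.

```r
# Hypothetical data: "flower" appears under both "a" and "b"
df2 <- data.frame(
  level_1 = c("a", "b"),
  level_2 = c("flower", "flower"),
  level_3 = c("rose", "tulip")
)

uniq <- function(x) unique(na.omit(c(t(x))))

# Deduplicate within each level_1 group: "flower" is kept under both parents
per_group <- unname(unlist(by(df2, df2$level_1, uniq)))

# Deduplicate across the whole data frame: the second "flower" is dropped
global <- unique(na.omit(c(t(df2))))

print(per_group)
print(global)
```

Note that by() returns the groups in sorted order of level_1, which happens to match the order of appearance in the question's data.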

CodePudding user response：

Convert columnwise duplicated values to NA, then rowwise exclude NAs and unlist.

df[ sapply(df, duplicated) ] <- NA

unlist(apply(df, 1, function(i){ i[ !is.na(i) ]}), use.names = FALSE)
# [1] "a"         "flower"    "rose"      "pink"      "sunflower"
# [6] "tree"      "pine"      "b"         "mushroom"  "c"        
# [11] "dog"       "spaniel"   "cat"
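
Because the first line overwrites df, a variant that leaves the original data frame untouched may be preferable. The following is a minimal sketch that wraps the same two steps in a function. The name flatten_hierarchy is made up for illustration, and as.vector() is added only to guard against apply() returning a matrix when every row keeps the same number of values.

```r
# Data frame from the question
df <- data.frame(
  level_1 = c("a", "a", "a", "b", "c", "c"),
  level_2 = c("flower", "flower", "tree", "mushroom", "dog", "cat"),
  level_3 = c("rose", "sunflower", "pine", NA, "spaniel", NA),
  level_4 = c("pink", NA, NA, NA, NA, NA)
)

# Hypothetical helper: same idea as above, applied to a local copy of the data
flatten_hierarchy <- function(d) {
  d[sapply(d, duplicated)] <- NA                  # blank out repeats within each column
  kept <- apply(d, 1, function(i) i[!is.na(i)])   # keep the non-NA values row by row
  as.vector(unlist(kept, use.names = FALSE))
}

result <- flatten_hierarchy(df)
print(result)
anyNA(df$level_1)  # FALSE: the original df was not modified
```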

CodePudding user response：

We can try this:

> unique(na.omit(c(t(df))))
 [1] "a"         "flower"    "rose"      "pink"      "sunflower" "tree"
 [7] "pine"      "b"         "mushroom"  "c"         "dog"       "spaniel"
[13] "cat"
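
To see why the one-liner works, it can help to unpack it into steps. This sketch rebuilds df from the question and uses intermediate names (m, v, result) that are only for illustration.

```r
# Data frame from the question
df <- data.frame(
  level_1 = c("a", "a", "a", "b", "c", "c"),
  level_2 = c("flower", "flower", "tree", "mushroom", "dog", "cat"),
  level_3 = c("rose", "sunflower", "pine", NA, "spaniel", NA),
  level_4 = c("pink", NA, NA, NA, NA, NA)
)

m <- t(df)            # 4 x 6 character matrix: each column is one row of df
v <- c(m)             # flatten column by column, i.e. read df row by row
v <- v[!is.na(v)]     # drop the NA padding (same effect here as na.omit)
result <- unique(v)   # keep only the first occurrence of each value
print(result)
```

Since t() puts each original row into a column and R stores matrices column by column, the flattened vector follows the hierarchy from left to right within each row.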

CodePudding user response：

An alternative way:

library(magrittr)

df %>%
  apply(1, function(x) x) %>%
  as.character() %>% 
  {.[!is.na(.)]} %>%
  unique()


# [1] "a"         "flower"    "rose"      "pink"      "sunflower"
# [6] "tree"      "pine"      "b"         "mushroom"  "c"        
# [11] "dog"       "spaniel"   "cat"