Problem

Suppose I have a data frame in which each row corresponds to a group, and each entry is 1 if the column belongs to that group and 0 if it doesn't:

> df <- structure(list(Tree = c(1, 0, 0), Cat = c(0, 1, 0), Bird = c(0, 0, 1), 
                       Lion = c(1, 0, 0), Apple = c(0, 0, 1)), class = "data.frame",
                  row.names = c("row1","row2", "row3"))
> df
     Tree Cat Bird Lion Apple
row1    1   0    0    1     0
row2    0   1    0    0     0
row3    0   0    1    0     1

I wish to obtain a list of the 3 groups, where each element holds the names of the columns in that group:

> Group1 <- c("Tree","Lion")
> Group2 <- c("Cat")
> Group3 <- c("Bird","Apple")
> output <- list(Group1,Group2,Group3)
> output
[[1]]
[1] "Tree" "Lion"

[[2]]
[1] "Cat"

[[3]]
[1] "Bird"  "Apple"

I want to write an R function that automates this for larger problems. However, I am stuck on subsetting the vector of column names, colnames(df).

CodePudding user response：

You can do this in base R by using apply to iterate over the rows (the \(x) lambda shorthand needs R 4.1 or later):

apply(df, 1, \(x) names(x)[as.logical(x)])
# $row1
# [1] "Tree" "Lion"

# $row2
# [1] "Cat"

# $row3
# [1] "Bird"  "Apple"

If you want an unnamed list instead, remove the row names beforehand:

rownames(df) <- NULL
apply(df, 1, \(x) names(x)[as.logical(x)])
# [[1]]
# [1] "Tree" "Lion"

# [[2]]
# [1] "Cat"

# [[3]]
# [1] "Bird"  "Apple"
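
One caveat worth checking with the apply approach: apply simplifies its result, so if every row happens to contain the same number of 1s, it returns a vector or matrix rather than a list. Below is a minimal sketch of a wrapper that always returns a list. The name groups_from_indicator is hypothetical, and the sketch assumes that any non-zero entry means membership.

```r
# Example data from the question.
df <- data.frame(Tree = c(1, 0, 0), Cat = c(0, 1, 0), Bird = c(0, 0, 1),
                 Lion = c(1, 0, 0), Apple = c(0, 0, 1),
                 row.names = c("row1", "row2", "row3"))

# Hypothetical helper: one list element per row, holding the names of the
# columns whose entry in that row is non-zero.
groups_from_indicator <- function(df) {
  m <- as.matrix(df) != 0                  # logical membership matrix
  out <- lapply(seq_len(nrow(m)), function(i) colnames(m)[m[i, ]])
  names(out) <- rownames(m)                # NULL if the row names are automatic
  out
}

groups <- groups_from_indicator(df)
```

Because lapply never simplifies, the result stays a list even when all groups have the same size. On R 4.1 or later, passing simplify = FALSE to apply should give the same guarantee.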

CodePudding user response：

Another possible solution, based on the tidyverse (tibble, tidyr, dplyr and purrr):

library(tidyverse)

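df %>% 
  rownames_to_column %>%            # tibble: row names become a "rowname" column
  pivot_longer(-rowname) %>%        # long format: one row per (rowname, name, value)
  filter(value != 0) %>%            # keep only the memberships
  group_by(rowname) %>% 
  summarise(name = list(name)) %>%  # one list element of column names per group
  select(-rowname) %>% 
  flatten                           # purrr: unwrap the list column into a plain list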

#> [[1]]
#> [1] "Tree" "Lion"
#> 
#> [[2]]
#> [1] "Cat"
#> 
#> [[3]]
#> [1] "Bird"  "Apple"

CodePudding user response：

Using split in base R, grouping the column name of every 1 by its row index:

split(names(df)[col(df)][df == 1], row(df)[df == 1])

Output:

$`1`
[1] "Tree" "Lion"

$`2`
[1] "Cat"

$`3`
[1] "Bird"  "Apple"
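
Note that the split call only sees the cells equal to 1, so a row made up entirely of 0s would be missing from the result. If empty groups should be kept, one possible variation is to split by a factor that lists every row index as a level. The example data and the names df_sparse, hit and res below are made up for illustration.

```r
# Hypothetical data in which the second row has no members.
df_sparse <- data.frame(Tree = c(1, 0, 0), Cat = c(0, 0, 0), Bird = c(0, 0, 1))

hit <- df_sparse == 1                       # logical matrix marking the 1s
res <- split(names(df_sparse)[col(df_sparse)][hit],
             factor(row(df_sparse)[hit],    # row index of each 1 ...
                    levels = seq_len(nrow(df_sparse))))  # ... with all rows as levels
```

split keeps unused factor levels by default, so the empty row should appear in res as character(0) instead of being dropped.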