Problem
Suppose I have a data frame in which each row corresponds to a group, with a 1 if the column belongs to that group and a 0 if it doesn't:
> df <- structure(list(Tree = c(1, 0, 0), Cat = c(0, 1, 0), Bird = c(0, 0, 1),
Lion = c(1, 0, 0), Apple = c(0, 0, 1)), class = "data.frame",
row.names = c("row1","row2", "row3"))
> df
Tree Cat Bird Lion Apple
row1 1 0 0 1 0
row2 0 1 0 0 0
row3 0 0 1 0 1
I wish to obtain a list of the 3 groups, where each element holds the names of the columns in that group:
> Group1 <- c("Tree","Lion")
> Group2 <- c("Cat")
> Group3 <- c("Bird","Apple")
> output <- list(Group1,Group2,Group3)
> output
[[1]]
[1] "Tree" "Lion"
[[2]]
[1] "Cat"
[[3]]
[1] "Bird" "Apple"
I wish to write an R function to automate this for larger-scale problems. However, I am stuck on subsetting the vector of column names, colnames(df).
CodePudding user response:
You can do this easily in base R using apply to iterate over rows:
apply(df, 1, \(x) names(x)[as.logical(x)])
# $row1
# [1] "Tree" "Lion"
# $row2
# [1] "Cat"
# $row3
# [1] "Bird" "Apple"
You can also remove the row names beforehand if you need an unnamed list:
rownames(df) <- NULL
apply(df, 1, \(x) names(x)[as.logical(x)])
# [[1]]
# [1] "Tree" "Lion"
# [[2]]
# [1] "Cat"
# [[3]]
# [1] "Bird" "Apple"
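One caveat with apply: it returns a list here only because the rows have different numbers of members. When every row yields a result of the same length, apply simplifies to a vector or matrix. Below is a minimal sketch of a wrapper that always returns a list; the name groups_from_indicator is hypothetical and not part of the original answer.

```r
# Hypothetical helper (name chosen for illustration only).
# Returns one character vector of column names per row, always as a list.
groups_from_indicator <- function(df) {
  m <- as.matrix(df) != 0   # logical matrix: TRUE where the column belongs to the row's group
  out <- lapply(seq_len(nrow(m)), function(i) colnames(m)[m[i, ]])
  names(out) <- rownames(df)
  out
}

df <- data.frame(Tree = c(1, 0, 0), Cat = c(0, 1, 0), Bird = c(0, 0, 1),
                 Lion = c(1, 0, 0), Apple = c(0, 0, 1),
                 row.names = c("row1", "row2", "row3"))
groups_from_indicator(df)

# Every row has exactly one member here: apply() would simplify this case
# to a character vector, while the helper still returns a list.
eye <- data.frame(A = c(1, 0), B = c(0, 1))
str(groups_from_indicator(eye))
```

Wrap the call in unname() if the row names are not wanted in the result.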
CodePudding user response:
Another possible solution, based on tidyverse:
library(tidyverse)
df %>%
  rownames_to_column %>%
  pivot_longer(-rowname) %>%
  filter(value != 0) %>%
  group_by(rowname) %>%
  summarise(name = list(name)) %>%
  select(-rowname) %>%
  flatten
#> [[1]]
#> [1] "Tree" "Lion"
#>
#> [[2]]
#> [1] "Cat"
#>
#> [[3]]
#> [1] "Bird" "Apple"
CodePudding user response:
Using split:
split(names(df)[col(df)][df == 1], row(df)[df == 1])
Output:
$`1`
[1] "Tree" "Lion"
$`2`
[1] "Cat"
$`3`
[1] "Bird" "Apple"
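To see why the split one-liner works, it can help to name the intermediate pieces. The following is only an illustrative breakdown; the variable names keep, col_names and row_ids are made up here. It also shows one possible way to keep rows that have no members, which plain split would drop from the result.

```r
df <- data.frame(Tree = c(1, 0, 0), Cat = c(0, 1, 0), Bird = c(0, 0, 1),
                 Lion = c(1, 0, 0), Apple = c(0, 0, 1),
                 row.names = c("row1", "row2", "row3"))

keep      <- df == 1            # logical matrix: TRUE where a column belongs to a row's group
col_names <- names(df)[col(df)] # column name of every cell, in column-major order
row_ids   <- row(df)            # row index of every cell, in the same order

# Keep only the member cells, then group their column names by row index.
res <- split(col_names[keep], row_ids[keep])

# Assumed variant: using a factor with all row indices as levels keeps
# rows without any 1 as empty character vectors.
res_all <- split(col_names[keep],
                 factor(row_ids[keep], levels = seq_len(nrow(df))))
```

The result is named by row index ("1", "2", "3"), not by the original row names.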