Create vectors based on features of clustering in R

I have the results of two clusterings, and I would like to create one vector per cluster that lists all features belonging to that cluster.

The following data frame holds the results. Columns C1 and C2 contain the cluster assignments from two different algorithms.

| A1 | A2 | A3 | A4 | A5 | C1 | C2 |
| -- | -- | -- | -- | -- | -- | -- |
| 0  | 0  | 0  | 15 | 0  | 1  | 1  |
| 0  | 20 | 34 | 0  | 0  | 2  | 2  |
| 33 | 0  | 0  | 7  | 0  | 1  | 1  |
| 0  | 0  | 0  | 0  | 85 | 3  | 2  |
| 0  | 0  | 0  | 0  | 94 | 3  | 2  |
| 0  | 12 | 57 | 0  | 0  | 2  | 2  |
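For reproducibility, the table above can be rebuilt as a data frame. This is only a sketch: the name `df1` is an assumption, chosen to match the name used in the response further down.

```r
# Rebuild the example table; the name df1 is an assumption chosen to
# match the code in the response.
df1 <- data.frame(
  A1 = c(0, 0, 33, 0, 0, 0),
  A2 = c(0, 20, 0, 0, 0, 12),
  A3 = c(0, 34, 0, 0, 0, 57),
  A4 = c(15, 0, 7, 0, 0, 0),
  A5 = c(0, 0, 0, 85, 94, 0),
  C1 = c(1, 2, 1, 3, 3, 2),
  C2 = c(1, 2, 1, 2, 2, 2)
)
```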

I want to create one vector for each cluster so that at the end I have

c11 = ['A1','A4']
c12 = ['A2','A3']
c13 = ['A5']

c21 = ['A1','A4']
c22 = ['A2','A3', 'A5']

EDIT: To be more specific, the code should build one vector per cluster as follows: if any row belonging to the cluster has a nonzero value for a feature, add that feature to the vector.

For the second clustering, the algorithm first looks at cluster C21 (rows 1 and 3). According to these rows, features A1 and A4 can be nonzero in instances of this cluster. It then looks at rows 2, 4, 5 and 6 for C22. There, A2 and A3 can be nonzero (rows 2 and 6), as can A5 (rows 4 and 5).
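The rule described above can be sketched for the two clusters of the second clustering. This is a minimal illustration, assuming the table is stored in a data frame called `df1`. The names `features`, `in_c21` and `in_c22` are made up for the example.

```r
# Example data from the table (df1 is an assumed name).
df1 <- data.frame(
  A1 = c(0, 0, 33, 0, 0, 0),
  A2 = c(0, 20, 0, 0, 0, 12),
  A3 = c(0, 34, 0, 0, 0, 57),
  A4 = c(15, 0, 7, 0, 0, 0),
  A5 = c(0, 0, 0, 85, 94, 0),
  C1 = c(1, 2, 1, 3, 3, 2),
  C2 = c(1, 2, 1, 2, 2, 2)
)
features <- paste0("A", 1:5)

# Rows that belong to cluster 1 of the second clustering (rows 1 and 3).
in_c21 <- df1$C2 == 1
# Keep a feature if at least one of those rows has a nonzero value.
c21 <- features[colSums(df1[in_c21, features] != 0) > 0]

# Same rule for cluster 2 of the second clustering (rows 2, 4, 5 and 6).
in_c22 <- df1$C2 == 2
c22 <- features[colSums(df1[in_c22, features] != 0) > 0]
```

Here `c21` comes out as `"A1" "A4"` and `c22` as `"A2" "A3" "A5"`, matching the desired result.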

CodePudding user response：

Loop over the rows with apply (MARGIN = 1) to build a list that holds, for each row, the names of the columns whose value is not 0. Then split that list by column C1 (or C2). For each group, unlist the inner elements, take the unique names, and sort them.

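# For each row, collect the names of the feature columns with a nonzero value.
# apply() returns a list here because the rows differ in how many features are nonzero.
l1 <- apply(df1[1:5] != 0, 1, FUN = function(x) names(x)[x])

# Group the per-row names by cluster label, then deduplicate and sort.
lst1 <- lapply(split(l1, df1$C1), function(x) sort(unique(unlist(x))))
lst2 <- lapply(split(l1, df1$C2), function(x) sort(unique(unlist(x))))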

-output

> lst1
$`1`
[1] "A1" "A4"

$`2`
[1] "A2" "A3"

$`3`
[1] "A5"

> lst2
$`1`
[1] "A1" "A4"

$`2`
[1] "A2" "A3" "A5"
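
One caveat with the apply-based approach: apply only returns a list when the rows differ in their number of nonzero features. If every row had the same number (two or more), it would simplify to a matrix, and split would no longer group by row. A possible alternative that avoids this is sketched below. `features_by_cluster` and its arguments are hypothetical names introduced for illustration, not part of the response above. It also names the results `c11`, `c12`, ... as in the question.

```r
# Example data from the question (df1 is the name used in the response).
df1 <- data.frame(
  A1 = c(0, 0, 33, 0, 0, 0),
  A2 = c(0, 20, 0, 0, 0, 12),
  A3 = c(0, 34, 0, 0, 0, 57),
  A4 = c(15, 0, 7, 0, 0, 0),
  A5 = c(0, 0, 0, 85, 94, 0),
  C1 = c(1, 2, 1, 3, 3, 2),
  C2 = c(1, 2, 1, 2, 2, 2)
)

# Hypothetical helper: for each cluster label in `cluster`, return the
# features that are nonzero in at least one row of that cluster.
features_by_cluster <- function(data, features, cluster, prefix) {
  nonzero <- data[features] != 0                       # logical matrix
  groups <- split(seq_len(nrow(data)), data[[cluster]]) # row indices per label
  out <- lapply(groups, function(rows) {
    # drop = FALSE keeps a matrix even for single-row clusters.
    features[colSums(nonzero[rows, , drop = FALSE]) > 0]
  })
  names(out) <- paste0(prefix, names(out))
  out
}

feats <- paste0("A", 1:5)
res <- c(
  features_by_cluster(df1, feats, "C1", "c1"),
  features_by_cluster(df1, feats, "C2", "c2")
)
```

The result is a single named list, so `res$c22` gives `"A2" "A3" "A5"`. If separate variables are really needed, `list2env(res, envir = .GlobalEnv)` would create them, though keeping the list is usually easier to work with.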