Taking variable names out of column and creating a new column value based on condition

I have a data frame df whose first column holds the words and whose columns 2-75 are different LIWC categories; a 1 marks each category a word falls into.

Here is a toy example of what I have:

word	posemo	certain	insight
certainly	1	0	1
obviously	1	1	1
sure	1	0	0
directly	1	0	1
insight	1	1	0
guarantee	0	1	0
prove	1	0	1

This is what I want to achieve:

word	posemo	certain	insight	Categories
certainly	1	0	1	posemo, insight
obviously	1	1	1	posemo, certain, insight
sure	1	0	0	posemo
directly	1	0	1	posemo, insight
insight	1	1	0	posemo, certain
guarantee	0	1	0	certain
prove	1	0	1	posemo, insight

I have been looking all over Stack Overflow, but I can't seem to find anything that applies to what I'm trying to do. One question, "Taking variable names out of column and creating new columns in R", comes close, but it doesn't deal with conditions.

Any tips? Thanks in advance

CodePudding user response：

Try using apply:

# For each row, keep the names of the category columns that hold a 1
data.frame( dat, Categories=t(
   t( apply( dat[,2:4], 1, function(x) colnames(dat[,2:4])[as.logical(x)] ) ) ))

       word posemo certain insight               Categories
1 certainly      1       0       1          posemo, insight
2 obviously      1       1       1 posemo, certain, insight
3      sure      1       0       0                   posemo
4  directly      1       0       1          posemo, insight
5   insight      1       1       0          posemo, certain
6 guarantee      0       1       0                  certain
7     prove      1       0       1          posemo, insight
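
As far as I can tell, the apply call above hands back the matching names as vectors of different lengths, so Categories ends up as a list column rather than plain text (and apply would simplify to a matrix if every row happened to have the same number of 1s). If a single comma-separated string per row is what's needed, a minimal sketch along these lines may help. It assumes the category columns are columns 2 to 4, as in the toy data; cat_cols and is_hit are names made up for this illustration.

```r
# Toy data (a subset of the example above)
dat <- data.frame(
  word    = c("certainly", "obviously", "sure"),
  posemo  = c(1L, 1L, 1L),
  certain = c(0L, 1L, 0L),
  insight = c(1L, 1L, 0L)
)

# Assumption: the category columns sit in positions 2:4
cat_cols <- names(dat)[2:4]

# Logical matrix: TRUE where a word belongs to a category
is_hit <- dat[, cat_cols] == 1

# Collapse the matching names of each row into one string;
# toString() joins with ", " and gives "" for a row without any 1
dat$Categories <- unname(
  apply(is_hit, 1, function(hit) toString(cat_cols[hit]))
)

print(dat)
```

Because every row is collapsed to exactly one string, the result is always an ordinary character column, whatever the number of matches per row.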

Data

dat <- structure(list(word = c("certainly", "obviously", "sure", "directly", 
"insight", "guarantee", "prove"), posemo = c(1L, 1L, 1L, 1L, 
1L, 0L, 1L), certain = c(0L, 1L, 0L, 0L, 1L, 1L, 0L), insight = c(1L, 
1L, 0L, 1L, 0L, 0L, 1L)), class = "data.frame", row.names = c(NA, 
-7L))

EDIT: for speed, compute the column names and the logical matrix once, outside apply:

n <- colnames( dat[,2:4] )   # category names
lo <- dat[,2:4] == 1         # TRUE where a word belongs to a category

data.frame( dat, Categories=t(t( apply( lo, 1, function(x) n[x] ) ) ))
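
With 74 category columns it might also be worth avoiding the per-row function call altogether. The sketch below is one possible way to do that in base R with which(arr.ind = TRUE) and tapply; I have not benchmarked it, so treat the speed benefit as an assumption. The names cat_cols, hits and labels are invented for this example, and the extra row "none" is added only to show what happens when a word has no category.

```r
# Toy data, plus one word that belongs to no category
dat <- data.frame(
  word    = c("certainly", "obviously", "sure", "guarantee", "none"),
  posemo  = c(1L, 1L, 1L, 0L, 0L),
  certain = c(0L, 1L, 0L, 1L, 0L),
  insight = c(1L, 1L, 0L, 0L, 0L)
)

# Assumption: the category columns sit in positions 2:4
cat_cols <- names(dat)[2:4]

# One row per 1 in the data: its row index and column index
hits <- which(dat[, cat_cols] == 1, arr.ind = TRUE)

# Group the matching column names by row and join each group
labels <- tapply(cat_cols[hits[, "col"]], hits[, "row"],
                 paste, collapse = ", ")

# Rows without any 1 never appear in hits, so start from ""
dat$Categories <- ""
dat$Categories[as.integer(names(labels))] <- as.vector(labels)

print(dat)
```

Since which() walks the matrix column by column, the names inside each row come out in the original column order.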

CodePudding user response：

This does the trick:

Loop over the rows. For each row, find which columns are equal to 1 (cols <- x[i,] == 1), get those column names (cats <- na.omit(colnames(x)[cols]); na.omit drops the NA that comes from the still-empty categories column), then paste them together as a single string and store it in categories (x$categories[i] <- paste(cats, collapse = ", ")).

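library(tibble)

x <- tibble(word = c("love","hate","sad"),
            happy = c(1,0,0),
            sad = c(0,1,1),
            emotion = c(1,1,1),
            categories = NA_character_)  # placeholder, filled in by the loop

for(i in seq_len(nrow(x))){
  cols <- x[i,] == 1                  # TRUE where row i holds a 1 (NA for categories)
  cats <- na.omit(colnames(x)[cols])  # matching column names, NA dropped
  x$categories[i] <- paste(cats, collapse = ", ")
}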