I have a data frame, df, with the words in the first column and, in columns 2-75, the LIWC categories each word falls into.
Here is a toy example of what I have:
word | posemo | certain | insight |
---|---|---|---|
certainly | 1 | 0 | 1 |
obviously | 1 | 1 | 1 |
sure | 1 | 0 | 0 |
directly | 1 | 0 | 1 |
insight | 1 | 1 | 0 |
guarantee | 0 | 1 | 0 |
prove | 1 | 0 | 1 |
This is what I want to achieve:
word | posemo | certain | insight | Categories |
---|---|---|---|---|
certainly | 1 | 0 | 1 | posemo, insight |
obviously | 1 | 1 | 1 | posemo, certain, insight |
sure | 1 | 0 | 0 | posemo |
directly | 1 | 0 | 1 | posemo, insight |
insight | 1 | 1 | 0 | posemo, certain |
guarantee | 0 | 1 | 0 | certain |
prove | 1 | 0 | 1 | posemo, insight |
I have been looking all over Stack Overflow, but I can't find anything that applies to what I'm trying to do. The question "Taking variable names out of column and creating new columns in R" comes close, but it doesn't deal with conditions.
Any tips? Thanks in advance.
CodePudding user response:
Try using apply:
# for each row, keep the names of the category columns whose flag is 1
data.frame(dat, Categories = t(t(
  apply(dat[, 2:4], 1, function(x) colnames(dat[, 2:4])[as.logical(x)])
)))
word posemo certain insight Categories
1 certainly 1 0 1 posemo, insight
2 obviously 1 1 1 posemo, certain, insight
3 sure 1 0 0 posemo
4 directly 1 0 1 posemo, insight
5 insight 1 1 0 posemo, certain
6 guarantee 0 1 0 certain
7 prove 1 0 1 posemo, insight
Data
dat <- structure(list(word = c("certainly", "obviously", "sure", "directly",
"insight", "guarantee", "prove"), posemo = c(1L, 1L, 1L, 1L,
1L, 0L, 1L), certain = c(0L, 1L, 0L, 0L, 1L, 1L, 0L), insight = c(1L,
1L, 0L, 1L, 0L, 0L, 1L)), class = "data.frame", row.names = c(NA,
-7L))
EDIT: for speed, precompute the column names and the logical matrix once, outside apply:
n <- colnames(dat[, 2:4])   # category names
lo <- dat[, 2:4] == 1       # logical matrix, TRUE where the flag is 1
data.frame(dat, Categories = t(t(apply(lo, 1, function(x) n[x]))))
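If a plain character column is preferred over a list column, the same idea can be combined with paste(..., collapse = ", "). The following is only a sketch on the toy data from the question; the name cat_cols is an assumption introduced here, and in the real data it would cover columns 2-75:

```r
# Toy data from the question
dat <- data.frame(
  word    = c("certainly", "obviously", "sure", "directly",
              "insight", "guarantee", "prove"),
  posemo  = c(1L, 1L, 1L, 1L, 1L, 0L, 1L),
  certain = c(0L, 1L, 0L, 0L, 1L, 1L, 0L),
  insight = c(1L, 1L, 0L, 1L, 0L, 0L, 1L)
)

# Assumed helper name: every column except `word` is a category flag
cat_cols <- setdiff(names(dat), "word")

# Logical matrix: TRUE where a word belongs to a category
flags <- dat[, cat_cols, drop = FALSE] == 1

# Build one comma-separated string per row
dat$Categories <- apply(flags, 1,
                        function(x) paste(cat_cols[x], collapse = ", "))
```

Because paste always returns a single string, apply yields one character value per row here, regardless of how many categories each word has.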
CodePudding user response:
This does the trick. Loop over the rows, find which columns are equal to 1 in the current row (cols <- x[i,] == 1), get those column names (cats <- na.omit(colnames(x)[cols])), then paste them together into a single string and store it in categories (x$categories[i] <- paste(cats, collapse = ", ")):
library(tibble)

x <- tibble(word = c("love","hate","sad"),
            happy = c(1,0,0),
            sad = c(0,1,1),
            emotion = c(1,1,1),
            categories = NA_character_)
for (i in seq_len(nrow(x))) {
  cols <- x[i, ] == 1                  # NA for the still-empty categories cell
  cats <- na.omit(colnames(x)[cols])   # drop that NA entry
  x$categories[i] <- paste(cats, collapse = ", ")
}
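The loop above compares every column with 1, including word and categories. A variant that checks only the flag columns might look like the sketch below. It uses a base data.frame instead of a tibble so that it runs without extra packages, and cat_cols is an assumed name for the vector of category columns:

```r
x <- data.frame(
  word    = c("love", "hate", "sad"),
  happy   = c(1, 0, 0),
  sad     = c(0, 1, 1),
  emotion = c(1, 1, 1)
)

# Assumed: these are the only 0/1 category columns
cat_cols <- c("happy", "sad", "emotion")

x$categories <- vapply(
  seq_len(nrow(x)),
  function(i) {
    hit <- unlist(x[i, cat_cols]) == 1
    # which() keeps only TRUE positions, so NA flags are skipped
    paste(cat_cols[which(hit)], collapse = ", ")
  },
  character(1)
)
```

Restricting the comparison to cat_cols means a word whose text happens to be "1" is never mistaken for a category flag.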