Taking variable names out of column and creating a new column value based on condition

Time: 11-17

I have a data frame df: the first column contains all the words, and columns 2 to 75 are the LIWC categories, coded 1 if the word falls into that category and 0 otherwise.

Here is a toy example of what I have:

word      posemo certain insight
certainly      1       0       1
obviously      1       1       1
sure           1       0       0
directly       1       0       1
insight        1       1       0
guarantee      0       1       0
prove          1       0       1

This is what I want to achieve:

word      posemo certain insight Categories
certainly      1       0       1 posemo, insight
obviously      1       1       1 posemo, certain, insight
sure           1       0       0 posemo
directly       1       0       1 posemo, insight
insight        1       1       0 posemo, certain
guarantee      0       1       0 certain
prove          1       0       1 posemo, insight

I have searched all over Stack Overflow, but I can't find anything that fits what I'm trying to do. The question "Taking variable names out of column and creating new columns in R" comes close, but it doesn't deal with the condition (listing only the categories whose value is 1).

Any tips? Thanks in advance.

Answer:

Try using apply:

# Columns 2:4 hold the 0/1 category indicators in this toy example
data.frame(dat, Categories = t(t(
  apply(dat[, 2:4], 1, function(x) colnames(dat[, 2:4])[as.logical(x)])
)))

       word posemo certain insight               Categories
1 certainly      1       0       1          posemo, insight
2 obviously      1       1       1 posemo, certain, insight
3      sure      1       0       0                   posemo
4  directly      1       0       1          posemo, insight
5   insight      1       1       0          posemo, certain
6 guarantee      0       1       0                  certain
7     prove      1       0       1          posemo, insight

Data

dat <- structure(list(word = c("certainly", "obviously", "sure", "directly", 
"insight", "guarantee", "prove"), posemo = c(1L, 1L, 1L, 1L, 
1L, 0L, 1L), certain = c(0L, 1L, 0L, 0L, 1L, 1L, 0L), insight = c(1L, 
1L, 0L, 1L, 0L, 0L, 1L)), class = "data.frame", row.names = c(NA, 
-7L))
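The t(t(...)) wrapper appears to be needed because apply() returns a list here: the rows match different numbers of categories, so the results cannot be simplified, and wrapping the list in a one-column matrix presumably stops data.frame() from trying to spread it across several columns. The sketch below checks that, and also shows a caveat: if every row happened to match the same number (greater than one) of categories, apply() would simplify to a matrix instead, and the wrapper would no longer give one entry per word. The two_each data frame is made up purely to show that case.

```r
# Same toy data as the Data block above
dat <- data.frame(
  word    = c("certainly", "obviously", "sure", "directly",
              "insight", "guarantee", "prove"),
  posemo  = c(1L, 1L, 1L, 1L, 1L, 0L, 1L),
  certain = c(0L, 1L, 0L, 0L, 1L, 1L, 0L),
  insight = c(1L, 1L, 0L, 1L, 0L, 0L, 1L)
)
cats <- colnames(dat[, 2:4])

# Rows match 2, 3, 1, ... categories, so apply() cannot simplify
# and returns a list with one character vector per row
res <- apply(dat[, 2:4], 1, function(x) cats[as.logical(x)])
class(res)             # "list"
unname(lengths(res))   # 2 3 1 2 2 1 2

# Hypothetical counter-example: every row matches exactly two categories,
# so apply() simplifies to a 2 x 2 matrix (one column per data row)
two_each <- data.frame(a = c(1L, 1L), b = c(1L, 0L), c = c(0L, 1L))
res2 <- apply(two_each, 1, function(x) colnames(two_each)[as.logical(x)])
dim(res2)              # 2 2
```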

EDIT: for speed, compute the column names and the logical matrix once, before calling apply:

n <- colnames(dat[, 2:4])   # category names, computed once
lo <- dat[, 2:4] == 1       # logical matrix: TRUE where the word is in the category

data.frame(dat, Categories = t(t(apply(lo, 1, function(x) n[x]))))
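A variant that sidesteps the list-versus-matrix question is to paste each row's matches into a single string inside apply(), so every row yields exactly one value and the result is a plain character column. This is a sketch, not part of the answer above: cat_cols and hits are illustrative names, and it assumes every column other than word is a 0/1 indicator, which would let the same code cover columns 2 to 75 of the full table without hard-coding indices.

```r
# Same toy data as the Data block in the first answer
dat <- data.frame(
  word    = c("certainly", "obviously", "sure", "directly",
              "insight", "guarantee", "prove"),
  posemo  = c(1L, 1L, 1L, 1L, 1L, 0L, 1L),
  certain = c(0L, 1L, 0L, 0L, 1L, 1L, 0L),
  insight = c(1L, 1L, 0L, 1L, 0L, 0L, 1L)
)

# Assumption: every column except `word` is a 0/1 category indicator
cat_cols <- setdiff(names(dat), "word")

# Logical matrix computed once: TRUE where the word is in the category
hits <- as.matrix(dat[cat_cols]) == 1

# paste() returns exactly one string per row ("" if a row has no hits),
# so apply() gives a character vector whatever the match counts are
dat$Categories <- apply(hits, 1, function(h) paste(cat_cols[h], collapse = ", "))
dat
```

A plain character column is also easier to filter with grepl() or to export than a column of list elements.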

Answer:

This does the trick:

Loop over the rows. For each row, find which columns are equal to 1 (cols <- x[i,] == 1), take those column names (cats <- na.omit(colnames(x)[cols])), then paste them together into a single string and store it in categories (x$categories[i] <- paste(cats, collapse = ", ")). The na.omit() is needed because the categories cell of the current row is still NA when the row is compared, and NA == 1 gives NA.

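library(tibble)

# Toy data; `categories` starts as NA and is filled in by the loop
x <- tibble(word = c("love","hate","sad"),
            happy = c(1,0,0),
            sad = c(0,1,1),
            emotion = c(1,1,1),
            categories = c(NA,NA,NA))

for (i in seq_len(nrow(x))) {
  cols <- x[i, ] == 1                   # TRUE for the columns equal to 1 in row i
  cats <- na.omit(colnames(x)[cols])    # drop the NA from comparing `categories`
  x$categories[i] <- paste(cats, collapse = ", ")
}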
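The loop above visits one row at a time. Because a LIWC table has far more words (rows) than categories (74 columns in the question), looping over the columns instead may be cheaper, since each pass updates every matching row in one vectorized step; that speed benefit is an expectation, not something measured here. A base-R sketch under that assumption, with cat_cols, out and hit as illustrative names and every column other than word assumed to be a 0/1 indicator:

```r
# Toy data from this answer, as a base data.frame (no tibble needed)
x <- data.frame(word    = c("love", "hate", "sad"),
                happy   = c(1, 0, 0),
                sad     = c(0, 1, 1),
                emotion = c(1, 1, 1))

# Assumption: every column except `word` is a 0/1 category indicator
cat_cols <- setdiff(names(x), "word")

# Build the label strings one category column at a time;
# each pass appends the column name to all matching rows at once
out <- character(nrow(x))
for (col in cat_cols) {
  hit <- x[[col]] %in% 1   # NA-safe: an NA cell counts as "not in category"
  out[hit] <- ifelse(out[hit] == "", col, paste(out[hit], col, sep = ", "))
}
x$categories <- out
x
```

Because only the category columns are visited, the word column and the output column are never compared with 1, so no na.omit() step is needed.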
