Changing the names of dummy variables in R?

Time:08-15

I recently asked a similar question here. However, I now have a new problem and can't apply the answer to that one. I am trying to rename the elements of a column based on a vector of names.

Suppose I have a data frame with factor variables like this:

dfOG <- data.frame(
  x1 = c(1,4,2),
  x4 = as.factor(c('yes', 'no', 'maybe')),
  x5 = as.factor(c(rep('yes', 2), rep('no', 1))),
  x41 = as.factor(c(rep('hot', 1), rep('cold', 2)))
)

I then apply one of three processes. Each process (which is outside my control) splits those factors into dummy variables, but, depending on the process used, the dummies are named differently. For example, the table below shows how x4 could be split under each process:

x41, x42, x43 = x4 # process 1
x4.yes, x4.no, x4.maybe = x4 # process 2
x4_yes, x4_no, x4_maybe = x4 # process 3

So, process 1 just adds a number to the end. Process 2 adds a . followed by the factor level. Process 3 adds a _ followed by the factor level.
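To make the three naming schemes concrete, here is a small sketch that generates the dummy names one factor would get under each process. `dummyNames` is a hypothetical helper, and it assumes process 1 numbers the levels 1, 2, ... in the order returned by `levels()` (alphabetical by default), which the question does not confirm.

```r
dfOG <- data.frame(
  x1 = c(1, 4, 2),
  x4 = as.factor(c('yes', 'no', 'maybe')),
  x5 = as.factor(c(rep('yes', 2), rep('no', 1))),
  x41 = as.factor(c(rep('hot', 1), rep('cold', 2)))
)

# Hypothetical helper: the dummy column names a single factor would receive.
# Assumption: process 1 appends 1..nlevels in the order given by levels().
dummyNames <- function(var, lev, process) {
  switch(process,
         paste0(var, seq_along(lev)),  # process 1: x41, x42, x43
         paste(var, lev, sep = "."),   # process 2: x4.maybe, x4.no, x4.yes
         paste(var, lev, sep = "_"))   # process 3: x4_maybe, x4_no, x4_yes
}

dummyNames("x4", levels(dfOG$x4), 1)
dummyNames("x4", levels(dfOG$x4), 3)
```

Note that the levels come out in alphabetical order (`maybe`, `no`, `yes`), not in the order they first appear in the data.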

Focusing just on process 1, after applying the process I end up with a data frame with a column that looks like this:

dfChange <- data.frame(
  name = c('x41', 'x43', NA, NA,
           'x411', NA, 'x52',
           'x42', 'x412', NA)
)

I'm trying to change the names of the elements in dfChange$name back to their original names in dfOG.

Desired Output

My desired output would look something like this:

dfDesired <- data.frame(
  name = c('x4', 'x4', NA, NA, 
           'x41', NA, 'x5',
           'x4', 'x41', NA)
)

Attempted Solution

I don't actually have a solution yet, but here's what I was trying:

# find the factor columns in the original data
factorColNam <- names(dfOG)[sapply(dfOG, is.factor)]


# for each factor, build the names process 1 gives its dummies:
# the column name followed by 1, 2, ..., number of levels
dfnew <- list()
for (i in seq_along(factorColNam)) {
  nLev <- nlevels(dfOG[[factorColNam[i]]])
  dfnew[[i]] <- paste0(factorColNam[i], seq_len(nLev))
}

# name each list element after its original column
names(dfnew) <- factorColNam

My idea was to then loop through the list elements, compare them with the names in dfChange$name, and replace each match with the name of the list element it belongs to. But surely there is an easier and better way to do this?
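One simpler route, sketched here under the assumption that process 1 appends 1 through `nlevels()` to the column name, is to skip the comparison loop and build a single named lookup vector whose names are the dummy names and whose values are the original column names. Indexing that vector with `dfChange$name` does the renaming in one step, and an `NA` or unknown name simply returns `NA`.

```r
dfOG <- data.frame(
  x1 = c(1, 4, 2),
  x4 = as.factor(c('yes', 'no', 'maybe')),
  x5 = as.factor(c(rep('yes', 2), rep('no', 1))),
  x41 = as.factor(c(rep('hot', 1), rep('cold', 2)))
)
# stringsAsFactors = FALSE so indexing uses the strings, not factor codes (R < 4.0)
dfChange <- data.frame(
  name = c('x41', 'x43', NA, NA, 'x411', NA, 'x52', 'x42', 'x412', NA),
  stringsAsFactors = FALSE
)

# keep only the factor columns; their levels determine the dummy names
facs <- Filter(is.factor, dfOG)

# lookup: names are process-1 dummy names, values are original column names
lookup <- unlist(lapply(names(facs), function(v) {
  n <- nlevels(facs[[v]])
  setNames(rep(v, n), paste0(v, seq_len(n)))
}))

# index by name; NA or unmatched names give NA
# (if a dummy name occurred twice, `[` would return the first match)
dfChange$name <- unname(lookup[dfChange$name])
```

This reproduces `dfDesired$name` for the example data.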

EDIT

Something I forgot to mention: the names can be anything; they are not limited to x1, x2, etc. For example, I could have a variable called x10 with 10 factor levels, or a variable called banana with 25 factor levels. Any name is possible.
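Because the names can be anything, a numeric suffix can become ambiguous: a factor `banana` with 12 levels yields `banana11` and `banana12` under process 1, and so does a two-level factor `banana1`. A sketch that builds the lookup for any of the three processes and warns about such clashes might look like this. `buildLookup` is a hypothetical helper, and the `.` and `_` suffix conventions are taken from the examples in the question.

```r
# Hypothetical helper: dummy-name -> original-name lookup for one process
# (1 = numeric suffix, 2 = ".level", 3 = "_level"; conventions assumed).
buildLookup <- function(df, process) {
  stopifnot(process %in% 1:3)
  facs <- Filter(is.factor, df)
  keys <- unlist(lapply(names(facs), function(v) {
    lev <- levels(facs[[v]])
    switch(process,
           paste0(v, seq_along(lev)),  # process 1
           paste(v, lev, sep = "."),   # process 2
           paste(v, lev, sep = "_"))   # process 3
  }))
  # one original name per level, in the same order as keys
  vals <- rep(names(facs), vapply(facs, nlevels, integer(1)))
  # a dummy name generated by two different factors cannot be mapped back reliably
  dup <- unique(keys[duplicated(keys)])
  if (length(dup) > 0) {
    warning("ambiguous dummy names: ", paste(dup, collapse = ", "))
  }
  setNames(vals, keys)
}

dfOG <- data.frame(
  x1 = c(1, 4, 2),
  x4 = as.factor(c('yes', 'no', 'maybe')),
  x5 = as.factor(c(rep('yes', 2), rep('no', 1))),
  x41 = as.factor(c(rep('hot', 1), rep('cold', 2)))
)

# process 2 names map back to "x4", NA, "x41", NA
lk2 <- buildLookup(dfOG, 2)
unname(lk2[c("x4.yes", "x5_no", "x41.cold", NA)])

# a clash: warns "ambiguous dummy names: banana11, banana12"
clash <- data.frame(banana = factor(letters[1:12]),
                    banana1 = factor(rep(c("p", "q"), 6)))
lk <- buildLookup(clash, 1)
```

If clashes are possible with your real names, process 1 output alone may not carry enough information to recover the original column.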

Answer:

Continuing from your code:

dfChange$var <- NA
nn <- names(dfnew)
for (i in seq_along(dfnew)) {
  # rows whose dummy name belongs to the i-th factor get that factor's name
  dfChange$var[dfChange$name %in% dfnew[[i]]] <- nn[i]
}