Home > Software engineering >  How to get R to create new column (named from left part of string in old column), and then put right
How to get R to create new column (named from left part of string in old column), and then put right

Time:04-13

Given an existing dataframe containing a character column such as that shown below (oldColumn1), I want to have R automatically create a new column, in the same data frame, named from the left part of the string (e.g. COLOR).

Then for each row put the right part of the string contents appearing after the ": " (e.g. RED, BLUE, ETC) into the new column named "COLOR".

There are many old columns (oldColumn1, oldColumn2, etc) that need to be split out like this so doing this manually is impractical. Thanks in advance for any help you might provide.

# Here is an example of 3 oldColumns that already exist in dataframe.
# There are thousands of these columns, need to auto create a new
# column for each one as described.
# Maybe hoping to have the oldColumn names in a vector, to then pass
# to a function that creates a new column for each oldColumn. 

oldColumn1 <- c('COLOR: RED', 'COLOR: RED', 'COLOR: BLUE', 'COLOR: GREEN', 'COLOR: BLUE')
oldColumn2 <- c('SIZE: LARGE', 'SIZE: MEDIUM','SIZE: XLARGE','SIZE: MEDIUM','SIZE: SMALL')
oldColumn3 <- c('DESIGNSTYLE: STYLED', 'DESIGNSTYLE: ORIGINAL MAKER', 'DESIGNSTYLE: COUTURE','DESIGNSTYLE: COUTURE','DESIGNSTYLE: STYLED')
COLOR <- c('RED', 'RED', 'BLUE', 'GREEN', 'BLUE')
SIZE <- c('LARGE', 'MEDIUM', 'XLARGE', 'MEDIUM', 'SMALL')
DESIGNSTYLE <- c('STYLED', 'ORIGINAL MAKER', 'COUTURE', 'COUTURE', 'STYLED')
dat <- data.frame(oldColumn1, oldColumn2, oldColumn3, COLOR, SIZE, DESIGNSTYLE)
dat

CodePudding user response:

You can create a new column by using $ and then use gsub() to remove COLOR: from the targeted column.

yourdf$COLOR <- gsub("COLOR: ", "", yourdf$oldColumn1)

If you also want to delete the old column:

yourdf$oldColumn1 <- NULL

EDIT

In case you have many columns, you can apply gsub function to all targeted columns. If your targeted columns have a common name pattern, such as oldColumn as in your example, you can subset your data frame by identifying that pattern by using grep. After that, you can rename the edited columns to become COLOR1, COLOR2, etc.

Here are the complete steps:

# Remove "COLOR: " from the targeted columns
colname_pattern <- grep("oldColumn", colnames(yourdf))
yourdf[, colname_pattern] <- apply(yourdf[, colname_pattern], 2, 
                                   gsub, pattern = "COLOR: ", 
                                   replacement = "")
# Rename the edited columns
index <- seq_along(colname_pattern)
newnames <- paste0("COLOR", index)
colnames(yourdf[, colname_pattern]) <- newnames

CodePudding user response:

Starting with

quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE")), class = "data.frame", row.names = c(NA, -5L))

The naive approach would be

data.frame(COLOR = trimws(sub("COLOR:", "", quux$oldColumn1)))
#   COLOR
# 1   RED
# 2   RED
# 3  BLUE
# 4 GREEN
# 5  BLUE

But I'm assuming you have a more generic need. Let's assume that you have some more things to parse out of that, such as

quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE", "SIZE: 1", "SIZE: 3", "SIZE: 5")), class = "data.frame", row.names = c(NA, -8L))
quux
#     oldColumn1
# 1   COLOR: RED
# 2   COLOR: RED
# 3  COLOR: BLUE
# 4 COLOR: GREEN
# 5  COLOR: BLUE
# 6      SIZE: 1
# 7      SIZE: 3
# 8      SIZE: 5

then we can generalize it with

tmp <- strcapture("(.*)\\s*:\\s*(.*)", quux$oldColumn1, list(k="", v=""))
tmp$ign <- ave(rep(1L, nrow(tmp)), tmp$k, FUN = seq_along)
reshape2::dcast(tmp, ign ~ k, value.var = "v")[,-1,drop=FALSE]
#   COLOR SIZE
# 1   RED    1
# 2   RED    3
# 3  BLUE    5
# 4 GREEN <NA>
# 5  BLUE <NA>
  • Related