Creating new column based on character values in R

Time:11-28

I have a data frame with a column called ‘full_name’ that presents two teams, for example:
• ‘Man U to win Liverpool to win’
• ‘Liverpool to win Man U to win’
• ‘Chelsea to win Arsenal to win’
And so on…

I would like to be able to differentiate the teams into North and South, so that if ‘Man U to win Liverpool to win’ or ‘Liverpool to win Man U to win’ are presented, then this is coded as ‘North’, whereas if ‘Chelsea to win Arsenal to win’ is presented, this is coded as ‘South’, and so on.

levels(raw_data$full_name)[levels(raw_data$full_name)== "Man U to win Liverpool to win"] <- 'North'
levels(raw_data$full_name)[levels(raw_data$full_name)== "Liverpool to win Man U to win"] <- 'North'
levels(raw_data$full_name)[levels(raw_data$full_name)== "Chelsea to win Arsenal to win"] <- 'South'

The code above does not produce any error, but the data frame remains unchanged, so it does not give the desired output. Is there a way to do this?

CodePudding user response:

Here is an alternative approach:

library(dplyr)
library(stringr)

north <- c("Liverpool|Man")
south <- c("Chelsea|Arsenal")

df %>% 
  mutate(region = case_when(str_detect(full_name, north) ~ "North",
                            str_detect(full_name, south) ~ "South",
                            TRUE ~ NA_character_))

                      full_name region
1 Liverpool to win Man U to win  North
2 Chelsea to win Arsenal to win  South
3 Man U to win Liverpool to win  North
4 Chelsea to win Arsenal to win  South
5 Liverpool to win Man U to win  North
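
The same pattern-matching idea can be sketched in base R with grepl(), in case adding dplyr and stringr is not wanted. This is only a sketch: the data frame below is hypothetical example data (the answer's own `df` is not shown), and the team groupings in `north` and `south` are assumptions taken from the question.

```r
# Hypothetical example data; the `df` used in the answer above is not shown.
df <- data.frame(full_name = c(
  "Liverpool to win Man U to win",
  "Chelsea to win Arsenal to win",
  "Man U to win Liverpool to win"
))

# Regular expressions listing the teams in each region (assumed groupings).
north <- "Liverpool|Man U"
south <- "Chelsea|Arsenal"

# Start with NA so rows matching neither pattern stay missing.
df$region <- NA_character_

# Assign South first and North last, so North wins if both patterns match,
# mirroring the order of the case_when() conditions.
df$region[grepl(south, df$full_name)] <- "South"
df$region[grepl(north, df$full_name)] <- "North"

print(df)
```

Using "Man U" rather than "Man" as the pattern avoids accidentally matching other teams whose names begin with "Man".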

CodePudding user response:

Here is an example using a tidyverse approach that might help:

library(dplyr)

north <- c("Man U to win Liverpool to win","Liverpool to win Man U to win")
south <- c("Chelsea to win Arsenal to win")


df <- 
  data.frame(full_name = sample(c(north,south),size = 5,replace = TRUE))
             
df %>% 
  mutate(region = case_when(
    full_name %in% north ~ "North",
    full_name %in% south ~ "South"
  ))

                      full_name region
1 Chelsea to win Arsenal to win  South
2 Man U to win Liverpool to win  North
3 Chelsea to win Arsenal to win  South
4 Man U to win Liverpool to win  North
5 Man U to win Liverpool to win  North

CodePudding user response:

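In base R, your code will work as intended if you remove the levels() calls, provided full_name is a character column rather than a factor (assigning a value that is not an existing level to a factor produces NA with a warning). You can call factor() after replacing the values if you want the column to be a factor.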

# example data
raw_data <- data.frame(full_name = c(
  "Man U to win Liverpool to win", 
  "Liverpool to win Man U to win",
  "Chelsea to win Arsenal to win"
))

raw_data$full_name[raw_data$full_name == "Man U to win Liverpool to win"] <- "North"
raw_data$full_name[raw_data$full_name == "Liverpool to win Man U to win"] <- "North"
raw_data$full_name[raw_data$full_name == "Chelsea to win Arsenal to win"] <- "South"

raw_data$full_name <- factor(raw_data$full_name)

Alternatively, you can use a named vector as a lookup table:

lookup <- c(
  "Man U to win Liverpool to win" = "North",
  "Liverpool to win Man U to win" = "North",
  "Chelsea to win Arsenal to win" = "South"
)

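# as.character() makes sure the lookup is indexed by name, even if
# full_name is a factor; unname() drops the lookup names from the result
raw_data$full_name <- factor(unname(lookup[as.character(raw_data$full_name)]))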

Result from either approach:

#> raw_data
  full_name
1     North
2     North
3     South

#> levels(raw_data$full_name)
[1] "North" "South"
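
Since the question title asks for a new column, the lookup-table idea can also be sketched so that it fills a separate column and leaves full_name untouched. The column name `region` is an assumption made for illustration; it does not appear in this answer.

```r
# Example data, same values as in the answer above.
raw_data <- data.frame(full_name = c(
  "Man U to win Liverpool to win",
  "Liverpool to win Man U to win",
  "Chelsea to win Arsenal to win"
))

# Named vector used as a lookup table: names are the old values,
# elements are the region each one maps to.
lookup <- c(
  "Man U to win Liverpool to win" = "North",
  "Liverpool to win Man U to win" = "North",
  "Chelsea to win Arsenal to win" = "South"
)

# as.character() guards against a factor column, which would otherwise
# index the lookup by integer code; unname() drops the lookup names.
# `region` is a hypothetical column name.
raw_data$region <- unname(lookup[as.character(raw_data$full_name)])

print(raw_data)
```

Values of full_name that are missing from the lookup table come back as NA, which makes unmapped fixtures easy to spot.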

CodePudding user response:

Here is an option with fct_recode

library(forcats)
raw_data$full_name <- with(raw_data, fct_recode(full_name,
   North = "Man U to win Liverpool to win",
   North = "Liverpool to win Man U to win",
   South = "Chelsea to win Arsenal to win"))

Or using base R

factor(raw_data$full_name, levels = c("Chelsea to win Arsenal to win", 
"Liverpool to win Man U to win", "Man U to win Liverpool to win"
), labels = c("South", "North", "North"))

Or if we want to use levels

lvls_to_change <- c("Man U to win Liverpool to win",
   "Liverpool to win Man U to win", "Chelsea to win Arsenal to win")
lvls_new <- c("North", "North", "South")
# only touch the levels that are actually present in the factor
i1 <- levels(raw_data$full_name) %in% lvls_to_change
levels(raw_data$full_name)[i1] <- lvls_new[match(levels(raw_data$full_name)[i1], lvls_to_change)]
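
Another way to work with levels, sketched here as a possible alternative rather than part of the original answer, is to assign a named list to levels(): each list name becomes a new level and its value lists the old levels merged into it. The standalone factor `full_name` below is built for illustration with the same values as the example data in this answer.

```r
# Factor with the same values as the example data in this answer.
full_name <- factor(c(
  "Liverpool to win Man U to win",
  "Liverpool to win Man U to win",
  "Man U to win Liverpool to win",
  "Liverpool to win Man U to win",
  "Chelsea to win Arsenal to win"
))

# Each list name is a new level; its value lists the old levels
# that should be collapsed into it.
levels(full_name) <- list(
  North = c("Man U to win Liverpool to win", "Liverpool to win Man U to win"),
  South = "Chelsea to win Arsenal to win"
)

print(full_name)
```

With this form, any old level not mentioned in the list becomes NA, so the list should cover every level that needs to be kept.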

data

raw_data <- structure(list(full_name = structure(c(2L, 2L, 3L, 2L,
 1L), levels = c("Chelsea to win Arsenal to win", 
"Liverpool to win Man U to win", "Man U to win Liverpool to win"
), class = "factor")), row.names = c(NA, -5L), class = "data.frame")