I have a data frame with a column called ‘full_name’ that presents two teams, for example:
• ‘Man U to win Liverpool to win’
• ‘Liverpool to win Man U to win’
• ‘Chelsea to win Arsenal to win’
and so on.
I would like to differentiate the teams into North and South, so that ‘Man U to win Liverpool to win’ and ‘Liverpool to win Man U to win’ are both coded as ‘North’, whereas ‘Chelsea to win Arsenal to win’ is coded as ‘South’, and so on. This is what I tried:
levels(raw_data$full_name)[levels(raw_data$full_name)== "Man U to win Liverpool to win"] <- 'North'
levels(raw_data$full_name)[levels(raw_data$full_name)== "Liverpool to win Man U to win"] <- 'North'
levels(raw_data$full_name)[levels(raw_data$full_name)== "Chelsea to win Arsenal to win"] <- 'South'
The code above does not produce any error; however, the data frame remains unchanged and does not show the desired output. Is there a way to do this?
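A minimal sketch reproducing the silent failure, assuming full_name is a character column rather than a factor; the two-row raw_data below is a hypothetical stand-in for the real data:

```r
# Hypothetical stand-in for the question's data; data.frame() keeps strings
# as character (not factor) by default since R 4.0.0
raw_data <- data.frame(full_name = c("Man U to win Liverpool to win",
                                     "Chelsea to win Arsenal to win"))

col_class <- class(raw_data$full_name)    # "character"
col_levels <- levels(raw_data$full_name)  # NULL: a character vector has no levels

# The question's assignment: NULL == "..." is logical(0), so nothing is replaced
levels(raw_data$full_name)[levels(raw_data$full_name) == "Man U to win Liverpool to win"] <- "North"

print(col_class)
print(col_levels)
print(as.vector(raw_data$full_name))  # the values themselves are unchanged
```

Checking class() first is a quick way to tell whether levels()-based code can have any effect at all.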
Answer:
Here is an alternative approach:
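library(dplyr)
library(stringr)

# regular expressions: "|" means "or", so a row mentioning either team matches
north <- "Liverpool|Man"
south <- "Chelsea|Arsenal"

# df is your data frame (raw_data in the question)
df %>%
  mutate(region = case_when(
    str_detect(full_name, north) ~ "North",
    str_detect(full_name, south) ~ "South",
    TRUE ~ NA_character_
  ))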
full_name region
1 Liverpool to win Man U to win North
2 Chelsea to win Arsenal to win South
3 Man U to win Liverpool to win North
4 Chelsea to win Arsenal to win South
5 Liverpool to win Man U to win North
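With many clubs, typing the regular expression by hand gets error-prone. A minimal base R sketch, assuming hypothetical vectors north_teams and south_teams that you would fill with the clubs in your data, builds each pattern with paste(collapse = "|") and uses grepl() in place of str_detect():

```r
# Hypothetical club lists; extend them with every team in your data
north_teams <- c("Man U", "Liverpool")
south_teams <- c("Chelsea", "Arsenal")

# Join the names into one alternation pattern, e.g. "Man U|Liverpool"
north_pattern <- paste(north_teams, collapse = "|")
south_pattern <- paste(south_teams, collapse = "|")

full_name <- c("Liverpool to win Man U to win",
               "Chelsea to win Arsenal to win",
               "Man U to win Liverpool to win")

# The first matching pattern wins; rows matching neither become NA
region <- ifelse(grepl(north_pattern, full_name), "North",
                 ifelse(grepl(south_pattern, full_name), "South", NA_character_))
print(region)
```

Because the checks run in order, a string pairing a northern and a southern club would be coded North here, so it is worth deciding explicitly how such mixed pairs should be handled. Club names containing regex metacharacters (such as "." or parentheses) would need escaping.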
Answer:
Here is an example using a tidyverse approach that might help:
library(dplyr)

north <- c("Man U to win Liverpool to win", "Liverpool to win Man U to win")
south <- c("Chelsea to win Arsenal to win")

# toy data: 5 random draws, so your rows will differ
df <- data.frame(full_name = sample(c(north, south), size = 5, replace = TRUE))

df %>%
  mutate(region = case_when(
    full_name %in% north ~ "North",
    full_name %in% south ~ "South"
    # rows matching neither vector become NA
  ))
full_name region
1 Chelsea to win Arsenal to win South
2 Man U to win Liverpool to win North
3 Chelsea to win Arsenal to win South
4 Man U to win Liverpool to win North
5 Man U to win Liverpool to win North
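Since case_when() leaves unmatched rows as NA, it can help to list the strings that are not yet in either vector. A base R sketch of the same exact-match idea; the third string is a hypothetical pair missing from both lists:

```r
north <- c("Man U to win Liverpool to win", "Liverpool to win Man U to win")
south <- c("Chelsea to win Arsenal to win")

# Hypothetical data: the last pair is in neither list
full_name <- c("Man U to win Liverpool to win",
               "Chelsea to win Arsenal to win",
               "Everton to win Leeds to win")

# Exact string matching with %in%, mirroring the case_when() above
region <- ifelse(full_name %in% north, "North",
                 ifelse(full_name %in% south, "South", NA_character_))

# Strings that still need to be added to north or south
uncoded <- unique(full_name[is.na(region)])
print(region)
print(uncoded)
```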
Answer:
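In base R, your code will work as intended if you remove the levels() calls. Your full_name column is most likely a character vector rather than a factor (since R 4.0.0, data.frame() no longer converts strings to factors by default), so levels() returns NULL and the comparisons never match anything. You can call factor() after replacing the values if you want the column to be a factor.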
# example data
raw_data <- data.frame(full_name = c(
"Man U to win Liverpool to win",
"Liverpool to win Man U to win",
"Chelsea to win Arsenal to win"
))
raw_data$full_name[raw_data$full_name == "Man U to win Liverpool to win"] <- "North"
raw_data$full_name[raw_data$full_name == "Liverpool to win Man U to win"] <- "North"
raw_data$full_name[raw_data$full_name == "Chelsea to win Arsenal to win"] <- "South"
raw_data$full_name <- factor(raw_data$full_name)
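Because both North strings map to the same label, the two North lines can be folded into one with %in%. A sketch, where the north vector is introduced here for illustration:

```r
raw_data <- data.frame(full_name = c(
  "Man U to win Liverpool to win",
  "Liverpool to win Man U to win",
  "Chelsea to win Arsenal to win"
))

# Every string that should be coded North
north <- c("Man U to win Liverpool to win", "Liverpool to win Man U to win")

raw_data$full_name[raw_data$full_name %in% north] <- "North"
raw_data$full_name[raw_data$full_name == "Chelsea to win Arsenal to win"] <- "South"
raw_data$full_name <- factor(raw_data$full_name)
print(raw_data)
```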
Alternatively, you can use a named vector as a lookup table:
lookup <- c(
"Man U to win Liverpool to win" = "North",
"Liverpool to win Man U to win" = "North",
"Chelsea to win Arsenal to win" = "South"
)
raw_data$full_name <- factor(lookup[raw_data$full_name])
Result from either approach:
> raw_data
full_name
1 North
2 North
3 South
> levels(raw_data$full_name)
[1] "North" "South"
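Indexing a named vector returns a named vector, and any string missing from the lookup table comes back as NA. A small sketch that drops the names with unname() and shows the NA case; the fourth string is a hypothetical pair not in the table:

```r
lookup <- c(
  "Man U to win Liverpool to win" = "North",
  "Liverpool to win Man U to win" = "North",
  "Chelsea to win Arsenal to win" = "South"
)

# Hypothetical input: the last string has no entry in lookup
full_name <- c("Man U to win Liverpool to win",
               "Liverpool to win Man U to win",
               "Chelsea to win Arsenal to win",
               "Everton to win Leeds to win")

region <- unname(lookup[full_name])  # plain character vector, no names
print(region)
print(anyNA(region))  # TRUE flags strings that still need a lookup entry
```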
Answer:
Here is an option with forcats::fct_recode():
library(forcats)
raw_data$full_name <- with(raw_data, fct_recode(full_name,
North = "Man U to win Liverpool to win",
North = "Liverpool to win Man U to win",
South = "Chelsea to win Arsenal to win"))
Or, using base R:
factor(raw_data$full_name, levels = c("Chelsea to win Arsenal to win",
"Liverpool to win Man U to win", "Man U to win Liverpool to win"
), labels = c("South", "North", "North"))
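In that factor() call, labels must line up position by position with levels, and repeated labels merge levels (supported since R 3.4.0). A sketch that takes both from one hypothetical named vector, mapping, so they cannot drift apart:

```r
full_name <- factor(c("Man U to win Liverpool to win",
                      "Liverpool to win Man U to win",
                      "Chelsea to win Arsenal to win",
                      "Liverpool to win Man U to win"))

# Hypothetical mapping: names are the old levels, values the new labels
mapping <- c("Man U to win Liverpool to win" = "North",
             "Liverpool to win Man U to win" = "North",
             "Chelsea to win Arsenal to win" = "South")

# Duplicated labels ("North" twice) collapse into a single level
region <- factor(full_name, levels = names(mapping), labels = unname(mapping))
print(region)
print(levels(region))
```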
Or, if we want to use levels() as in the question (this requires full_name to be a factor):
# old levels to replace, and their replacements in the same order
lvls_to_change <- c("Man U to win Liverpool to win",
                    "Liverpool to win Man U to win", "Chelsea to win Arsenal to win")
lvls_new <- c("North", "North", "South")
# only touch levels that appear in lvls_to_change
i1 <- levels(raw_data$full_name) %in% lvls_to_change
levels(raw_data$full_name)[i1] <- lvls_new[match(levels(raw_data$full_name)[i1], lvls_to_change)]
Data used in this answer:
raw_data <- structure(list(full_name = structure(c(2L, 2L, 3L, 2L,
1L), levels = c("Chelsea to win Arsenal to win",
"Liverpool to win Man U to win", "Man U to win Liverpool to win"
), class = "factor")), row.names = c(NA, -5L), class = "data.frame")
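For a factor, levels<- also accepts a named list, where each name becomes a new level and each element lists the old levels merged into it; every existing level should be listed. A sketch of this alternative on a small hypothetical factor f:

```r
f <- factor(c("Man U to win Liverpool to win",
              "Liverpool to win Man U to win",
              "Chelsea to win Arsenal to win",
              "Liverpool to win Man U to win"))

# Each name becomes a new level; its element lists the old levels merged into it
levels(f) <- list(
  North = c("Man U to win Liverpool to win", "Liverpool to win Man U to win"),
  South = "Chelsea to win Arsenal to win"
)
print(f)
print(levels(f))
```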