I want to collapse the four categories below into two new categories: zone_a, which contains north_east and nothern_central, and zone_b, which contains the other two categories. Is there a way to achieve that without going through the hassle of converting the variable to integer and using the ifelse function?
library(plm)
data("Males")
table(Males$residence)
     rural_area      north_east nothern_central           south
             85             733             964            1333
CodePudding user response:
Here is one tidyverse solution:
library(tidyverse)
Males <- Males %>%
  mutate(residence = factor(case_when(
    residence %in% c("north_east", "nothern_central") ~ "zone_a",
    residence %in% c("rural_area", "south") ~ "zone_b"
  )))
CodePudding user response:
The levels() function is one way to approach this, since it lets you assign new factor levels. You can also do something similar with the labels argument of factor() (not shown).
If you use levels(), you have to set the new levels in the same order as the current ones, so I always take a look at them first.
Here's an example:
# Check current levels
levels(Males$residence)
#> [1] "rural_area" "north_east" "nothern_central" "south"
# Set new levels in correct order
levels(Males$residence) = c("zone_b", "zone_a", "zone_a", "zone_b")
# Check that this worked
table(Males$residence)
#>
#> zone_b zone_a
#> 1418 1697
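The labels argument of factor() mentioned above can be sketched roughly as follows. This is a minimal illustration on a made-up toy factor (the vector residence below is a stand-in, not the real Males data), and it relies on R 3.4.0 or later, where repeated labels merge the corresponding levels:

```r
# Toy factor standing in for Males$residence (values are made up for illustration)
residence <- factor(
  c("south", "north_east", "rural_area", "nothern_central", "south"),
  levels = c("rural_area", "north_east", "nothern_central", "south")
)

# Pair each existing level with its new label, in the same order.
# Repeating a label merges the corresponding levels (R >= 3.4.0).
zone <- factor(
  residence,
  levels = levels(residence),
  labels = c("zone_b", "zone_a", "zone_a", "zone_b")
)

# Cross-tabulate old against new values to check the mapping
table(residence, zone)
```

As with levels(), the labels are matched by position, so the same care about level order applies.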
A "safer" method, where you explicitly pair the old and new values, is fct_collapse() from the forcats package. (Thanks to @camille for pointing out this function over fct_recode().)
library(forcats)
data(Males)
Males$residence = fct_collapse(Males$residence,
  zone_a = c("north_east", "nothern_central"),
  zone_b = c("rural_area", "south")
)
table(Males$residence)
#>
#> zone_b zone_a
#> 1418 1697
Created on 2021-11-02 by the reprex package (v2.0.1)
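If adding forcats is not an option, base R's levels<- also accepts a named list, which gives a similar explicit old-to-new pairing to fct_collapse(). The sketch below again uses a made-up toy factor rather than the real Males data, so treat it as an illustration only:

```r
# Toy factor standing in for Males$residence (values are made up for illustration)
residence <- factor(
  c("south", "north_east", "rural_area", "nothern_central", "south"),
  levels = c("rural_area", "north_east", "nothern_central", "south")
)

# Each list name is a new level; its value lists the old levels it absorbs.
# Any old level that is not listed would become NA.
levels(residence) <- list(
  zone_a = c("north_east", "nothern_central"),
  zone_b = c("rural_area", "south")
)

table(residence)
```

Unlike the positional assignment, the resulting level order follows the order of the list names (zone_a, then zone_b), not the order of the original levels.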