I want to group the levels of Neighborhood into categories based on the mean price within each level. Is this the right way to do it?
ames.train.c <- ames.train.c %>%
  group_by(Neighborhood) %>%
  mutate(Neighborhood.Cat = ifelse(mean(price) < 140000, "A",
                            ifelse(mean(price) < 200000, "B",
                            ifelse(mean(price) < 260000, "C",
                            ifelse(mean(price) < 300000, "D",
                            ifelse(mean(price) < 340000, "E"))))))
CodePudding user response:
I think cut() might help here, since it bins each group's mean price into labeled intervals:
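library(dplyr)
# Break points and labels for the price bands
cut_breaks <- c(0, 140000, 200000, 260000, 300000, 340000)
cut_labels <- c("A", "B", "C", "D", "E")
ames.train.c <- ames.train.c %>%
  group_by(Neighborhood) %>%
  # right = FALSE gives [lower, upper) intervals, matching the `<` tests in the question
  mutate(Neighborhood.Cat = cut(mean(price), cut_breaks, labels = cut_labels, right = FALSE)) %>%
  ungroup()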
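One detail worth checking with cut() is which side of each interval is closed. By default the intervals are closed on the right, so a mean of exactly 140000 lands in "A", whereas the nested ifelse() in the question would put it in "B". The following is a small base-R sketch of that difference; the mean values are made up for illustration and were picked to sit on or near the break points.

```r
# Hypothetical group means (not from the Ames data), chosen to hit the boundaries
means <- c(120000, 140000, 199999, 300000, 350000)

cut_breaks <- c(0, 140000, 200000, 260000, 300000, 340000)
cut_labels <- c("A", "B", "C", "D", "E")

# Default (right = TRUE): intervals are (lower, upper]
default_cat <- as.character(cut(means, cut_breaks, labels = cut_labels))

# right = FALSE: intervals are [lower, upper), like a chain of `<` comparisons
left_closed_cat <- as.character(
  cut(means, cut_breaks, labels = cut_labels, right = FALSE)
)

print(data.frame(means, default_cat, left_closed_cat))
```

In both variants a value outside the outermost breaks (350000 here) comes back as NA, so the last break should be raised, or set to Inf, if higher means are possible.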
CodePudding user response:
You didn't provide any data, so I had to generate some myself.
library(tidyverse)
df = tibble(
  Neighborhood = rep(1:5, each = 1000),
  price = c(rnorm(1000, 100000, 1000),
            rnorm(1000, 150000, 1000),
            rnorm(1000, 90000, 1000),
            rnorm(1000, 200000, 1000),
            rnorm(1000, 300000, 1000))
)
Now we will create a function for assigning categories.
f = function(data) data %>% mutate(
  Neighborhood.Cat =
    case_when(
      mean(price) < 140000 ~ "A",
      mean(price) < 200000 ~ "B",
      mean(price) < 260000 ~ "C",
      mean(price) < 300000 ~ "D",
      mean(price) < 340000 ~ "E"
    ))
With this function, you can modify groups in the following way:
df = df %>% group_by(Neighborhood) %>%
  group_modify(~f(.x))
Let's check the result:
df %>% group_by(Neighborhood) %>%
  summarise(mean = mean(price),
            Cat = Neighborhood.Cat[1])
Output:
# A tibble: 5 x 3
  Neighborhood    mean Cat
         <int>   <dbl> <chr>
1            1 100020. A
2            2 150011. B
3            3  89981. A
4            4 200052. C
5            5 299998. D
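As a side note, the group_modify() step is probably not required: mutate() already evaluates mean(price) per group when the data is grouped. Here is a minimal sketch of that variant, assuming dplyr is installed. The toy data and the helper name categorise are made up for illustration, and the TRUE fallback is an addition that makes the NA for means of 340000 or more explicit.

```r
library(dplyr)

# Toy data (hypothetical): two neighborhoods with known mean prices
toy <- tibble(
  Neighborhood = rep(c("N1", "N2"), each = 3),
  price = c(100000, 110000, 120000, 240000, 250000, 260000)
)

# Hypothetical helper: maps a mean price to its category
categorise <- function(m) {
  case_when(
    m < 140000 ~ "A",
    m < 200000 ~ "B",
    m < 260000 ~ "C",
    m < 300000 ~ "D",
    m < 340000 ~ "E",
    TRUE ~ NA_character_  # means of 340000 or more get no category
  )
}

toy_cat <- toy %>%
  group_by(Neighborhood) %>%
  # mean(price) is computed within each group and recycled to every row
  mutate(Neighborhood.Cat = categorise(mean(price))) %>%
  ungroup()

print(toy_cat)
```

Here N1 has a mean of 110000 and N2 a mean of 250000, so every row of N1 gets "A" and every row of N2 gets "C".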