I'm having some difficulty creating a new column based on a given set of conditions.
Using the iris data set as a reproducible example, I would like to find the maximum petal length for each species and create a new column that holds the label 'high_length' wherever that maximum occurs. I only want the label to appear on the row where it applies.
I'm missing something that seems obvious to others but can't bridge the gap myself. I'd really welcome any pointers.
Here is the code I've tried:
iris %>%
  mutate(high_spec = case_when(distinct(Species) & max(Petal.Length) ~ 'high_length'))
I get the following error:
Error: Problem with `mutate()` column `high_spec`.
ℹ `high_spec = case_when(distinct(Species) & max(Petal.Length) ~ "high_length")`.
x no applicable method for 'distinct' applied to an object of class "factor"
I've also tried converting Species to a character vector, but that did not work as intended either.
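A minimal sketch of why the error appears, assuming dplyr is installed: inside `mutate()`, `Species` refers to the column itself (a factor vector), while dplyr's `distinct()` is a data-frame verb with no method for a bare factor, which matches the "no applicable method" message. A character vector presumably fails for the same reason. The sketch contrasts `distinct()` on the whole data frame with base R's `unique()` on the vector; the names `species_df` and `species_vec` are illustrative only.

```r
library(dplyr)

# distinct() works on a data frame: it returns one row per unique
# combination of the named columns.
species_df <- distinct(iris, Species)

# For a bare vector (such as the Species column inside mutate()),
# base R's unique() is the counterpart.
species_vec <- unique(iris$Species)

print(species_df)
print(species_vec)
```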
Thanks
Answer:
Following up on my comment above, I assume you are after something like this:
library(dplyr)

iris %>%
  group_by(Species) %>%
  mutate(high_spec = if_else(
    Petal.Length == max(Petal.Length), "high_length", "")) %>%
  ungroup()
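For comparison, here is a base-R sketch of the same idea (a swapped-in technique, not what the answer uses): `ave()` applies `max` within each level of `Species` and returns a vector aligned with the original rows, so each row can be compared with its own species' maximum. The names `grp_max` and `out` are hypothetical.

```r
# Per-species maximum, repeated for every row of that species,
# in the same row order as iris.
grp_max <- ave(iris$Petal.Length, iris$Species, FUN = max)

out <- iris
# Label only the rows whose petal length equals their species' maximum.
out$high_spec <- ifelse(iris$Petal.Length == grp_max, "high_length", "")

# Show just the labelled rows.
print(out[out$high_spec == "high_length", c("Species", "Petal.Length", "high_spec")])
```

As with the dplyr version, if several rows tie for a species' maximum, every tied row gets the label.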
There is no need for case_when here. Instead, use group_by to perform the operation per Species: inside a grouped mutate(), max(Petal.Length) is computed separately for each species.
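To illustrate what grouping changes, a small sketch assuming dplyr is installed: the same `mutate()` is run once without and once with `group_by(Species)`. Without grouping, `max()` sees all 150 rows, so only the overall longest petal(s) are labelled; with grouping, each species gets its own label. The object names are illustrative.

```r
library(dplyr)

# Ungrouped: max() is taken over the whole data set.
ungrouped <- iris %>%
  mutate(high_spec = if_else(
    Petal.Length == max(Petal.Length), "high_length", ""))

# Grouped: max() is evaluated separately within each Species.
grouped <- iris %>%
  group_by(Species) %>%
  mutate(high_spec = if_else(
    Petal.Length == max(Petal.Length), "high_length", "")) %>%
  ungroup()

# Which species receive a label in each version?
print(unique(ungrouped$Species[ungrouped$high_spec == "high_length"]))
print(unique(grouped$Species[grouped$high_spec == "high_length"]))
```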
If you do have multiple conditions, case_when may make sense, but in that case I recommend looking at the examples given in ?case_when. It's also always good practice with case_when to capture the fall-through condition (for example TRUE ~ ""), because rows that match no condition are otherwise set to NA.
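A sketch of case_when with more than one condition plus a fall-through, assuming dplyr is installed. The second condition (`"low_length"` for the per-species minimum) and the column name `length_flag` are hypothetical, added only to show the pattern; the second pipeline omits the `TRUE ~ ""` line to show that unmatched rows become `NA`.

```r
library(dplyr)

labelled <- iris %>%
  group_by(Species) %>%
  mutate(length_flag = case_when(
    Petal.Length == max(Petal.Length) ~ "high_length",
    Petal.Length == min(Petal.Length) ~ "low_length",  # hypothetical second condition
    TRUE ~ ""                                          # fall-through: every other row
  )) %>%
  ungroup()

# Same conditions without the fall-through: unmatched rows become NA.
no_fallback <- iris %>%
  group_by(Species) %>%
  mutate(length_flag = case_when(
    Petal.Length == max(Petal.Length) ~ "high_length",
    Petal.Length == min(Petal.Length) ~ "low_length"
  )) %>%
  ungroup()

print(table(labelled$length_flag))
print(sum(is.na(no_fallback$length_flag)))
```

In dplyr 1.1.0 and later, case_when also accepts a `.default` argument that serves the same purpose as the `TRUE ~` line.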