I am trying to summarize rows in a data frame by summing the numeric values within each group and keeping the character value from the second occurrence of the grouping variable.
I have the data frame listed below:
df <- data.frame(
  Season = c('Summer', 'Fall', 'Fall', 'Winter', 'Spring', 'Spring'),
  Number = c(1, 2, 2, 6, 7, 2),
  Character = c('1s', '2s', 's', '1s', '3s', 'q')
)
df
  Season Number Character
1 Summer      1        1s
2   Fall      2        2s
3   Fall      2         s
4 Winter      6        1s
5 Spring      7        3s
6 Spring      2         q
I am trying to summarize the data into the format listed below, but dplyr's summarize functions don't work well with non-numeric columns.
Here is my expected output...
  Season Number Character
1 Summer      1        1s
2   Fall      4         s
4 Winter      6        1s
5 Spring      9         q
CodePudding user response:
You can use [[2]] inside summarize(). You’ll also have to handle groups with only one row.
library(dplyr)
df %>%
  group_by(Season) %>%
  summarize(
    Number = sum(Number),
    # take the second value when the group has more than one row, otherwise the only value
    Character = ifelse(length(Character) > 1, Character[[2]], Character)
  ) %>%
  ungroup()
# A tibble: 4 × 3
  Season Number Character
  <chr>   <dbl> <chr>
1 Fall        4 s
2 Spring      9 q
3 Summer      1 1s
4 Winter      6 1s
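To see why the single-row case needs handling, here is a minimal base R sketch. The helper name second_or_only and the two sample vectors are made up for illustration; they are not part of the answer above.

```r
# Hypothetical sample groups: one with a single row, one with two rows
one_row  <- c("1s")
two_rows <- c("2s", "s")

# Hypothetical helper: second element if there is one, otherwise the only element
second_or_only <- function(x) if (length(x) > 1) x[[2]] else x[[1]]

second_or_only(two_rows)
second_or_only(one_row)

# Indexing [[2]] directly on a length-1 vector raises an error,
# which is what the length check guards against
res <- tryCatch(one_row[[2]], error = function(e) "error")
res
```

A plain if/else is used here because the condition is a single TRUE/FALSE per group; the ifelse() call in the answer behaves the same way for this data.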
CodePudding user response:
One approach is to use last() to pick the right string, assuming the value you want is always the last row in each group.
library(dplyr)
df %>%
  group_by(Season) %>%
  summarize(across(Number:Character, ~ ifelse(is.numeric(.x), sum(.x), last(.x))))
# A tibble: 4 × 3
  Season Number Character
  <chr>   <dbl> <chr>
1 Fall        4 s
2 Spring      9 q
3 Summer      1 1s
4 Winter      6 1s
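Note that the last value and the second value only coincide when a group has at most two rows, as in the sample data. A small base R sketch of the difference, using a made-up three-row group and tail(x, 1) as a base R stand-in for dplyr's last():

```r
# Hypothetical group with three rows (not in the original data)
x <- c("a", "b", "c")

# Second occurrence, falling back to the only value for single-row groups
second <- if (length(x) > 1) x[[2]] else x[[1]]

# Last occurrence, which is what last() would pick
final <- tail(x, 1)

second
final
```

If your real data can have three or more rows per group and you need the second one specifically, the [[2]] approach is probably the safer choice.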