How can I summarize rows in a data frame and keep the character values from the second occurrence of

Time:11-29

I am trying to summarize rows in a data frame by adding numerical row values and keeping the character values from the second occurrence of the grouping variable.

I have the data frame listed below:

df <- data.frame(
  Season = c('Summer', 'Fall', 'Fall', 'Winter','Spring', 'Spring'),
  Number = c(1,2,2,6,7,2),
  Character = c('1s', '2s', 's', '1s', '3s', 'q')
)

df

  Season Number Character
1 Summer      1        1s
2   Fall      2        2s
3   Fall      2         s
4 Winter      6        1s
5 Spring      7        3s
6 Spring      2         q

I want to summarize the data into the format shown below, but dplyr's summarize functions don't work well with non-numeric columns.

Here is my expected output...

  Season Number Character
1 Summer      1        1s
2   Fall      4         s
4 Winter      6        1s
5 Spring      9         q

CodePudding user response:

You can use [[2]] inside summarize(). You’ll also have to handle groups with only one row.

library(dplyr)

df %>%
  group_by(Season) %>%
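  summarize(
    # add up the numeric values within each Season
    Number = sum(Number),
    # take the second value if the group has one, otherwise the only value
    Character = ifelse(length(Character) > 1, Character[[2]], Character)
  ) %>%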
  ungroup()
# A tibble: 4 × 3
  Season Number Character
  <chr>   <dbl> <chr>    
1 Fall        4 s        
2 Spring      9 q        
3 Summer      1 1s       
4 Winter      6 1s       
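
A possible variation on the same idea, offered as a sketch rather than as the answer's own method: dplyr's nth() accepts a default argument, so the one-row case can fall back to first(Character) without an explicit length check. The factor() step is an extra assumption on my part: it only keeps the seasons in their order of first appearance, as in the expected output, instead of the alphabetical order that summarize() produces for character groups.

```r
library(dplyr)

df <- data.frame(
  Season = c('Summer', 'Fall', 'Fall', 'Winter', 'Spring', 'Spring'),
  Number = c(1, 2, 2, 6, 7, 2),
  Character = c('1s', '2s', 's', '1s', '3s', 'q'),
  stringsAsFactors = FALSE
)

result <- df %>%
  # optional: keep groups in order of first appearance instead of alphabetical
  mutate(Season = factor(Season, levels = unique(Season))) %>%
  group_by(Season) %>%
  summarize(
    Number = sum(Number),
    # second value in the group; fall back to the first when there is no second
    Character = nth(Character, 2, default = first(Character))
  )

print(result)
```

Note that Season comes back as a factor here; wrap it in as.character() afterwards if a character column is needed.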

CodePudding user response:

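One approach is to use last() to pick the right string. This assumes the rows are always ordered as in the example and that no group has more than two rows, so the last value in a group is also its second occurrence.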

library(dplyr)

df %>% 
  group_by(Season) %>% 
  summarize(across(Number:Character, ~ ifelse(is.numeric(.x), sum(.x), last(.x))))
# A tibble: 4 × 3
  Season Number Character
  <chr>   <dbl> <chr>
1 Fall        4 s
2 Spring      9 q
3 Summer      1 1s
4 Winter      6 1s
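
To illustrate when last() and "second occurrence" stop agreeing, here is a small sketch on made-up data (df3 and its values are hypothetical, not from the question) in which one group has three rows. The columns Second and Last are named only for comparison.

```r
library(dplyr)

# Hypothetical data: 'Fall' has three rows, so "second" and "last" differ
df3 <- data.frame(
  Season = c('Fall', 'Fall', 'Fall', 'Winter'),
  Number = c(2, 2, 5, 6),
  Character = c('2s', 's', 'x', '1s'),
  stringsAsFactors = FALSE
)

res <- df3 %>%
  group_by(Season) %>%
  summarize(
    Number = sum(Number),
    # second value, or the first one for single-row groups
    Second = nth(Character, 2, default = first(Character)),
    # last value in the group
    Last = last(Character)
  )

print(res)
```

For Fall, Second is 's' while Last is 'x'; for the single-row Winter group both are '1s'. If the data can contain groups with more than two rows, which of the two you want is worth deciding explicitly.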