Combine row values into character vector by condition

I have a data.frame where values are repeated in col1.

col1 <- c("A", "A", "B", "B", "C")
col2 <- c(1995, 1997, 1999, 2000, 2005)

df <- data.frame(col1, col2)

I want to combine the values in col2 that correspond to the same letter in col1 into one cell, so that col2 shows the range of values (minimum-maximum) for each letter in col1. I currently do this by splitting the data.frame by col1, applying a function (fun) to each piece, and binding the pieces back together.

library(tidyverse)

split_df <- split(df, df$col1)

fun <- function(df) {
  
  if (length(unique(df$col2)) > 1) {
    
    df$col2 <- paste(min(df$col2),
                     max(df$col2),
                     sep = "-")
    
    df <- distinct(df)
    
  }
  
  return(df)
}

split_df <- lapply(split_df, fun)

df <- do.call(rbind, split_df)

This works, but is there a more intuitive or more efficient solution?

CodePudding user response：

One option is to stay in the tidyverse, where you can accomplish this a little more succinctly. The basic idea is the same: group by col1, then collapse each group's col2 values into a single string.

library(tidyverse)

new.result <- df %>% 
  group_by(col1) %>% 
  summarize(
    col2 = ifelse(n() == 1, as.character(col2), paste(min(col2), max(col2), sep = '-'))
  )

  col1  col2     
  <chr> <chr>    
1 A     1995-1997
2 B     1999-2000
3 C     2005
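
Since the condition is a single TRUE/FALSE per group, a plain if/else works as well as ifelse(). Here is a minimal sketch of that variant; collapse_range is a hypothetical helper name (not from the question), and it checks for a single distinct value rather than n() == 1, which is my assumption about how repeated years should be treated:

```r
library(dplyr)

df <- data.frame(
  col1 = c("A", "A", "B", "B", "C"),
  col2 = c(1995, 1997, 1999, 2000, 2005)
)

# Hypothetical helper: collapse a numeric vector into "min-max",
# or into a single value when only one distinct value is present.
collapse_range <- function(x) {
  if (length(unique(x)) == 1) {
    as.character(x[1])
  } else {
    paste(min(x), max(x), sep = "-")
  }
}

result <- df %>%
  group_by(col1) %>%
  summarise(col2 = collapse_range(col2))
```

Keeping the logic in a small named function also makes it easy to test on its own, outside the pipeline.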

A different (but possibly overcomplicated) approach assumes that you have at most two years per group and that the rows are already sorted by year within each group. We can pivot the start and end years into their own columns, and then paste them together directly. This requires a little more data transformation but avoids having to check explicitly for groups with a single year:

df %>% 
  group_by(col1) %>% 
  mutate(n = row_number()) %>% 
  pivot_wider(names_from = n, values_from = col2) %>% 
  rowwise() %>% 
  mutate(
    vec = list(c(`1`, `2`)),
    col2 = paste(vec[!is.na(vec)], collapse = '-')
  ) %>% 
  select(col1, col2)
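
To see why the NA filtering is needed, it may help to look at the intermediate wide table. This is just the first half of the pipeline above, stored in a variable I have called wide (a name chosen only for illustration):

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  col1 = c("A", "A", "B", "B", "C"),
  col2 = c(1995, 1997, 1999, 2000, 2005)
)

# Number the rows within each group, then spread them into columns `1` and `2`.
wide <- df %>%
  group_by(col1) %>%
  mutate(n = row_number()) %>%
  pivot_wider(names_from = n, values_from = col2)

# Group C has only one row, so its `2` column is NA;
# that is the value the later vec[!is.na(vec)] step drops before pasting.
print(wide)
```

If a group had a third row, pivot_wider() would add a `3` column that the rest of the pipeline never reads, which is why the two-year assumption matters.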

CodePudding user response：

A base R way, using aggregate():

aggregate(col2 ~ col1, df, function(x) paste0(unique(range(x)), collapse = '-'))

#  col1      col2
#1    A 1995-1997
#2    B 1999-2000
#3    C      2005

The same can also be written with dplyr:

library(dplyr)

df %>% 
  group_by(col1) %>%
  summarise(col2 = paste0(unique(range(col2)), collapse = '-'))
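
In case it is not obvious why this needs no special case for single-year groups, here is a small sketch of the pieces on their own. The vectors x_many and x_one are made-up inputs that mirror groups A and C:

```r
x_many <- c(1995, 1997)  # a group with two years
x_one  <- 2005           # a group with a single year

# range() always returns c(min, max), so a single value is returned twice.
r_one <- range(x_one)

# unique() drops the duplicate, leaving one element, so paste0() has
# nothing to join and the collapse separator never appears.
label_one  <- paste0(unique(range(x_one)), collapse = "-")
label_many <- paste0(unique(range(x_many)), collapse = "-")

print(label_one)   # "2005"
print(label_many)  # "1995-1997"
```

Because range() takes the minimum and maximum, this also works for groups with more than two rows and does not depend on row order.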