I have a data.frame where values are repeated in col1
.
col1 <- c("A", "A", "B", "B", "C")
col2 <- c(1995, 1997, 1999, 2000, 2005)
df <- data.frame(col1, col2)
I want to combine values in col2
that correspond to the same letter in col1
into one cell, so that col2
shows a range of values for a particular letter in col1
. I do this by splitting the data.frame by col1
, applying fun
, and binding the split data.frames back together.
library(tidyverse)
split_df <- split(df, df$col1)
fun <- function(df) {
if (length(unique(df$col2)) > 1) {
df$col2 <- paste(min(df$col2),
max(df$col2),
sep = "-")
df <- distinct(df)
}
return(df)
}
split_df <- lapply(split_df, fun)
df <- do.call(rbind, split_df)
This works, but I am wondering if there is a more intuitive or more efficient solution?
CodePudding user response:
One option would be the tidyverse, where you can accomplish this a little more succinctly. The basic idea is the same:
library(tidyverse)
new.result <- df %>%
group_by(col1) %>%
summarize(
col2 = ifelse(n() == 1, as.character(col2), paste(min(col2), max(col2), sep = '-'))
)
col1 col2
<chr> <chr>
1 A 1995-1997
2 B 1999-2000
3 C 2005
A different (but possibly overcomplicated) approach assumes that you have at most two years per grouping. We can pivot the start and end years into their own columns, and then paste them together directly. This requires a little more data transformation but avoids having to check explicitly for groups with 1 year:
df %>%
group_by(col1) %>%
mutate(n = row_number()) %>%
pivot_wider(names_from = n, values_from = col2) %>%
rowwise() %>%
mutate(
vec = list(c(`1`, `2`)),
col2 = paste(vec[!is.na(vec)], collapse = '-')
) %>%
select(col1, col2)
CodePudding user response:
Base R way using aggregate
-
aggregate(col2~col1, df, function(x) paste0(unique(range(x)), collapse = '-'))
# col1 col2
#1 A 1995-1997
#2 B 1999-2000
#3 C 2005
Same can also be written with dplyr
-
library(dplyr)
df %>%
group_by(col1) %>%
summarise(col2 = paste0(unique(range(col2)), collapse = '-'))