Home > database >  How to merge similar rows of a dataframe and concatenate the differences?
How to merge similar rows of a dataframe and concatenate the differences?

Time:08-13

I have a dataframe with some rows that differ by a single value, like below:

V1 V2 V3 V4 V5
A  B  C  D  E
A  B  C  D  F

Is there a convenient way to search for rows that differ only in V5 and then merge these two rows, pasting together the values that differed in V5 to produce:

V1 V2 V3 V4 V5
A  B  C  D  E,F

CodePudding user response:

As you're only interested in merging on the fifth column (as indicated in the comments), one solution is to group by the first four and concatenate the fifth.

E.g. using dplyr:

library(dplyr)

df |> 
  group_by(across(1:4)) |>
  summarise(V5 = paste0(unique(V5), collapse = ",")) |>
  ungroup()

Output:

# A tibble: 2 × 5
  V1    V2    V3    V4    V5   
  <chr> <chr> <chr> <chr> <chr>
1 A     B     C     A     F    
2 A     B     C     D     E,F 

Data:

library(readr)

df <- read_table("V1 V2 V3 V4 V5
A  B  C  D  E
A  B  C  D  F
A  B  C  A  F")
  • Related