Home > other >  Merge columns with similar names
Merge columns with similar names

Time:09-10

I have a data frame with very annoying variable names; basically my observations of a same variable are distributed in variables g1_param1, g2_param1, g3_param1, etc. And I would like to join all of them into 1 variable/column named param1.

g1_param1 g2_param1 g1_param2 g2_param2
NA 7 NA 4
1 NA 1 NA

I have some ideas using grepl but that seems quite complicated, for something that can probably be done more easily?

CodePudding user response:

Perhaps this?

no_g <- split(names(dat), sub(".*_", "", names(dat)))
no_g
# $param1
# [1] "g1_param1" "g2_param1"
# $param2
# [1] "g1_param2" "g2_param2"
as.data.frame(
  Map(function(nm, cols) do.call(dplyr::coalesce, dat[cols]), names(no_g), no_g)
)
#   param1 param2
# 1      7      4
# 2      1      1

You can replace dplyr::coalesce with data.table::fcoalesce or a home-grown function if you don't have dplyr available. For instance,

my_coalesce <- function(...) {
  dots <- list(...)
  if (length(dots) == 0) return()
  if (length(dots) == 1) return(dots[[1]])
  out <- dots[[1]]
  for (i in seq_along(dots)[-1]) {
    if (!any(isna <- is.na(out))) break
    out[isna] <- dots[[i]][isna]
  }
  out
}

(Under-tested, it "should work", but it is likely better if you use something a bit more time-proven. Notably, it does not verify that all lengths of dots are equal, as they must be in this operation. Or that any of them have length at all.)


Data:

dat <- structure(list(g1_param1 = c(NA, 1L), g2_param1 = c(7L, NA), g1_param2 = c(NA, 1L), g2_param2 = c(4L, NA)), class = "data.frame", row.names = c(NA, -2L))

CodePudding user response:

library(tidyr)
library(stringr)

df <- structure(list(g1_param1 = c(NA, 1L), g2_param1 = c(7L, NA), 
    g1_param2 = c(NA, 1L), g2_param2 = c(4L, NA)), row.names = c(NA, 
-2L), class = "data.frame")
df
#>   g1_param1 g2_param1 g1_param2 g2_param2
#> 1        NA         7        NA         4
#> 2         1        NA         1        NA


df %>% 
  pivot_longer(everything(),
               names_transform = ~ str_extract(.x, 'param\\d '),
               names_to = '.value',
               values_drop_na = TRUE)
#> # A tibble: 2 × 2
#>   param1 param2
#>    <int>  <int>
#> 1      7      4
#> 2      1      1

Created on 2022-09-09 with reprex v2.0.2

  • Related