I have a data frame with very annoying variable names; basically my observations of a same variable are distributed in variables g1_param1, g2_param1, g3_param1, etc. And I would like to join all of them into 1 variable/column named param1.
g1_param1 | g2_param1 | g1_param2 | g2_param2 |
---|---|---|---|
NA | 7 | NA | 4 |
1 | NA | 1 | NA |
I have some ideas using grepl but that seems quite complicated, for something that can probably be done more easily?
CodePudding user response:
Perhaps this?
no_g <- split(names(dat), sub(".*_", "", names(dat)))
no_g
# $param1
# [1] "g1_param1" "g2_param1"
# $param2
# [1] "g1_param2" "g2_param2"
as.data.frame(
Map(function(nm, cols) do.call(dplyr::coalesce, dat[cols]), names(no_g), no_g)
)
# param1 param2
# 1 7 4
# 2 1 1
You can replace dplyr::coalesce
with data.table::fcoalesce
or a home-grown function if you don't have dplyr
available. For instance,
my_coalesce <- function(...) {
dots <- list(...)
if (length(dots) == 0) return()
if (length(dots) == 1) return(dots[[1]])
out <- dots[[1]]
for (i in seq_along(dots)[-1]) {
if (!any(isna <- is.na(out))) break
out[isna] <- dots[[i]][isna]
}
out
}
(Under-tested, it "should work", but it is likely better if you use something a bit more time-proven. Notably, it does not verify that all lengths of dots
are equal, as they must be in this operation. Or that any of them have length at all.)
Data:
dat <- structure(list(g1_param1 = c(NA, 1L), g2_param1 = c(7L, NA), g1_param2 = c(NA, 1L), g2_param2 = c(4L, NA)), class = "data.frame", row.names = c(NA, -2L))
CodePudding user response:
library(tidyr)
library(stringr)
df <- structure(list(g1_param1 = c(NA, 1L), g2_param1 = c(7L, NA),
g1_param2 = c(NA, 1L), g2_param2 = c(4L, NA)), row.names = c(NA,
-2L), class = "data.frame")
df
#> g1_param1 g2_param1 g1_param2 g2_param2
#> 1 NA 7 NA 4
#> 2 1 NA 1 NA
df %>%
pivot_longer(everything(),
names_transform = ~ str_extract(.x, 'param\\d '),
names_to = '.value',
values_drop_na = TRUE)
#> # A tibble: 2 × 2
#> param1 param2
#> <int> <int>
#> 1 7 4
#> 2 1 1
Created on 2022-09-09 with reprex v2.0.2