I have a dataframe with columns containing counts of observations for several populations (rows). Similar to the following, but biggest:
your text
df <- data.frame (pop = c("pop1", "pop2", "pop3", "pop4","pop5"),
L1a = c(0,0,0,1,2),
L1b = c(4,2,5,0,0),
L1c = c(1,3,0,3,4),
L2 = c(0,3,1,2,0)
)
df
So I need to merge the columns that are related (that share certain string in the column name, for example the string "L1") and sum up the counts into them. For the previous example a merged dataframe could be the following:
df_merged <- data.frame (pop = c("pop1", "pop2", "pop3", "pop4","pop5"),
L1 = c(5,5,5,4,6),
L2 = c(0,3,1,2,0)
)
df_merged
How could I do that?
I tried to find a post that could help me but there isn´t. I´ll really appreciate your answer
CodePudding user response:
We could use rowSums
on the columns that starts_with
'L1' in column names
library(dplyr)
df %>%
mutate(L1 = rowSums(across(starts_with('L1'))), .keep = "unused", .after = pop)
-output
pop L1 L2
1 pop1 5 0
2 pop2 5 3
3 pop3 5 1
4 pop4 4 2
5 pop5 6 0
If there are many such blocks of columns, use split.default
to split into a list of common columns, do the rowSums
by looping over the list
(map
) and then bind with the subset of original data
library(purrr)
library(stringr)
df %>%
select(where(is.numeric)) %>%
split.default(str_remove(names(.), "[a-z] $")) %>%
map_dfc(rowSums, na.rm = TRUE) %>%
bind_cols(df %>%
select(-where(is.numeric)), .)
-output
pop L1 L2
1 pop1 5 0
2 pop2 5 3
3 pop3 5 1
4 pop4 4 2
5 pop5 6 0
Or use pivot_longer
to reshape to 'long' format before doing the group by sum
library(tidyr)
df %>%
pivot_longer(cols = where(is.numeric), names_to = ".value",
names_pattern = "(L\\d ).*") %>%
group_by(pop) %>%
summarise(across(everything(), sum, na.rm = TRUE), .groups = 'drop')
-output
# A tibble: 5 × 3
pop L1 L2
<chr> <dbl> <dbl>
1 pop1 5 0
2 pop2 5 3
3 pop3 5 1
4 pop4 4 2
5 pop5 6 0
CodePudding user response:
rowwise() with c_across() lets one apply arbitrary functions over the inputs , in this case sum()
df1 |>
rowwise() |>
mutate(L1=sum(c_across(contains("L1"))),
.keep="unused",.after=pop) |>
ungroup()