Home > front end >  dplyr mean columns grouped by name
dplyr mean columns grouped by name

Time:10-19

Say you have the following data frame:

data.frame(A_1 = rnorm(10),
           A_2 = rnorm(10),
           B_1 = rnorm(10),
           B_2 = rnorm(10),
           B_3 = rnorm(10),
           C_1 = rnorm(10))

I would like to run rowMeans on a per-group basis, so calculate row means for A_1 with A2, then B_1 with B_2 and B_3, and C_1 only with itself.

In my real data, I have a variable number of groups and columns in each group, although I can know in advance the prefixes.

I guess starts_with might help, but not sure how to exactly do this...

CodePudding user response:

df %>% 
  rownames_to_column %>% 
  pivot_longer(-rowname) %>% 
  mutate(name=str_remove(name, "_.")) %>% 
  pivot_wider(values_fn=mean)

 rowname       A      B       C
   <int>   <dbl>  <dbl>   <dbl>
 1     1 -0.214   0.738 -2.49  
 2     2  0.513   1.00   0.402 
 3     3  0.0740  0.323  1.47  
 4     4  0.130   0.108  0.122 
 5     5 -0.0603 -0.321 -0.353 
 6     6 -0.969   0.118  0.0847
 7     7 -0.992  -0.459 -0.174 
 8     8 -0.745   0.338  1.35  
 9     9  0.143  -0.373 -1.06  
10    10  0.0105 -0.341 -0.367 

CodePudding user response:

If I got your point right, this could be one way to do it. So first value of column A is the mean first row of A_1 and A_2 and so forth.

You can split.default function for this purpose. It split a data frame based on column values but first we need to create a vector whose length is equal to the number of columns.

do.call(cbind, lapply(split.default(df, gsub('(\\w)_\\d', '\\1', names(df))), rowMeans))

                A          B           C
 [1,] -0.27871412  1.1904718 -1.96108015
 [2,] -0.14647757 -0.1312197 -2.56966908
 [3,]  0.08151200  0.9202600  0.46074139
 [4,] -0.58858702 -0.8668282 -0.24436829
 [5,] -0.84074058 -0.3621379  0.55942184
 [6,]  0.18514753  0.3805017 -0.54003693
 [7,] -0.48473620 -0.1472854 -0.12439466
 [8,]  0.06439061  0.3810837 -1.23546481
 [9,]  0.62065939  0.7266633  0.90504540
[10,] -0.78956030  0.4548080 -0.02160544

CodePudding user response:

An option with map

library(dplyr)
library(purrr)
library(stringr)
un1 <- unique(str_remove(names(df1), "_\\d "))
map_dfc(setNames(un1, un1), ~ df1 %>% 
   select(starts_with(.x)) %>% 
   rowMeans)

-output

# A tibble: 10 × 3
         A       B        C
     <dbl>   <dbl>    <dbl>
 1  0.151   0.153  -0.592  
 2  0.692   0.140   0.00446
 3 -0.0661  0.732  -0.0713 
 4  0.110   0.232  -1.07   
 5  0.604   0.656  -1.42   
 6 -0.522  -1.42    0.573  
 7 -0.918   0.0367 -0.553  
 8  0.266   1.17    0.547  
 9  0.526  -0.0425 -0.297  
10  0.798  -0.586   0.213  
  • Related