Home > Software engineering >  Apply function to two columns at a time with purrr
Apply function to two columns at a time with purrr

Time:04-06

I have a simplified tibble where I select two columns (manually) and pass them to a custom function, but in this case just using sum. Any ideas on how I could expand this to accommodate any number of ko. In this case there's only 2, but let's say there were 5?

library(dplyr)
library(purrr)

df <- tibble(l2fc_ko1 = rnorm(1:10), l2fc_ko2 = rnorm(1:10), ctrl_ko1 = rnorm(1:10), ctrl_ko2 = rnorm(1:10))

df %>% mutate(ko1_sum = map2_dbl(ctrl_ko1, l2fc_ko1, sum),
           ko2_sum = map2_dbl(ctrl_ko2, l2fc_ko2, sum))

CodePudding user response:

We can use pivot_longer to reshape the data, creating a column for each level of ko. Compute the sum, then pivot_wider to get back to your original format:

library(tidyverse)

df %>% 
  mutate(idx = row_number()) %>% 
  pivot_longer(-idx, names_sep = '_', names_to = c('group', 'ko')) %>% 
  pivot_wider(names_from = group, values_from = value) %>% 
  mutate(sum = l2fc   ctrl) %>% 
  pivot_wider(names_from = ko, values_from = c(l2fc, ctrl, sum))

     idx l2fc_ko1 l2fc_ko2 ctrl_ko1 ctrl_ko2 sum_ko1 sum_ko2
   <int>    <dbl>    <dbl>    <dbl>    <dbl>   <dbl>   <dbl>
 1     1  -1.04    -0.710    -0.288   -1.65   -1.33   -2.36 
 2     2   0.0338   0.400    -0.850    0.319  -0.816   0.719
 3     3   2.08     0.723     0.325    0.314   2.40    1.04 
 4     4   0.740   -0.411    -0.307    1.77    0.433   1.36 
 5     5   0.347   -1.57     -0.153    0.657   0.195  -0.915
 6     6  -0.998   -0.145     0.265   -1.95   -0.733  -2.09 
 7     7   2.05    -0.0876   -0.909   -0.190   1.14   -0.278
 8     8   0.0735  -0.134    -2.04    -0.832  -1.96   -0.966
 9     9   1.52     2.37      1.53    -0.596   3.05    1.78 
10    10   1.42    -0.753    -1.61     1.84   -0.194   1.09

CodePudding user response:

How about rowwise? You can specify the columns you want with c or c_across.

df %>%
  rowwise() %>% 
  mutate(total = sum(c_across(ends_with("ko1"))))

# A tibble: 10 x 5
# Rowwise: 
   l2fc_ko1 l2fc_ko2 ctrl_ko1 ctrl_ko2  total
      <dbl>    <dbl>    <dbl>    <dbl>  <dbl>
 1  -0.179   0.496     -1.10   -0.375  -1.27 
 2  -0.0887 -0.873      0.613  -0.348   0.525
 3  -2.33   -0.322     -0.515   3.03   -2.84 
 4  -0.602  -0.0387     0.704  -0.118   0.102
 5  -0.389  -0.00801    0.276   0.500  -0.113
 6  -2.18    0.648     -0.485  -0.243  -2.66 
 7   0.0529  0.237     -0.371  -0.0382 -0.318
 8   0.818  -0.181      1.11   -1.25    1.93 
 9  -0.271  -0.883      0.480  -0.296   0.209
10  -0.208  -1.11       1.09   -0.528   0.882

CodePudding user response:

If you have a dynamic number of paired ctrl_/l2fc_ columns, then try this:

  1. Ensure we have all ctrl_ that have a corresponding l2fc_ (and vice versa):

    ctrls <- grep("^ctrl_ko", names(df), value = TRUE)
    l2fcs <- gsub("^ctrl", "l2fc", ctrls)
    ctrls <- ctrls[ l2fcs %in% names(df) ]
    l2fcs <- l2fcs[ l2fcs %in% names(df) ] # or intersect(l2fcs, names(df))
    
  2. Combine these into one vector (we'll split on it later) and convert this to the new _sum names we'll need.

    nms <- c(l2fcs, ctrls)
    nms
    # [1] "l2fc_ko1" "l2fc_ko2" "ctrl_ko1" "ctrl_ko2"
    newnms <- gsub("ctrl_(.*)", "\\1_sum", ctrls)
    newnms
    # [1] "ko1_sum" "ko2_sum"
    
  3. Using split.default (which will split the df into groups of columns) and rowSums, we can devise two _sum columns:

    setNames(as.data.frame(lapply(split.default(df[nms], gsub(".*_ko", "", nms)), rowSums)), newnms)
    #       ko1_sum    ko2_sum
    # 1   1.0643199  1.7603198
    # 2  -2.3460066  2.9914827
    # 3   0.1912111 -0.3537572
    # 4   1.8475373 -0.8877151
    # 5   2.2994618  0.3716338
    # 6  -0.5365936 -1.0810583
    # 7   1.2542526 -1.0687119
    # 8  -1.8578221 -3.5073630
    # 9   2.4785211 -4.8546746
    # 10 -0.7027090  1.3562360
    
  4. We can cbind/bind_cols those in, or we can mutate them just as well. For the latter, we'll replace df with cur_data() for within the mutate environment, and we'll need to add as.data.frame)

    Choose one of the following, all producing effectively the same results:

    cbind(df, setNames(lapply(split.default(df[nms], gsub(".*_ko", "", nms)), rowSums), newnms))
    bind_cols(df, setNames(lapply(split.default(df[nms], gsub(".*_ko", "", nms)), rowSums), newnms))
    df %>% 
      mutate(
        setNames(
          as.data.frame(
            lapply(split.default(cur_data()[nms], gsub(".*_ko", "", nms)), rowSums)),
          newnms)
      )
    # # A tibble: 10 x 6
    #    l2fc_ko1 l2fc_ko2 ctrl_ko1 ctrl_ko2 ko1_sum ko2_sum
    #       <dbl>    <dbl>    <dbl>    <dbl>   <dbl>   <dbl>
    #  1   1.37      1.30    -0.307   0.455    1.06    1.76 
    #  2  -0.565     2.29    -1.78    0.705   -2.35    2.99 
    #  3   0.363    -1.39    -0.172   1.04     0.191  -0.354
    #  4   0.633    -0.279    1.21   -0.609    1.85   -0.888
    #  5   0.404    -0.133    1.90    0.505    2.30    0.372
    #  6  -0.106     0.636   -0.430  -1.72    -0.537  -1.08 
    #  7   1.51     -0.284   -0.257  -0.784    1.25   -1.07 
    #  8  -0.0947   -2.66    -1.76   -0.851   -1.86   -3.51 
    #  9   2.02     -2.44     0.460  -2.41     2.48   -4.85 
    # 10  -0.0627    1.32    -0.640   0.0361  -0.703   1.36 
    
  • Related