Home > Enterprise >  Subtract / add a vector with many other rows of a data frame
Subtract / add a vector with many other rows of a data frame

Time:10-08

I am unfortunately facing the problem that my for-loops or sweep commands do not work. I have a data set (df1) read in from a large excel spreadsheet (10000 x 26) and another data set (df2) from another excel spreadsheet (1 x 26). I would like to have each row of df1 subtracted / added with df2. Unfortunately my codes do not work. To show it better, I have added a picture below of what I am planning. enter image description here

CodePudding user response:

A vector is added/subtracted by columns so if you want to do it by rows you first have to transpose df1, then perform addition/subtraction and then transpose back:

t(t(df1)   unlist(df2))

CodePudding user response:

I am going to assume that the two dataframes have the dame column headers because they have the same number of columns. You can correct me if this assumption is wrong. I would map out the column names and then subtract the value for the column in df2 from all the values in df2.

library(tidyverse)

#example data
df1 <- mtcars
df2 <- mtcars[1,] |>
  mutate(across(everything(), ~ .   runif(n = 1, min = -10, max = 10)))

#solution
colnames(df1) |>
  map_dfc(\(x){
    
    val <- pull(df2, !!sym(x))
    
    df1 |>
      select(!!sym(x)) |>
      mutate(!!quo(!!sym(x)) := !!sym(x) - val) |>
      as_tibble()
  })
#> # A tibble: 32 x 11
#>       mpg   cyl   disp     hp  drat    wt  qsec    vs    am  gear  carb
#>     <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  -5.87  7.11  -8.99  -2.41  8.10  4.92 0.908 -2.14  2.85 -1.07  8.76
#>  2  -5.87  7.11  -8.99  -2.41  8.10  5.17 1.47  -2.14  2.85 -1.07  8.76
#>  3  -4.07  5.11 -61.0  -19.4   8.05  4.62 3.06  -1.14  2.85 -1.07  5.76
#>  4  -5.47  7.11  89.0   -2.41  7.28  5.51 3.89  -1.14  1.85 -2.07  5.76
#>  5  -8.17  9.11 191.    62.6   7.35  5.74 1.47  -2.14  1.85 -2.07  6.76
#>  6  -8.77  7.11  56.0   -7.41  6.96  5.76 4.67  -1.14  1.85 -2.07  5.76
#>  7 -12.6   9.11 191.   133.    7.41  5.87 0.288 -2.14  1.85 -2.07  8.76
#>  8  -2.47  5.11 -22.3  -50.4   7.89  5.49 4.45  -1.14  1.85 -1.07  6.76
#>  9  -4.07  5.11 -28.2  -17.4   8.12  5.45 7.35  -1.14  1.85 -1.07  6.76
#> 10  -7.67  7.11  -1.39  10.6   8.12  5.74 2.75  -1.14  1.85 -1.07  8.76
#> # ... with 22 more rows

CodePudding user response:

Use purrr::map2_df

Purrr's map functions are a very useful family of functions that iterate over vectors and apply the same function to every element. Briefly:

  • map(x, f) does f(x[[n]]) to every element of x, returning the outputs in a list
  • map_df(x, f) does f(x[[n]]) to every element of x, returning the outputs as a data frame.
  • map2_df(x, y, f) does f(x[[n]], y[[n]]) to every element of x and y in parallel, returning the outputs as a data frame.

Since data frames are lists-of-columns, we can do map2_df(df1, df2, ` `) to run the 'addition function' on every column of df1 and df2 in turn, then convert the resulting list to a data frame in a single line of code.

## setting up the example -----------------------------------------

set.seed("123") # (so that the random numbers are always the same)

## tiny helper to make a neater example:
random_numbers <- function(...) runif(..., max = 10) |> round()

df1 <- data.frame(
  a = random_numbers(3),
  b = random_numbers(3),
  c = random_numbers(3)
  )

df1
#>   a b c
#> 1 3 9 5
#> 2 8 9 9
#> 3 4 0 6

df2 <- data.frame(a = 1, b = 2, c = 3)


## The actual code you'll need ------------------------------------
require(purrr)
#> Loading required package: purrr
map2_df(df1, df2, ` `)
#> # A tibble: 3 × 3
#>       a     b     c
#>   <dbl> <dbl> <dbl>
#> 1     4    11     8
#> 2     9    11    12
#> 3     5     2     9

Created on 2022-10-07 by the reprex package (v2.0.1)

  • Related