I am unfortunately facing the problem that my for-loops or sweep commands do not work. I have a data set (df1) read in from a large excel spreadsheet (10000 x 26) and another data set (df2) from another excel spreadsheet (1 x 26). I would like to have each row of df1 subtracted / added with df2. Unfortunately my codes do not work. To show it better, I have added a picture below of what I am planning. enter image description here
CodePudding user response:
A vector is added/subtracted by columns so if you want to do it by rows you first have to transpose df1
, then perform addition/subtraction and then transpose back:
t(t(df1) unlist(df2))
CodePudding user response:
I am going to assume that the two dataframes have the dame column headers because they have the same number of columns. You can correct me if this assumption is wrong. I would map out the column names and then subtract the value for the column in df2 from all the values in df2.
library(tidyverse)
#example data
df1 <- mtcars
df2 <- mtcars[1,] |>
mutate(across(everything(), ~ . runif(n = 1, min = -10, max = 10)))
#solution
colnames(df1) |>
map_dfc(\(x){
val <- pull(df2, !!sym(x))
df1 |>
select(!!sym(x)) |>
mutate(!!quo(!!sym(x)) := !!sym(x) - val) |>
as_tibble()
})
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -5.87 7.11 -8.99 -2.41 8.10 4.92 0.908 -2.14 2.85 -1.07 8.76
#> 2 -5.87 7.11 -8.99 -2.41 8.10 5.17 1.47 -2.14 2.85 -1.07 8.76
#> 3 -4.07 5.11 -61.0 -19.4 8.05 4.62 3.06 -1.14 2.85 -1.07 5.76
#> 4 -5.47 7.11 89.0 -2.41 7.28 5.51 3.89 -1.14 1.85 -2.07 5.76
#> 5 -8.17 9.11 191. 62.6 7.35 5.74 1.47 -2.14 1.85 -2.07 6.76
#> 6 -8.77 7.11 56.0 -7.41 6.96 5.76 4.67 -1.14 1.85 -2.07 5.76
#> 7 -12.6 9.11 191. 133. 7.41 5.87 0.288 -2.14 1.85 -2.07 8.76
#> 8 -2.47 5.11 -22.3 -50.4 7.89 5.49 4.45 -1.14 1.85 -1.07 6.76
#> 9 -4.07 5.11 -28.2 -17.4 8.12 5.45 7.35 -1.14 1.85 -1.07 6.76
#> 10 -7.67 7.11 -1.39 10.6 8.12 5.74 2.75 -1.14 1.85 -1.07 8.76
#> # ... with 22 more rows
CodePudding user response:
Use purrr::map2_df
Purrr's map functions are a very useful family of functions that iterate over vectors and apply the same function to every element. Briefly:
map(x, f)
doesf(x[[n]])
to every element ofx
, returning the outputs in a listmap_df(x, f)
doesf(x[[n]])
to every element ofx
, returning the outputs as a data frame.map2_df(x, y, f)
doesf(x[[n]], y[[n]])
to every element ofx
andy
in parallel, returning the outputs as a data frame.
Since data frames are lists-of-columns, we can do map2_df(df1, df2, ` `)
to run the 'addition function' on every column of df1 and df2 in turn, then convert the resulting list to a data frame in a single line of code.
## setting up the example -----------------------------------------
set.seed("123") # (so that the random numbers are always the same)
## tiny helper to make a neater example:
random_numbers <- function(...) runif(..., max = 10) |> round()
df1 <- data.frame(
a = random_numbers(3),
b = random_numbers(3),
c = random_numbers(3)
)
df1
#> a b c
#> 1 3 9 5
#> 2 8 9 9
#> 3 4 0 6
df2 <- data.frame(a = 1, b = 2, c = 3)
## The actual code you'll need ------------------------------------
require(purrr)
#> Loading required package: purrr
map2_df(df1, df2, ` `)
#> # A tibble: 3 × 3
#> a b c
#> <dbl> <dbl> <dbl>
#> 1 4 11 8
#> 2 9 11 12
#> 3 5 2 9
Created on 2022-10-07 by the reprex package (v2.0.1)