I want to perform a column-wise operation in R on column pairs. The function I actually want to use is not the one shown here, because it would complicate this example.
I have a dataframe:
df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7)
,p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))
and a vector with one element per row of df:
tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)
I want to run a function that compares each column of df to the tocompare object. The steps I need to take are:
- Make a two-element list. The first element is a two-column dataframe x, in which the first column comes from df and the second column is the tocompare object. The second element is a number (my actual function needs it, although I appreciate that it is not needed in this example). This number is constant for all iterations of this process (it is the number of rows in df, i.e. the length of tocompare); in this example it is 10.
library(dplyr)
data1 <- list(x = cbind(df %>% select(1), tocompare), N = length(tocompare))
# select(1) is used rather than df[,1] because it keeps the column header
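For what it's worth, base R can keep the header as well: df[, 1] simplifies to a plain numeric vector, but df[, 1, drop = FALSE] returns a one-column data frame. A small sketch, using only the df and tocompare defined above, that builds the same data1 without the dplyr dependency:

```r
df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7),
                 p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))
tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)

# df[, 1] would simplify to a bare numeric vector and lose the name "p1";
# drop = FALSE keeps a one-column data frame, so the header survives cbind()
data1 <- list(x = cbind(df[, 1, drop = FALSE], tocompare),
              N = length(tocompare))

colnames(data1$x)  # "p1" "tocompare"
```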
- Compare the two columns of the first element (called x) of the data1 list. The function I use in real life is not cor, but this simplified example captures the problem. I wrote my_function so that it takes the data1 object created above.
my_function <- function(data1){
x <- data1[[1]]
cr <- cor(x[,1], x[,2])
header <- colnames(x)[1]
print(c(header, cr))
}
cr_df1 <- my_function(data1)
I can do the same for the second column of df:
data2 <- list(x = cbind(df %>% select(2), tocompare), N = length(tocompare))
cr_df2 <- my_function(data2)
And make a dataframe of final results:
final_df <- rbind(cr_df1, cr_df2) %>%
`rownames<-`(NULL) %>%
`colnames<-`(c("p", "R")) %>%
as.data.frame()
The output looks like this:
> final_df
p R
1 p1 0.7261224
2 p2 0.6233169
I would like to do this on a dataframe with thousands of columns. The bit I don't know is how to split the single dataframe into many two-column dataframes and then run my_function on each of them to return a single output. I think I could do it with a loop and by transposing df, but maybe there is a better way (I feel I should be using map here)?
Answer:
A more generic way to do it is to use split.default():
lapply(split.default(df, seq(ncol(df))), function(i) cbind(i, tocompare))
$`1`
p1 tocompare
1 -5 0
2 -4 0
3 2 2
4 0 0
5 -2 2
6 1 4
7 3 16
8 4 12
9 2 6
10 7 9
$`2`
p2 tocompare
1 0 0
2 1 0
3 2 2
4 0 0
5 -2 2
6 1 4
7 3 16
8 3 12
9 2 6
10 0 9
Then apply your function to each element of the list.
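Putting the pieces together, a minimal base-R sketch, assuming the df, tocompare and my_function from the question, with my_function returning c(header, cr) instead of printing it. The names cols, res and m are only illustrative. Because c(header, cr) coerces the correlation to character, the last step converts R back to numeric.

```r
df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7),
                 p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))
tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)

# my_function from the question, minus the print() call
my_function <- function(data1){
  x <- data1[[1]]
  cr <- cor(x[, 1], x[, 2])
  header <- colnames(x)[1]
  c(header, cr)
}

# One-column data frames, one list element per column of df
cols <- split.default(df, seq(ncol(df)))

# Build the two-element list my_function expects and apply it to each column
res <- lapply(cols, function(i) {
  my_function(list(x = cbind(i, tocompare), N = length(tocompare)))
})

# res holds character vectors c(header, cr): stack them, then restore numeric R
m <- do.call(rbind, res)
final_df <- data.frame(p = unname(m[, 1]), R = as.numeric(m[, 2]),
                       stringsAsFactors = FALSE)
final_df
```

The same pattern scales to thousands of columns, since split.default() produces one list element per column and lapply() never needs df to be transposed.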
Answer:
Rather than looping, you can use map to apply your function iteratively. To split up your dataframe into columns, just select each column one at a time; 1:ncol(df) generates the sequence of column numbers. So:
library(tidyverse)
map(1:ncol(df), function(column_number) df %>% select(all_of(column_number)))
#> [[1]]
#> p1
#> 1 -5
#> 2 -4
#> 3 2
#> 4 0
#> 5 -2
#> 6 1
#> 7 3
#> 8 4
#> 9 2
#> 10 7
#>
#> [[2]]
#> p2
#> 1 0
#> 2 1
#> 3 2
#> 4 0
#> 5 -2
#> 6 1
#> 7 3
#> 8 3
#> 9 2
#> 10 0
To get your function to process these columns, first alter it to return a dataframe:
my_function2 <- function(data1){
x <- data1[[1]]
cr <- cor(x[,1], x[,2])
header <- colnames(x)[1]
tibble(header = header, cr = cr)
}
Then wrap it all up with map, but use map_df so that the result of each iteration is bound as a row of a single dataframe:
compare_fn <- function(df, compare_list, my_function){
  # Use compare_list (not the global tocompare) so N always matches the vector passed in
  map_df(1:ncol(df),
         function(column_number) my_function(list(x = cbind(df %>% select(all_of(column_number)), compare_list),
                                                  N = length(compare_list))))
}
And run it with
compare_fn(df, tocompare, my_function2)
#> # A tibble: 2 × 2
#> header cr
#> <chr> <dbl>
#> 1 p1 0.726
#> 2 p2 0.623
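Note that map_df() has been superseded since purrr 1.0.0, whose documentation suggests map() followed by list_rbind() instead. A sketch of compare_fn rewritten that way, assuming purrr >= 1.0.0 and R >= 4.1 (for the native |> pipe). It uses df[, i, drop = FALSE] instead of select() and a base data.frame instead of tibble(), so only purrr is needed; compare_fn2 and this variant of my_function2 are illustrative names, not part of the answer above.

```r
library(purrr)

df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7),
                 p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))
tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)

# Like my_function2 above, but returning a base one-row data frame
my_function2 <- function(data1){
  x <- data1[[1]]
  data.frame(header = colnames(x)[1], cr = cor(x[, 1], x[, 2]),
             stringsAsFactors = FALSE)
}

compare_fn2 <- function(df, compare_list, my_function){
  map(seq_len(ncol(df)), function(column_number) {
    # drop = FALSE keeps a one-column data frame, so the header survives
    my_function(list(x = cbind(df[, column_number, drop = FALSE], compare_list),
                     N = length(compare_list)))
  }) |>
    list_rbind()  # stack the one-row data frames into a single data frame
}

result <- compare_fn2(df, tocompare, my_function2)
result
```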