I have a data.frame with multiple columns. The first column is the dependent variable and the other columns are various independent variables. I'd like to create a table of all the R2 values, where column 1 is y and each of the other columns in turn is x.
Here's an example data.frame:
df <- data.frame(
'A' = runif(20,min=0, max=100),
'B' = runif(20,min=0, max=100),
'C' = runif(20,min=0, max=100),
'D' = runif(20,min=0, max=100),
'E' = runif(20,min=0, max=100)
)
and I'm using a function to calculate R2:
rsq <- function(x, y) summary(lm(y~x,na.action = na.omit))$r.squared
I would like the output to look like this:
A.B A.C A.D A.E
1 0.009213715 0.009213715 0.009213715 0.009213715
I know I could hard code the table this way:
r2_df <- data.frame(
'A~B'=rsq(x=df$B,y=df$A),
'A~C'=rsq(x=df$C,y=df$A),
'A~D'=rsq(x=df$D,y=df$A),
'A~E'=rsq(x=df$E,y=df$A)
)
But here's the kicker: my data frame will change from time to time, with different data series and a different number of columns. "A" will stay the same, but next time I pull the data I may end up with columns "A","B","X","Y","Z","P","O","S". So I don't want to hard code anything. I'd like to just set A as y and loop through the rest of the columns to produce the table. I'm new to R, and I'm struggling to get an apply function to produce anything.
Thank you for any help!
CodePudding user response:
We may need to loop over the columns other than the first, apply the rsq function to each of those columns together with the 'A' column, modify the names of the list output, and then coerce it to a data.frame:
lst1 <- lapply(df[-1], function(x) rsq(x, df$A))
names(lst1) <- paste0("A.", names(lst1))
as.data.frame(lst1)
-output
A.B A.C A.D A.E
1 0.1514966 0.1207118 0.003884215 0.02558644
NOTE: the values differ from those in the question because the data was created with runif and no set.seed was used.
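Since the name of the first column and the number of predictors may change, the same idea can be wrapped in a small function that reads the names from the data frame itself. The following is only a sketch under a few assumptions: r2_table is a hypothetical helper name (not from any package), the seed value is arbitrary and only there to make the run reproducible, and the example data uses three columns for brevity. The last line is an optional cross-check that relies on the fact that, for a single-predictor model with an intercept, R2 equals the squared Pearson correlation.

```r
# Sketch: R-squared of the first column against every other column,
# without hard-coding any column names.
set.seed(42)  # arbitrary seed (assumption), only for reproducibility

df <- data.frame(
  A = runif(20, min = 0, max = 100),
  B = runif(20, min = 0, max = 100),
  C = runif(20, min = 0, max = 100)
)

# Same helper as in the question
rsq <- function(x, y) summary(lm(y ~ x, na.action = na.omit))$r.squared

# r2_table() is a hypothetical helper name used for illustration
r2_table <- function(data) {
  y_name <- names(data)[1]                      # dependent variable = first column
  r2 <- vapply(data[-1],                        # loop over all remaining columns
               function(x) rsq(x, data[[1]]),
               numeric(1))
  names(r2) <- paste(y_name, names(r2), sep = ".")
  as.data.frame(as.list(r2))                    # one-row data.frame
}

r2_df <- r2_table(df)
r2_df

# Optional cross-check: squared Pearson correlation of the first column
# with each of the others (returns a 1 x k matrix)
r2_cor <- cor(df[[1]], df[-1], use = "pairwise.complete.obs")^2
```

Calling r2_table() on a data frame with columns "A","B","X","Y" would give columns A.B, A.X, A.Y with no changes to the code. vapply is used instead of lapply here so that each result is checked to be a single number.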