Automatic Linear/Multiple Regression in Python with 50 columns

Time:12-06

I have a dataset with more than 50 columns and I'm trying to find a way in Python to run a simple linear regression on each combination of variables. The goal is to find a starting point for further analysis (i.e., I will delve deeper into those pairs that have a reasonably high R-squared).

I've put all my columns in a list of numpy arrays. How could I go about running a simple linear regression on each combination and printing the R-squared for it? Is it also possible to try a multiple linear regression with up to 5-6 variables, again for each combination?

Each array has ~200 rows, so code efficiency in terms of speed would not be a big issue for this personal project.

Thanks.

CodePudding user response:

This is more of an EDA (exploratory data analysis) problem than a Python problem. Look into some regression resources, specifically the correlation matrix. That said, one possible solution is to use itertools.combinations with a group size of 6. With 50 columns this gives 15,890,700 different combinations, so unless you want to run more than 15 million regressions, you should do some EDA first to find the important features in your dataset.
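
To make the combinatorics concrete, here is a minimal sketch, assuming the columns are a list of equal-length 1-D numpy arrays as described in the question. The helper names r_squared, pairwise_r2 and subset_r2 and the synthetic demo data are made up for illustration; the fit is ordinary least squares with an intercept via np.linalg.lstsq, which is one reasonable choice rather than the only one.

```python
import itertools
import math

import numpy as np

# Number of regressions needed for 50 columns.
n_pairs = math.comb(50, 2)   # every unordered pair of columns
n_sixes = math.comb(50, 6)   # every group of 6 columns


def r_squared(y, predictors):
    """R^2 of an OLS fit (with intercept) of y on a list of 1-D predictor arrays.

    Assumes y is not constant; otherwise the total sum of squares is zero.
    """
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def pairwise_r2(columns):
    """R^2 of a simple linear regression for every unordered pair of columns."""
    results = {}
    for i, j in itertools.combinations(range(len(columns)), 2):
        results[(i, j)] = r_squared(columns[i], [columns[j]])
    return results


def subset_r2(columns, target, size):
    """R^2 of regressing columns[target] on every `size`-subset of the other columns."""
    others = [k for k in range(len(columns)) if k != target]
    return {
        combo: r_squared(columns[target], [columns[k] for k in combo])
        for combo in itertools.combinations(others, size)
    }


# Synthetic stand-in for the real data: 6 columns of 200 rows each.
rng = np.random.default_rng(0)
columns = [rng.normal(size=200) for _ in range(6)]
columns[1] = 2.0 * columns[0] + 0.1 * rng.normal(size=200)  # planted relationship

pair_scores = pairwise_r2(columns)
top_pairs = sorted(pair_scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
for (i, j), r2 in top_pairs:
    print(f"columns {i} and {j}: R^2 = {r2:.3f}")

# Multiple regression: column 1 explained by every pair of the other columns.
multi_scores = subset_r2(columns, target=1, size=2)
best_combo = max(multi_scores, key=multi_scores.get)
print(f"best predictors for column 1: {best_combo}")
```

Pairs are cheap (1,225 fits for 50 columns). Groups of six are not, so it probably makes sense to keep `size` small or to pre-filter the candidate columns before calling something like subset_r2.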

CodePudding user response:

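If you are looking for pairs of columns with high R-squared values, just try a correlation matrix: for a simple linear regression, R-squared is the square of the Pearson correlation coefficient. To make it easier to read, I would recommend plotting a heat map using seaborn: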

import seaborn as sns
import matplotlib.pyplot as plt

# df is assumed to be a pandas DataFrame with one column per variable
df_corr = df.corr()
sns.heatmap(df_corr, cmap="coolwarm")
plt.show()

P.S.: You can set the range of the heatmap's color scale with the keyword arguments vmin and vmax, e.g.:

sns.heatmap(df_corr, vmin=-1, vmax=1, cmap="coolwarm")
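
If you want a ranked list rather than a picture, the same information can be pulled out numerically. A minimal sketch, assuming the data is a list of equal-length numpy arrays (the synthetic columns below are placeholders, not real data). With a DataFrame and no missing values, df.corr() ** 2 should give the same matrix for the default Pearson method.

```python
import numpy as np

# Synthetic stand-in for the real data: 5 columns of 200 rows each.
rng = np.random.default_rng(1)
columns = [rng.normal(size=200) for _ in range(5)]
columns[3] = -1.5 * columns[2] + 0.2 * rng.normal(size=200)  # planted relationship

data = np.column_stack(columns)          # shape (200, 5), one column per variable
corr = np.corrcoef(data, rowvar=False)   # 5 x 5 Pearson correlation matrix
r2 = corr ** 2                           # R^2 of each simple linear regression

# Keep each unordered pair once (upper triangle, diagonal excluded),
# then sort from highest to lowest R^2.
rows, cols = np.triu_indices(r2.shape[0], k=1)
order = np.argsort(r2[rows, cols])[::-1]
ranked = [(int(rows[k]), int(cols[k]), float(r2[rows[k], cols[k]])) for k in order]

for i, j, value in ranked[:3]:
    print(f"columns {i} and {j}: R^2 = {value:.3f}")
```

Squaring drops the sign, so a strong negative correlation ranks as high as a strong positive one, which is usually what you want when screening for R-squared.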

Another suggestion is to run a Principal Component Analysis (PCA) on your dataset to find the directions (linear combinations of features) with the highest variance. Features that load heavily on the leading components are often, though not always, the most useful for prediction. Just let me know if you want more info on this technique.
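
For reference, a minimal PCA sketch using only numpy's SVD, assuming the columns are standardized first (so that differences in scale do not dominate) and using synthetic placeholder data. In practice sklearn.decomposition.PCA would be the usual tool; this only shows what is being computed.

```python
import numpy as np

# Synthetic stand-in for the real data: two columns share a common signal,
# the other two are independent noise.
rng = np.random.default_rng(2)
base = rng.normal(size=200)
data = np.column_stack([
    base + 0.1 * rng.normal(size=200),
    base + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
    rng.normal(size=200),
])

# Standardize each column to zero mean and unit variance.
standardized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Rows of `components` are the principal directions, ordered by variance.
_, singular_values, components = np.linalg.svd(standardized, full_matrices=False)
explained_variance = singular_values ** 2 / (len(standardized) - 1)
explained_ratio = explained_variance / explained_variance.sum()

print("explained variance ratio:", np.round(explained_ratio, 3))
print("loadings of first component:", np.round(components[0], 3))
```

The loadings of the first component show which original columns move together; here the two columns built from the shared signal should dominate it.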
