I have a data set with 10,000 rows. Each row has 248 values, and these values determine whether that row is labeled zero or one. I am trying to figure out why each row gets the label it does, so I am trying to plot the logistic regression line from
LR = LogisticRegression(random_state=0, solver='lbfgs', multi_class='ovr',fit_intercept=True).fit(X, Y)
So I can see why they are classified the way they are. But I can't figure out how to do this: I can't use a scatter plot, since the X data has far more values per row (248) than the label data (one).
My question is: how would I go about plotting this?
CodePudding user response:
I could suggest plotting the logistic regression using
import seaborn as sns
# x is the single input feature, y is the binary 0/1 label (logistic=True requires a binary y)
sns.regplot(x='variable', y='target', data=data, logistic=True)
But that takes a single input variable. Since you are trying to find relationships across a large number of inputs, I would look at feature importance first, by running this
from sklearn.linear_model import LogisticRegression
m = LogisticRegression()
m.fit(X, y)
# For a binary problem coef_ has shape (1, n_features): one weight per input column
print(m.coef_)
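Raw coefficients are hard to read with 248 of them, and they are only comparable across features when the features share a scale. Here is a minimal sketch of how I might rank them, assuming the features are numeric. The data from make_classification is a synthetic stand-in for your X and y, and the choice of 10 informative features and a top-10 cutoff is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 10,000 rows, 248 features.
X, y = make_classification(
    n_samples=10_000, n_features=248, n_informative=10, random_state=0
)

# Standardize first so coefficient magnitudes are comparable across features.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Flatten the (1, 248) coefficient array and sort by absolute size, largest first.
coefs = model.named_steps["logisticregression"].coef_.ravel()
top = np.argsort(np.abs(coefs))[::-1][:10]

for i in top:
    print(f"feature {i}: {coefs[i]:+.3f}")
```

A positive coefficient pushes the prediction toward class 1 and a negative one toward class 0. Features with coefficients near zero contribute little to the fitted model, although correlated features can share weight between them, so treat the ranking as a hint rather than proof.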
The next steps would be running a correlation matrix to spot redundant features you could drop, and applying PCA to condense the features into fewer variables.
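Once the data is condensed to two components, a scatter plot becomes possible. Below is a rough sketch of that idea, again on synthetic stand-in data. Note the assumption: the logistic regression here is refit on the two principal components, so the boundary it draws is an approximation for visualization and not the boundary of the 248-feature model. The output filename is made up.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 248 features, binary label.
X, y = make_classification(
    n_samples=2000, n_features=248, n_informative=10, random_state=0
)

# Standardize, then project the 248 features onto the first two principal components.
X_2d = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# Refit a logistic regression on the 2-D projection (for plotting only).
clf = LogisticRegression().fit(X_2d, y)

# Evaluate the predicted probability of class 1 on a grid covering the data.
xx, yy = np.meshgrid(
    np.linspace(X_2d[:, 0].min(), X_2d[:, 0].max(), 200),
    np.linspace(X_2d[:, 1].min(), X_2d[:, 1].max(), 200),
)
grid = np.c_[xx.ravel(), yy.ravel()]
proba = clf.predict_proba(grid)[:, 1].reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, proba, levels=20, cmap="RdBu", alpha=0.6)  # probability surface
ax.contour(xx, yy, proba, levels=[0.5], colors="k")            # decision boundary
ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="RdBu", s=5)      # rows colored by label
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
fig.savefig("pca_decision_boundary.png")
```

If the two classes overlap heavily in this plot, that does not mean the full model is poor. It only means the first two components do not carry much of the class separation.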
P.S. What do zero and one represent?