I want to find the number of regression coefficients in the first column of my dataframe. My code raised the following error:
TypeError: object of type 'numpy.float64' has no len()
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
df = pd.read_csv("master.csv")
# Drop redundant features
X = df.drop(['suicides/100k pop', 'country-year', 'suicides_no'], axis=1)
y = df['suicides/100k pop']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression(n_jobs=4, normalize=True, copy_X=True)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"There are {len(model.coef_[0])} regression coefficients:")
print(model.coef_[0])
X_train
print(type(X_train))
> <class 'scipy.sparse.csr.csr_matrix'>
y_train
print(type(y_train))
> <class 'pandas.core.series.Series'>
Traceback:
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> /tmp/ipykernel_6232/341058392.py in <module>
>       1 # Check number of and values of coefficients
> ----> 2 print(f"There are {len(model.coef_[0])} regression coefficients:")
>       3 print(model.coef_[0])
>
> TypeError: object of type 'numpy.float64' has no len()
CodePudding user response:
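When you use [0], you select a single element of model.coef_. That element is a number (a numpy.float64), not a sequence, so it has no length; len() only works on objects that have one, such as lists, strings, and arrays.
If you want to print the number of coefficients, use: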
len(model.coef_)
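A minimal sketch of why the error occurs, using a hand-made array as a stand-in for a fitted model.coef_ (the values are made up for illustration):

```python
import numpy as np

# Hypothetical stand-in for a fitted model.coef_ (1-D array of coefficients)
coef = np.array([1.0, 2.0])

first = coef[0]
print(type(first))  # a NumPy scalar, not an array
print(len(coef))    # number of elements in the 1-D array

# Asking for the length of the scalar reproduces the TypeError
try:
    len(first)
except TypeError as exc:
    message = str(exc)
    print(message)
```

Indexing a 1-D array returns a scalar, so the length has to be taken on the array itself.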
CodePudding user response:
I am afraid you sound confused on the modeling part, which leads you to request invalid things programmatically.
> I want to find the number of regression coefficients in the first column of my dataframe.
There is no such thing. By definition, the number of coefficients in linear regression is equal to the number of variables, i.e. columns in your array/dataframe; so, there will always be one and only one coefficient for the first column in your dataframe.
Similarly, your print statement:
print(f"There are {len(model.coef_[0])} regression coefficients:")
is incorrect; the number of your regression coefficients will be len(model.coef_). model.coef_[0] is simply the first of these coefficients; with a one-dimensional target like yours it will always be a single number, and that's why len(model.coef_[0]) will always produce an (expected and justifiable) error (single numbers do not have any length).
Demonstrating the above with the toy example available in the docs:
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]]) # 2 variables/columns
# y = 1 * x_0 + 2 * x_1 + 3
y = np.dot(X, np.array([1, 2])) + 3
model = LinearRegression()
model.fit(X, y)
Here, we have 2 variables (columns) in X, so it will be
len(model.coef_)
# 2
with
X.shape[1] == len(model.coef_)
# True
by definition.
(Notice that the intercept is not included in the coefficients; it is returned separately with model.intercept_.)
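Since X_train in the question is a scipy.sparse matrix rather than a dataframe, here is a small sketch of the same check on sparse input. It reuses the toy data from above; converting it with csr_matrix is my assumption about how the question's features might look, not something shown in the original code:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])  # 2 variables/columns
y = np.dot(X, np.array([1, 2])) + 3

# Assumed stand-in for a sparse X_train such as the one in the question
X_sparse = csr_matrix(X)

model = LinearRegression()
model.fit(X_sparse, y)

# One coefficient per column, for sparse input as well
print(model.coef_.shape)
print(X_sparse.shape[1] == len(model.coef_))
```

Sparse matrices expose .shape just like arrays, so the comparison against the number of columns works unchanged.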
So, in order to actually get what you seem to request in your print statement, you should change the statements to
print(f"There are {len(model.coef_)} regression coefficients:")
print(model.coef_)
keeping in mind that the reported number will not include the intercept.
The above commands in my example will produce the correct outcome:
There are 2 regression coefficients:
[1. 2.]
And if you also want to include the intercept term for completeness, you should add:
print("and the intercept term is:")
print(model.intercept_)
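One related caveat, which goes beyond the question as asked: the shape of coef_ follows the shape of the target. With a 1-D y, as in the question, coef_ is 1-D; if y is passed as a 2-D column, coef_ becomes 2-D and coef_[0] is then a whole row. A rough sketch with the same toy data (the names model_1d and model_2d are only illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

model_1d = LinearRegression().fit(X, y)                 # y has shape (4,)
model_2d = LinearRegression().fit(X, y.reshape(-1, 1))  # y has shape (4, 1)

print(model_1d.coef_.shape)     # 1-D: one entry per feature
print(model_2d.coef_.shape)     # 2-D: one row per target
print(len(model_2d.coef_[0]))   # here coef_[0] is a row, so len() works
```

Checking model.coef_.shape before indexing avoids guessing which of the two cases applies.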