TypeError: object of type 'numpy.float64' has no len() when printing the regression coefficients

I want to find the number of regression coefficients in the first column of my dataframe. My code raises:

TypeError: object of type 'numpy.float64' has no len()

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("master.csv")
# Drop redundant features
X = df.drop(['suicides/100k pop', 'country-year', 'suicides_no'], axis=1)
y = df['suicides/100k pop']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Note: the normalize parameter was removed in scikit-learn 1.2
model = LinearRegression(n_jobs=4, normalize=True, copy_X=True)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"There are {len(model.coef_[0])} regression coefficients:")
print(model.coef_[0])

X_train

print(type(X_train))

> <class 'scipy.sparse.csr.csr_matrix'>

y_train

print(type(y_train))
> <class 'pandas.core.series.Series'>

Traceback:

> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> /tmp/ipykernel_6232/341058392.py in <module>
>       1 # Check number of and values of coefficients
> ----> 2 print(f"There are {len(model.coef_[0])} regression coefficients:")
>       3 print(model.coef_[0])
> 
> TypeError: object of type 'numpy.float64' has no len()

CodePudding user response:

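Indexing with [0] selects a single element of model.coef_. Here that element is a number (a numpy.float64), and a number has no length. len() is not specific to strings; it works on any sized container (lists, strings, arrays), but not on a scalar.

To get the number of coefficients, use: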

len(model.coef_)
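
The difference can be reproduced without fitting a model. The following is only a minimal sketch: the array `coef` is a hypothetical stand-in for `model.coef_`, not the asker's actual coefficients.

```python
import numpy as np

# Hypothetical stand-in for model.coef_ (a 1-D array of coefficients)
coef = np.array([1.0, 2.0])

# Indexing a 1-D array returns a NumPy scalar, not an array
first = coef[0]
print(type(first))

# Calling len() on the scalar raises the TypeError from the question
try:
    len(first)
    message = None
except TypeError as exc:
    message = str(exc)
    print(message)

# Calling len() on the whole array gives the number of coefficients
n_coefficients = len(coef)
print(n_coefficients)
```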

CodePudding user response:

I am afraid you are confused about the modeling part, which leads you to ask for something that is invalid programmatically.

> I want to find the number of regression coefficients in the first column of my dataframe.

There is no such thing. By definition, the number of coefficients in linear regression is equal to the number of variables, i.e. columns in your array/dataframe; so, there will always be one and only one coefficient for the first column in your dataframe.

Similarly, your print statement:

print(f"There are {len(model.coef_[0])} regression coefficients:")

is incorrect; the number of your regression coefficients is len(model.coef_). With a single target, as here, model.coef_ is a 1-D array, so model.coef_[0] is simply the first of these coefficients: a single number. That is why len(model.coef_[0]) produces an (expected and justifiable) error: single numbers do not have any length.

Demonstrating the above with the toy example available in the docs:

import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]]) # 2 variables/columns
# y = 1 * x_0 + 2 * x_1 + 3
y = np.dot(X, np.array([1, 2])) + 3
model = LinearRegression()
model.fit(X, y)

Here, we have 2 variables (columns) in X, so it will be

len(model.coef_)
# 2

with

X.shape[1] == len(model.coef_)
# True

by definition.

(Notice that the intercept is not included in the coefficients; it is returned separately with model.intercept_)

So, in order to actually get what you seem to request in your print statement, you should change the statements to

print(f"There are {len(model.coef_)} regression coefficients:")
print(model.coef_)

keeping in mind that the reported number will not include the intercept.

The above commands in my example will produce the correct outcome:

There are 2 regression coefficients:
[1. 2.]

And if you also want to include the intercept term for completeness, you should add:

print("and the intercept term is:")
print(model.intercept_)
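
As a side note, one possible reason `model.coef_[0]` looked natural is that the shape of `coef_` depends on the shape of the target: with a 1-D `y` it is 1-D, while with a 2-D `y` (for example a single-column DataFrame) it is 2-D with one row per target. The sketch below reuses the toy data from above; the names `y_1d`, `y_2d`, `coef_1d` and `coef_2d` are introduced here only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])  # 2 variables/columns
y_1d = np.dot(X, np.array([1, 2])) + 3          # 1-D target, shape (4,)
y_2d = y_1d.reshape(-1, 1)                      # same values as a 2-D column, shape (4, 1)

coef_1d = LinearRegression().fit(X, y_1d).coef_
coef_2d = LinearRegression().fit(X, y_2d).coef_

print(coef_1d.shape)    # 1-D: one entry per feature
print(coef_2d.shape)    # 2-D: one row per target, one column per feature

# Only in the 2-D case is coef_[0] an array, so len() works on it
print(len(coef_2d[0]))
```

In the question, `y_train` is a pandas Series (1-D), so `coef_` is 1-D and `len(model.coef_)` is the right call.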