So I'm using pandas, plus scikit-learn's DecisionTreeRegressor. I imported a CSV file with data on different bikes; the columns I need are model_year, kms_driven, owner (the number of owners), price, and power.
When I fit DecisionTreeRegressor() to the CSV data and then ask it to predict the price of a bike with model_year=2015, kms_driven=30000 and 1 owner, I get this warning:
/usr/local/lib/python3.7/dist-packages/sklearn/base.py:451: UserWarning: X does not have valid feature names, but DecisionTreeRegressor was fitted with feature names
Here is my code:
# import libraries
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
# read the data files into two DataFrames
# cleaned dataset without hp (horsepower)
df1 = pd.read_csv("/content/KMS_bikes_cleaned.csv")
# cleaned dataset without kms (kilometers driven)
df2 = pd.read_csv("/content/POWER_bikes_cleaned.csv")
# print the df1 and df2 DataFrames
print(df1)
print(df2)
output:
      model_year  kms_driven  owner   price
0           1970        5000      3  190000
1           1986        8990      2  100000
2           1986       50000      3  110000
3           1990       32000      2   63750
4           1990       13031      3  100000
...          ...         ...    ...     ...
5598        2021        2700      1  155600
5599        2021        3000      1  160000
5600        2021         850      2   70000
5601        2021         100      1  300000
5602        2021        7200      1  195400

[5603 rows x 4 columns]
# select predictors
X1 = df1.drop(columns='price')
Y1 = df1['price']
# print the X1 and Y1 variables
print(X1)
print(Y1)
output:
      model_year  kms_driven  owner
0           1970        5000      3
1           1986        8990      2
2           1986       50000      3
3           1990       32000      2
4           1990       13031      3
...          ...         ...    ...
5598        2021        2700      1
5599        2021        3000      1
5600        2021         850      2
5601        2021         100      1
5602        2021        7200      1

[5603 rows x 3 columns]
0 190000
1 100000
2 110000
3 63750
4 100000
...
5598 155600
5599 160000
5600 70000
5601 300000
5602 195400
Name: price, Length: 5603, dtype: int64
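`predict` matches a plain nested list to the features by position, not by name, so the `[[2015, 30000, 1]]` passed to it below only gives the intended result because it follows the column order left after dropping `price`. A minimal sketch to check that order, using a few made-up rows (hypothetical values, not taken from the CSV):

```python
import pandas as pd

# hypothetical rows with the same columns as KMS_bikes_cleaned.csv
df = pd.DataFrame({
    "model_year": [1970, 1986, 2021],
    "kms_driven": [5000, 8990, 7200],
    "owner": [3, 2, 1],
    "price": [190000, 100000, 195400],
})

# drop() keeps the remaining columns in their original order
X = df.drop(columns="price")
feature_order = X.columns.tolist()
print(feature_order)
```

For df2 the same check gives model_year, owner, power, which is why the second prediction list is written as [[2018, 2, 50.0]].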
# create the model
model1 = DecisionTreeRegressor()
# fit the model to all of the data
model1.fit(X1, Y1)
# predicted price of a 2015 bike driven 30,000 km sold by its first owner
predict1 = model1.predict([[2015, 30000, 1]])
# print predict1
print(predict1)
# select predictors
X2 = df2.drop(columns="price")
Y2 = df2["price"]
# print the X2 and Y2 variables
print(X2)
print(Y2)
output:
      model_year  owner  power
0           1970      3  19.80
1           1986      2  19.80
2           1986      3  19.80
3           1990      2  11.00
4           1990      3  19.80
...          ...    ...    ...
5598        2021      1  14.30
5599        2021      1  14.50
5600        2021      2  10.72
5601        2021      1  30.00
5602        2021      1  19.10

[5603 rows x 3 columns]
0 190000
1 100000
2 110000
3 63750
4 100000
...
5598 155600
5599 160000
5600 70000
5601 300000
5602 195400
Name: price, Length: 5603, dtype: int64
# create the model
model2 = DecisionTreeRegressor()
# fit the model to all of the data
model2.fit(X2, Y2)
# predicted price of a 2018 bike with 50 bhp sold by its second owner
output2 = model2.predict([[2018, 2, 50.0]])
# print output2
print(output2)
output: /usr/local/lib/python3.7/dist-packages/sklearn/base.py:451: UserWarning: X does not have valid feature names, but DecisionTreeRegressor was fitted with feature names
Answer:
Adapting my answer on the proposed duplicate to your case: pass the new data to predict on as a DataFrame with the same column names used for fitting:
test1 = pd.DataFrame({
    "model_year": [2015],
    "kms_driven": [30000],
    "owner": [1],
})
predict1 = model1.predict(test1)
Answer:
I could not reproduce the warning message, but I found a similar question here: SKLearn warning "valid feature names" in version 1.0
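It seems this warning was introduced in version 1.0 (I'm still using 0.24.2, which does not check feature names).
If you fit on the underlying NumPy array (the DataFrame's values) instead of the DataFrame itself, the model stores no feature names, so predicting on a plain list should no longer warn: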
model2.fit(X2.values, Y2)
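A sketch of this workaround with made-up data (scikit-learn 1.0+ assumed; the rows are hypothetical). Fitting on `X2.values` leaves `feature_names_in_` unset, so the plain list passes without a warning. The trade-off is that column names are no longer checked at all, and passing a DataFrame to `predict` later would raise the opposite warning ("X has feature names, but DecisionTreeRegressor was fitted without feature names").

```python
import warnings

import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# hypothetical stand-in for POWER_bikes_cleaned.csv
df = pd.DataFrame({
    "model_year": [1970, 1986, 1990, 2018, 2021],
    "owner": [3, 2, 2, 2, 1],
    "power": [19.8, 19.8, 11.0, 50.0, 14.3],
    "price": [190000, 100000, 63750, 250000, 155600],
})
X2 = df.drop(columns="price")
Y2 = df["price"]

model2 = DecisionTreeRegressor(random_state=0)
model2.fit(X2.values, Y2)  # NumPy array: no column names are stored

# record warnings so we can confirm none of them is about feature names
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    output2 = model2.predict([[2018, 2, 50.0]])

name_warnings = [w for w in caught if "feature names" in str(w.message)]
print(hasattr(model2, "feature_names_in_"), output2, len(name_warnings))
```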