I need to transform float values to int, but without losing any information in the conversion. The values (from a dataframe column used as y when building the model) are as follows:
-1.0
0.0
9.0
-0.5
1.5
1.5
...
If I convert them to int directly, -0.5 might become 0 or -1, so I will lose information.
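For illustration, a minimal sketch of what a direct cast does (NumPy truncates toward zero, so distinct floats can collapse onto the same int):

```python
import numpy as np

vals = np.array([-1.0, 0.0, 9.0, -0.5, 1.5])
ints = vals.astype(int)
# -0.5 -> 0 and 1.5 -> 1, so -0.5/0.0 and 1.0/1.5 would collide
print(ints.tolist())  # [-1, 0, 9, 0, 1]
```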
I need to convert the values above to int because I have to pass them when fitting a model: model.fit(X, y). Is there any format that would let me pass these values to the fit function (the column above is meant to be the y column)?
Code:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.semi_supervised import LabelSpreading

le = LabelEncoder()
X = df[['Col1', 'Col2']].apply(le.fit_transform)
X_transformed = np.concatenate((X[['Col1']], X[['Col2']]), axis=1)
y = df['Label'].values

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_transformed)

model_LS = LabelSpreading(kernel='knn',
                          gamma=70,
                          alpha=0.5,
                          max_iter=30,
                          tol=0.001,
                          n_jobs=-1,
                          )
LS = model_LS.fit(X_scaled, y)
Data:
Col1 Col2 Label
Cust1 Cust2 1.0
Cust1 Cust4 1.0
Cust4 Cust5 -1.5
Cust12 Cust6 9.0
The error I get when running the above code is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-174-14429cc07d75> in <module>
2
----> 3 LS=model_LS.fit(X_scaled, y)
~/opt/anaconda3/lib/python3.8/site-packages/sklearn/semi_supervised/_label_propagation.py in fit(self, X, y)
228 X, y = self._validate_data(X, y)
229 self.X_ = X
--> 230 check_classification_targets(y)
231
232 # actual graph construction (implementations should override this)
~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
181 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
182 'multilabel-indicator', 'multilabel-sequences']:
--> 183 raise ValueError("Unknown label type: %r" % y_type)
184
185
ValueError: Unknown label type: 'continuous'
CodePudding user response:
You can multiply your values by a power of ten to remove the decimal part:
import pandas as pd

df = pd.DataFrame({'Label': [1.0, -1.3, 0.75, 9.0, 7.8236]})
# Longest fractional part determines the scaling factor
decimals = df['Label'].astype(str).str.split('.').str[1].str.len().max()
# round() before astype(int) guards against float truncation (e.g. 7500.0000000001)
df['y'] = df['Label'].mul(float(f"1e{decimals}")).round().astype(int)
print(df)
# Output:
Label y
0 1.0000 10000
1 -1.3000 -13000
2 0.7500 7500
3 9.0000 90000
4 7.8236 78236
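One nice property of this approach, sketched below, is that the mapping is reversible: dividing by the same factor restores the original floats (assuming the same example data as above):

```python
import pandas as pd

df = pd.DataFrame({'Label': [1.0, -1.3, 0.75, 9.0, 7.8236]})
decimals = df['Label'].astype(str).str.split('.').str[1].str.len().max()
factor = 10 ** int(decimals)
y_int = df['Label'].mul(factor).round().astype(int)

# Dividing the integer labels by the same factor recovers the floats
restored = y_int.div(factor)
print(restored.tolist())  # [1.0, -1.3, 0.75, 9.0, 7.8236]
```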
CodePudding user response:
I think you need:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame(data={'y': [-1.0, 0.0, 9.0, -0.5, 1.5, 1.5]})
le = LabelEncoder()
le.fit(df['y'])
df['y'] = le.transform(df['y'])
print(df)
OUTPUT
y
0 0
1 2
2 4
3 1
4 3
5 3
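A further advantage of LabelEncoder is that it keeps the mapping, so if you later need the original float labels back (for example after calling predict on the integer classes), inverse_transform recovers them. A minimal sketch, reusing the example data from this answer:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame(data={'y': [-1.0, 0.0, 9.0, -0.5, 1.5, 1.5]})
le = LabelEncoder()
encoded = le.fit_transform(df['y'])  # [0, 2, 4, 1, 3, 3]

# 'encoded' stands in here for model predictions on the integer classes;
# inverse_transform maps them back to the original float labels
original = le.inverse_transform(encoded)
print(original.tolist())  # [-1.0, 0.0, 9.0, -0.5, 1.5, 1.5]
```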