My dataframe has 10 features for 5 different travel modes. The first five features (sp_min - sp_median) are speed features for a particular travel mode (e.g. bike or car), and the last 5 are bearing features.
I don't want to normalise the first 5 features, since they relate to the particular travel mode I'm predicting (so not known at test time). However, I would like to normalise the last 5, since bearing ranges from 0 to 360 no matter the travel mode.
How do I normalise only this subset of my dataframe's features?
df.head()
sp_min sp_max sp_mean sp_std sp_median br_min br_max br_mean br_std br_median
0 0.54 17.63 4.8409 5.448061 1.675 0.0 333.0 32.018 64.692071 11.0
1 0.54 4.78 1.8242 1.049490 1.610 0.0 280.0 26.330 43.658002 13.0
2 0.51 2.80 1.1624 0.526194 1.005 0.0 334.0 53.700 83.268181 20.0
3 0.51 15.57 5.4405 5.055061 2.965 0.0 310.0 23.272 50.490677 7.0
4 0.00 0.93 0.0345 0.155604 0.000 0.0 309.0 4.900 32.809297 0.0
CodePudding user response:
If your columns are consistently named, you can select them all as follows:
target = [c for c in df if c.startswith('br_')]
You can then use numpy and the underlying DataFrame array to min-max normalize your columns:
import numpy as np
x = df[target].values
x_norm = (x - x.min(0)) / np.ptp(x, axis=0)  # the ndarray.ptp method was removed in NumPy 2.0
You can then assign these values to existing columns or new columns in your DataFrame as you need.
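As a rough sketch of that assignment step, the snippet below writes the scaled values back into the same br_ columns. The tiny DataFrame is made up from two rows of the question's df.head() output, and the zero-range guard is an addition of mine, not part of the answer above: br_min is 0.0 in every row shown, so its range is 0 and a plain division would produce NaN.

```python
import numpy as np
import pandas as pd

# Illustrative data: two rows taken from the question's df.head() output.
df = pd.DataFrame({
    "sp_min": [0.54, 0.00],
    "br_min": [0.0, 0.0],
    "br_max": [333.0, 309.0],
    "br_median": [11.0, 0.0],
})

# Select only the bearing columns by their prefix.
target = [c for c in df if c.startswith("br_")]
x = df[target].to_numpy(dtype=float)

col_min = x.min(axis=0)
col_range = np.ptp(x, axis=0)
# Assumption: a constant column should map to 0 rather than NaN,
# so a zero range is replaced by 1 before dividing.
safe_range = np.where(col_range == 0, 1.0, col_range)

# Overwrite the bearing columns in place; the speed columns are untouched.
df[target] = (x - col_min) / safe_range
```

To keep the raw values as well, assign to new column names instead, for example `df[[c + "_norm" for c in target]] = ...`.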
CodePudding user response:
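You can use scikit-learn's Normalizer. Note that it rescales each row (sample) to unit norm rather than scaling each column to a fixed range: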
from sklearn.preprocessing import Normalizer
cols = df.columns[5:]
normalizer = Normalizer().fit(df[cols])
df[cols] = normalizer.transform(df[cols])
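If per-column scaling is what you are after, one possible alternative is MinMaxScaler, which maps each column to [0, 1] using that column's own minimum and maximum. The sketch below is only an illustration: the train/test split and the three-column frames are hypothetical, with values borrowed from the question's df.head() output. The scaler is fitted on the training rows only and then reused on the test rows.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical train/test frames built from rows of the question's sample output.
train = pd.DataFrame({
    "sp_mean": [4.8409, 1.8242, 1.1624],
    "br_max": [333.0, 280.0, 334.0],
    "br_mean": [32.018, 26.330, 53.700],
})
test = pd.DataFrame({
    "sp_mean": [5.4405],
    "br_max": [310.0],
    "br_mean": [23.272],
})

# Only the bearing columns are scaled; the speed columns are left alone.
cols = [c for c in train.columns if c.startswith("br_")]

# Learn each column's min and max from the training data only.
scaler = MinMaxScaler().fit(train[cols])

# Apply the same per-column scaling to both frames.
train[cols] = scaler.transform(train[cols])
test[cols] = scaler.transform(test[cols])
```

Test values that fall outside the range seen during training end up outside [0, 1], as br_mean does in this example.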