Normalise a subset of a dataset's features

Time:03-15

My dataframe has 10 features for 5 different travel modes. The first five features (sp_min to sp_median) are speed features for a particular travel mode (e.g. bike or car), and the last five are bearing features.

I don't want to normalise the first five features, since they relate to the particular travel mode I'm predicting (which is not known at test time). However, I would like to normalise the last five, since bearing ranges from 0 to 360 regardless of travel mode.

How do I normalise this subset of my dataframe's features?

df.head()

   sp_min  sp_max  sp_mean    sp_std  sp_median   br_min  br_max  br_mean     br_std  br_median  
0    0.54   17.63   4.8409  5.448061      1.675     0.0   333.0   32.018  64.692071       11.0  
1    0.54    4.78   1.8242  1.049490      1.610     0.0   280.0   26.330  43.658002       13.0  
2    0.51    2.80   1.1624  0.526194      1.005     0.0   334.0   53.700  83.268181       20.0 
3    0.51   15.57   5.4405  5.055061      2.965     0.0   310.0   23.272  50.490677        7.0
4    0.00    0.93   0.0345  0.155604      0.000     0.0   309.0    4.900  32.809297        0.0

Answer:

If your columns are consistently named, you can select them all as follows:

target = [c for c in df if c.startswith('br_')]
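A runnable check of that selection, using only the column names from the question (the empty frame is just a stand-in for your data). `DataFrame.filter(regex=...)` is shown as an equivalent pandas alternative:

```python
import pandas as pd

# Column names from the question; an empty frame is enough because only the labels matter here
cols = ["sp_min", "sp_max", "sp_mean", "sp_std", "sp_median",
        "br_min", "br_max", "br_mean", "br_std", "br_median"]
df = pd.DataFrame(columns=cols)

# Iterating over a DataFrame yields its column labels
target = [c for c in df if c.startswith("br_")]

# Same selection with a regular expression anchored at the start of the column name
target_re = list(df.filter(regex=r"^br_").columns)

print(target)
```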

You can then use numpy and the underlying DataFrame array to normalize your columns:

import numpy as np
x = df[target].to_numpy(dtype=float)
x_norm = (x - x.min(axis=0)) / np.ptp(x, axis=0)  # min-max scale each column to [0, 1]

You can then assign these values back to the existing columns (df[target] = x_norm) or to new columns in your DataFrame, as needed.
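A self-contained sketch of the whole min-max step on a few columns from the df.head() sample, writing the result back into the br_ columns. The sample exposes one caveat: br_min is 0.0 in every row, so its range is zero and (x - min) / range would give NaN (0/0). Treating such a constant column as all zeros is an assumption of this sketch, implemented by dividing by 1 wherever the range is zero:

```python
import numpy as np
import pandas as pd

# A few columns from the df.head() sample in the question
df = pd.DataFrame({
    "sp_max": [17.63, 4.78, 2.80, 15.57, 0.93],  # speed feature, left untouched
    "br_min": [0.0, 0.0, 0.0, 0.0, 0.0],
    "br_max": [333.0, 280.0, 334.0, 310.0, 309.0],
    "br_median": [11.0, 13.0, 20.0, 7.0, 0.0],
})

target = [c for c in df if c.startswith("br_")]
x = df[target].to_numpy(dtype=float)

col_min = x.min(axis=0)
col_range = np.ptp(x, axis=0)  # max - min for each column

# Constant columns have zero range: divide by 1 instead so they become all zeros
safe_range = np.where(col_range == 0, 1.0, col_range)
x_norm = (x - col_min) / safe_range

# Write the scaled values back; the speed column is unchanged
df[target] = x_norm
print(df)
```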

Answer:

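You can use scikit-learn. Note that Normalizer rescales each row (sample) to unit norm rather than each column, so for per-feature scaling use MinMaxScaler instead:

from sklearn.preprocessing import MinMaxScaler
cols = df.columns[5:]  # the five br_ bearing features
scaler = MinMaxScaler().fit(df[cols])  # learns each column's min and max
df[cols] = scaler.transform(df[cols])  # rescales each column to [0, 1]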
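Since the question notes that bearings always lie between 0 and 360 regardless of travel mode, another option is to divide by that known range instead of by the observed min and max. Nothing is learned from the data, so training and test rows are scaled identically. A minimal sketch, assuming a bearing of 360 should map to 1.0; the train/test split and the helper name scale_bearings are hypothetical, built from the df.head() sample:

```python
import pandas as pd

BEARING_RANGE = 360.0  # bearings span 0-360 degrees for every travel mode


def scale_bearings(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with every br_ column divided by the fixed range."""
    out = df.copy()
    br_cols = [c for c in out if c.startswith("br_")]
    out[br_cols] = out[br_cols] / BEARING_RANGE
    return out


# Hypothetical split of rows from the df.head() sample in the question
train = pd.DataFrame({"sp_max": [17.63, 4.78], "br_max": [333.0, 280.0], "br_median": [11.0, 13.0]})
test = pd.DataFrame({"sp_max": [2.80], "br_max": [334.0], "br_median": [20.0]})

# The same fixed scaling applies to both splits; no statistics come from either one
train_s = scale_bearings(train)
test_s = scale_bearings(test)
print(test_s)
```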