I'm trying to standardise the observations in a group of columns in my dataframe without using any built-in functions.
The columns I want to standardise are held in a list called continuous, and I'm trying to use a list comprehension to apply the standardisation.
I'm having trouble coming up with an approach that allows me to iterate over the rows in my dataframe.
What I've got so far:
continuous = [1, 2, 3, 5, 6, 7, 8, 9, 10]
data_z = [(data[col][i] for i in data.index)-data.mean(col)/data.std(col) for col in continuous]
This raises a TypeError: it won't let me iterate over a generator object. Does anyone know the correct approach to iterate over the rows and columns I want to standardise?
Thanks in advance!
CodePudding user response:
Couldn't you just as easily do this?
mean = data[continuous].mean(axis='rows')
std = data[continuous].std(axis='rows')
data_z = (data[continuous] - mean ) / std
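If you would rather keep the list-comprehension shape from the question, note three likely problems in the original line: the generator expression is not an object that arithmetic applies to elementwise, `data.mean(col)` passes `col` as the axis argument rather than as a column label, and division binds tighter than subtraction, so the mean needs to be subtracted inside parentheses first. A minimal sketch of a corrected version follows; the example frame is made up, since the question's real `data` is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the question's real `data` is not shown.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 10, size=(13, 13)))
continuous = [1, 2, 3, 5, 6, 7, 8, 9, 10]

# Subtract the column mean first (note the parentheses), then divide by the
# column standard deviation. Each element of the list is a pandas Series.
z_cols = [(data[col] - data[col].mean()) / data[col].std() for col in continuous]

# Reassemble the standardised columns into one DataFrame; the Series names
# (the original column labels) become the column labels.
data_z = pd.concat(z_cols, axis=1)
```

This still relies on pandas to do the row-wise arithmetic, so no explicit loop over `data.index` is needed.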
CodePudding user response:
Use DataFrame.sub with DataFrame.div on the filtered columns in df1:
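import numpy as np
import pandas as pd

# Reproducible example data: 13 rows x 13 columns of random integers in [0, 10)
np.random.seed(2021)
data = pd.DataFrame(np.random.randint(10, size=(13, 13)))
continuous = [1, 2, 3, 5, 6, 7, 8, 9, 10]
# Keep only the columns to standardise
df1 = data[continuous]
# Subtract each column's mean, then divide by each column's standard deviation
data_z = df1.sub(df1.mean()).div(df1.std())
print(data_z)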
1 2 3 5 6 7 8 \
0 0.361158 1.309808 -1.365801 0.092504 0.739880 0.553102 0.549700
1 -1.083473 -1.038813 0.309238 -1.410680 -0.977698 -0.944883 -1.314501
2 -1.083473 0.429075 0.979254 0.393140 1.083396 0.253505 0.549700
3 -0.722315 1.016230 0.309238 -1.110043 -1.664730 1.451893 -0.693101
4 1.805788 -0.158080 1.649269 1.295050 0.396364 0.852699 -0.071700
5 -0.722315 -1.332391 -0.025770 -0.508770 -1.664730 1.451893 -0.071700
6 -1.083473 0.135497 0.979254 1.295050 -0.290667 -1.244480 -1.314501
7 1.083473 -1.332391 -0.360778 0.393140 1.083396 -0.345689 1.481801
8 1.444630 1.309808 -1.365801 -1.410680 0.396364 -1.244480 -1.003801
9 0.000000 -1.038813 0.979254 -0.208133 1.083396 0.553102 0.549700
10 -0.361158 -0.745236 -1.365801 0.693777 -0.634183 0.553102 -1.003801
11 -0.361158 0.722653 -0.695785 1.295050 -0.290667 -0.645286 0.860401
12 0.722315 0.722653 -0.025770 -0.809406 0.739880 -1.244480 1.481801
9 10
0 0.369274 0.301124
1 -1.107823 0.301124
2 1.477098 -1.264720
3 0.738549 -0.090337
4 -0.738549 -1.264720
5 -1.107823 -0.873259
6 -0.369274 1.475507
7 -1.477098 -0.873259
8 -0.738549 0.301124
9 0.738549 -0.481798
10 0.738549 -0.481798
11 1.477098 1.475507
12 0.000000 1.475507
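If "without built-in functions" also rules out `mean()` and `std()`, the same result can be computed by hand while iterating over the rows, as the question attempted. The sketch below is one possible way to do it, not the only one: `standardise` is a hypothetical helper name, the small frame is made-up data, and the sample standard deviation (dividing by n - 1) is used because that is the pandas default for `std()`.

```python
import math
import pandas as pd

def standardise(values):
    """Return z-scores for a list of numbers, using the sample std (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]

# Hypothetical small frame with integer column labels, as in the question.
data = pd.DataFrame({1: [2, 4, 4, 4, 5, 5, 7, 9], 2: [1, 2, 3, 4, 5, 6, 7, 8]})
continuous = [1, 2]

# Collect each column's values row by row, standardise them, and rebuild a frame.
data_z = pd.DataFrame(
    {col: standardise([data[col][i] for i in data.index]) for col in continuous},
    index=data.index,
)
```

On large frames this will be much slower than the vectorised versions above, so it is mainly useful for understanding or checking the calculation.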