A snippet of the dataframe is shown below; the actual dataset is 200000 x 130.
ID  1-jan  2-jan  3-jan  4-jan
1.  4      5      7      8
2.  2      0      1      9
3.  5      8      0      1
4.  3      4      0      0
I am trying to compute the mean absolute deviation for each value in a row, like this:
ID     1-jan  2-jan  3-jan  4-jan  mean
1.     4      5      7      8      12.5
1_MAD  8.5    7.5    5.5    4.5
2.     2      0      1      9      6
2_MAD  4      6      5      3
3.     5      8      0      1      7
4.     3      4      0      0      3.5
I tried this:
new_df = pd.DataFrame()
for rows in df['ID']:
    # one column per ID, holding that row's deviations (first column skipped)
    new_df[str(rows) + '_mad'] = mad(df3.loc[rows][1:])
new_df.T
where mad is a function that compares the mean to each value.
However, this is very slow on a dataset this large, and I need the fastest approach possible.
CodePudding user response:
It's possible to specify axis=1 to apply the mean calculation across columns:
df['mean_across_cols'] = df.mean(axis=1)
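Building on the row mean, the per-value absolute deviations can be sketched without a Python loop by subtracting each row's mean from that row and taking absolute values. This is only a sketch under assumptions: the sample frame is rebuilt from the question's snippet, ID is moved into the index so it is not averaged, and "mean" is taken to be the ordinary arithmetic row mean. The mean column in the question does not match the arithmetic mean (row 2 shows 6, while the arithmetic mean of 2, 0, 1, 9 is 3), so if a different reference value is intended, substitute it for row_mean.

```python
import pandas as pd

# Sample data rebuilt from the question's snippet (assumption: ID is a regular column)
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "1-jan": [4, 2, 5, 3],
        "2-jan": [5, 0, 8, 4],
        "3-jan": [7, 1, 0, 0],
        "4-jan": [8, 9, 0, 0],
    }
)

# Move ID into the index so it is excluded from the arithmetic
values = df.set_index("ID")

# Arithmetic mean of each row (one value per ID)
row_mean = values.mean(axis=1)

# Subtract each row's mean from every value in that row, then take absolute values
abs_dev = values.sub(row_mean, axis=0).abs()
print(abs_dev)
```

Because sub(..., axis=0) aligns row_mean on the index and works on whole columns at once, it avoids building the result one row at a time.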
CodePudding user response:
IIUC use:
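# Series.mad() was removed in pandas 2.0; this computes the same quantity explicitly
df = df.expanding().apply(lambda x: (x - x.mean()).abs().mean())
print(df)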
        1-jan     2-jan     3-jan     4-jan
ID
1.0  0.000000  0.000000  0.000000  0.000000
2.0  1.000000  2.500000  3.000000  0.500000
3.0  1.111111  2.888889  2.888889  3.333333
4.0  1.000000  2.250000  2.500000  4.000000
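Note that expanding() runs down each column, so the value at ID n is the mean absolute deviation of the first n values in that column. If a single mean absolute deviation per row is wanted instead, it could be sketched as follows. The frame here is an assumption rebuilt from the question's snippet with ID as the index, and the deviation is measured from the arithmetic row mean.

```python
import pandas as pd

# Sample data rebuilt from the question's snippet (assumption: ID is the index)
values = pd.DataFrame(
    {
        "1-jan": [4, 2, 5, 3],
        "2-jan": [5, 0, 8, 4],
        "3-jan": [7, 1, 0, 0],
        "4-jan": [8, 9, 0, 0],
    },
    index=pd.Index([1, 2, 3, 4], name="ID"),
)

# Mean absolute deviation of each row: the mean of |x - row mean| across the columns
row_mad = values.sub(values.mean(axis=1), axis=0).abs().mean(axis=1)
print(row_mad)
```

This stays fully vectorized, so it should scale to a 200000 x 130 frame far better than a per-row loop.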