I have a DataFrame with the following structure:
np.random.seed(1)
mi = pd.MultiIndex.from_product([[3, 5], ["X","Y","V","T"]], names=["Node", "Parameter"])
df = pd.DataFrame(index=pd.DatetimeIndex(['2022-07-07 12:00:00', '2022-07-07 13:00:00',
'2022-07-07 14:00:00', '2022-07-07 15:00:00',
'2022-07-07 16:00:00'],
dtype='datetime64[ns]', name='Date', freq=None), columns=mi, data=np.random.rand(5,8))
print(df)
Node 3 5
Parameter X Y V T X Y V T
Date
2022-07-07 12:00:00 0.417022 0.720324 0.000114 0.302333 0.146756 0.092339 0.186260 0.345561
2022-07-07 13:00:00 0.396767 0.538817 0.419195 0.685220 0.204452 0.878117 0.027388 0.670468
2022-07-07 14:00:00 0.417305 0.558690 0.140387 0.198101 0.800745 0.968262 0.313424 0.692323
2022-07-07 15:00:00 0.876389 0.894607 0.085044 0.039055 0.169830 0.878143 0.098347 0.421108
2022-07-07 16:00:00 0.957890 0.533165 0.691877 0.315516 0.686501 0.834626 0.018288 0.750144
I would like to add a new calculated column "Z" for each Node, computed as "X" ** 2 + "Y" ** 2.
The following achieves the desired result:
x = df.loc[:,(slice(None),"X")]
y = df.loc[:,(slice(None),"Y")]
z = (x ** 2).rename(columns={"X": "Z"}) + (y ** 2).rename(columns={"Y": "Z"})
result = df.join(z).sort_index(axis=1)
Is there a more straightforward way to achieve this?
For example, using df.xs to select the desired column data, e.g. df.xs("X", axis=1, level=1) ** 2 + df.xs("Y", axis=1, level=1) ** 2, how can I then assign the result to the original dataframe?
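One direct way I can imagine is to assign the xs result back per node in a loop; this is only a sketch of what I mean, not the tidier solution I'm hoping for:
z = df.xs("X", axis=1, level=1) ** 2 + df.xs("Y", axis=1, level=1) ** 2
for node in z.columns:            # z's columns are the Node values (3 and 5)
    df[(node, "Z")] = z[node]     # tuple key creates the new (node, "Z") column
df = df.sort_index(axis=1)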
CodePudding user response:
You can use groupby.apply:
(df.groupby(level='Node', axis=1)
 .apply(lambda g: g.droplevel('Node', axis=1).eval('Z = X**2 + Y**2'))
)
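A minimal variant of the same idea replaces eval with assign (assuming the same df as above; note that axis=1 in groupby is deprecated in recent pandas versions):
(df.groupby(level='Node', axis=1)
   .apply(lambda g: g.droplevel('Node', axis=1)
                     .assign(Z=lambda d: d['X']**2 + d['Y']**2))
)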
Or, with xs and drop_level=False on one of the values:
(df.join((df.xs('X', axis=1, level=1, drop_level=False)**2
          + df.xs('Y', axis=1, level=1)**2
).rename(columns={'X': 'Z'}, level=1)
)
.sort_index(axis=1, level=0, sort_remaining=False)
)
Output:
Node 3 5
Parameter X Y V T Z X Y V T Z
Date
2022-07-07 12:00:00 0.417022 0.720324 0.000114 0.302333 0.692775 0.146756 0.092339 0.186260 0.345561 0.030064
2022-07-07 13:00:00 0.396767 0.538817 0.419195 0.685220 0.447748 0.204452 0.878117 0.027388 0.670468 0.812891
2022-07-07 14:00:00 0.417305 0.558690 0.140387 0.198101 0.486278 0.800745 0.968262 0.313424 0.692323 1.578722
2022-07-07 15:00:00 0.876389 0.894607 0.085044 0.039055 1.568379 0.169830 0.878143 0.098347 0.421108 0.799977
2022-07-07 16:00:00 0.957890 0.533165 0.691877 0.315516 1.201818 0.686501 0.834626 0.018288 0.750144 1.167884
CodePudding user response:
One option is with df.xs:
out = df.xs('X',axis=1,level=1).pow(2).add(df.xs('Y',axis=1,level=1).pow(2))
out.columns = [out.columns, np.repeat(['Z'], 2)]
pd.concat([df, out], axis = 1).sort_index(axis=1)
Node 3 5
Parameter T V X Y Z T V X Y Z
Date
2022-07-07 12:00:00 0.302333 0.000114 0.417022 0.720324 0.692775 0.345561 0.186260 0.146756 0.092339 0.030064
2022-07-07 13:00:00 0.685220 0.419195 0.396767 0.538817 0.447748 0.670468 0.027388 0.204452 0.878117 0.812891
2022-07-07 14:00:00 0.198101 0.140387 0.417305 0.558690 0.486278 0.692323 0.313424 0.800745 0.968262 1.578722
2022-07-07 15:00:00 0.039055 0.085044 0.876389 0.894607 1.568379 0.421108 0.098347 0.169830 0.878143 0.799977
2022-07-07 16:00:00 0.315516 0.691877 0.957890 0.533165 1.201818 0.750144 0.018288 0.686501 0.834626 1.167884
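A small variant of the relabelling step above builds the new column MultiIndex explicitly so the level names are kept (just a sketch; the result is otherwise the same):
out = df.xs('X', axis=1, level=1).pow(2).add(df.xs('Y', axis=1, level=1).pow(2))
# build (Node, 'Z') columns instead of assigning a list of arrays
out.columns = pd.MultiIndex.from_product([out.columns, ['Z']],
                                         names=['Node', 'Parameter'])
pd.concat([df, out], axis=1).sort_index(axis=1)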
Another option is to select the X and Y columns, run pow across all of them in one go, then group and aggregate before concatenating:
out = (df
.loc(axis=1)[:, ['X','Y']]
.pow(2)
.groupby(level='Node', axis=1)
.agg(np.add.reduce,axis=1))
out.columns = [out.columns, np.repeat(['Z'], 2)]
pd.concat([df, out], axis = 1).sort_index(axis=1)
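If you want to avoid the axis=1 groupby (deprecated in recent pandas versions), a rough equivalent with the same df is to transpose, group on the Node index level, and transpose back:
out = (df.T                                   # columns become a (Node, Parameter) index
         .loc[(slice(None), ['X', 'Y']), :]   # keep only the X and Y rows
         .pow(2)
         .groupby(level='Node')               # row-wise groupby, no axis=1 needed
         .sum()
         .T)
out.columns = pd.MultiIndex.from_product([out.columns, ['Z']],
                                         names=['Node', 'Parameter'])
pd.concat([df, out], axis=1).sort_index(axis=1)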