I have a Pandas MultiIndex Dataframe in Python with two levels for the index and for the columns that look like this:
miind = pd.MultiIndex.from_product([['A1','A2'],['B1','B2','B3']])
micol = pd.MultiIndex.from_product([['X1','X2'],['Y1','Y2','Y3']])
df = pd.DataFrame((np.arange(len(miind)*len(micol)) % 5).reshape(len(miind),len(micol)),
index=miind, columns=micol)
print(df)
X1 X2
Y1 Y2 Y3 Y1 Y2 Y3
A1 B1 0 1 2 3 4 0
B2 1 2 3 4 0 1
B3 2 3 4 0 1 2
A2 B1 3 4 0 1 2 3
B2 4 0 1 2 3 4
B3 0 1 2 3 4 0
I would like, for each line, to divide Y1 and Y2 by Y3 when Y3 is not 0. I don't know how to combine the condition Y3>0 with the element selection.
What is the best way to do this? np.where()
, a mask, or simply indexing? I access Y3 as follows:
idx = pd.IndexSlice
print(df.loc[idx[:,:],idx[:,'Y3']] > 0)
X1 X2
Y3 Y3
A1 B1 True False
B2 True True
B3 True True
A2 B1 False True
B2 True True
B3 True False
Edit:
This is what I want, using for
loops:
A = ['A1','A2']
B = ['B1','B2','B3']
X = ['X1','X2']
Y = ['Y1','Y2','Y3']
miind = pd.MultiIndex.from_product([A,B])
micol = pd.MultiIndex.from_product([X,Y])
df = pd.DataFrame((np.arange(len(miind)*len(micol)) % 5).reshape(len(miind),len(micol)),
index=miind, columns=micol)
for i, a in enumerate(A):
df1 = df.loc[a]
for j,b in enumerate(B):
df2 = df1.loc[b]
for k,x in enumerate(X):
s1 = df2.loc[x]
if s1['Y3'] > 0:
df.loc[idx[a,b],idx[x,'Y1']] /= s1['Y3']
df.loc[idx[a,b],idx[x,'Y2']] /= s1['Y3']
print(df)
X1 X2
Y1 Y2 Y3 Y1 Y2 Y3
A1 B1 0.000000 0.500000 2 3.000000 4.000000 0
B2 0.333333 0.666667 3 4.000000 0.000000 1
B3 0.500000 0.750000 4 0.000000 0.500000 2
A2 B1 3.000000 4.000000 0 0.333333 0.666667 3
B2 4.000000 0.000000 1 0.500000 0.750000 4
B3 0.000000 0.500000 2 3.000000 4.000000 0
However, this solution is not elegant and does probably not scale well for bigger DataFrames...
CodePudding user response:
You could stack and unstack your dataframe:
# stack the dataframe
tmp = df.stack(level=0)
# divide the columns of the stacked dataframe
tmp.loc[tmp['Y3']!= 0, 'Y1'] /= tmp.loc[tmp['Y3']!= 0, 'Y3']
tmp.loc[tmp['Y3']!= 0, 'Y2'] /= tmp.loc[tmp['Y3']!= 0, 'Y3']
# unstack the divided dataframe
tmp = tmp.unstack(level=2)
At this point, we have:
Y1 Y2 Y3
X1 X2 X1 X2 X1 X2
A1 B1 0.000000 3.000000 0.500000 4.000000 2 0
B2 0.333333 4.000000 0.666667 0.000000 3 1
B3 0.500000 0.000000 0.750000 0.500000 4 2
A2 B1 3.000000 0.333333 4.000000 0.666667 0 3
B2 4.000000 0.500000 0.000000 0.750000 1 4
B3 0.000000 3.000000 0.500000 4.000000 2 0
Not that bad, the the levels of the columns are not what we want. Let us go on...
# reverse the column levels
tmp.columns = pd.MultiIndex.from_tuples((j,i) for i,j in tmp.columns)
# and sort the columns
result = tmp.sort_index(axis=1)
We now have as expected:
X1 X2
Y1 Y2 Y3 Y1 Y2 Y3
A1 B1 0.000000 0.500000 2 3.000000 4.000000 0
B2 0.333333 0.666667 3 4.000000 0.000000 1
B3 0.500000 0.750000 4 0.000000 0.500000 2
A2 B1 3.000000 4.000000 0 0.333333 0.666667 3
B2 4.000000 0.000000 1 0.500000 0.750000 4
B3 0.000000 0.500000 2 3.000000 4.000000 0