I have code that works for a single row only (row0), where I apply a systematic procedure as follows:
import pandas as pd
z = [[1,1,0,0,1],
[0,1,0,0,1],
[0,1,1,1,1],
[0,0,0,1,0],
[0,0,1,0,1]]
z = pd.DataFrame(z)
df = z.copy()
df.index = ['F1','F2','F3','F4','F5']
df.columns = ['F1','F2','F3','F4','F5']
df
F1 F2 F3 F4 F5
F1 1 1 0 0 1
F2 0 1 0 0 1
F3 0 1 1 1 1
F4 0 0 0 1 0
F5 0 0 1 0 1
a) Consider row 'F1'. Check the positions of the 0s in this row. For example, there are 0s at columns 'F3' and 'F4'.
row0 = z.loc[[0],:]
col_zeros = row0.loc[:, (row0 == 0).all(axis=0)]
b) Then I convert these column positions into a list
col_list = col_zeros.columns.to_list()
c) Use col_list to hide the rows with the same position labels, 'F3' and 'F4', from the original dataframe z
df2 = z.copy()
df2.drop(df2.index[col_list], axis = 0, inplace = True)
d) Now check these columns one by one to see whether a 1 is present in any of them among the remaining rows
df3 = df2[col_list]
e) After checking, I can see that a 1 is present in column 2 but not in column 3. Hence I replace the 0 with 1 in row0 at column position 2, i.e., 'F3'
df.iloc[0,2] = 1
f) The entire code above, from a) to e), handles row 'F1' only. I want the same procedure to be repeated for the rest of the rows: 'F2', 'F3', 'F4', 'F5'.
Output I need after applying the above method to all rows of df:
F1 F2 F3 F4 F5
F1 1 1 1 0 1
F2 0 1 1 0 1
F3 0 1 1 1 1
F4 0 0 0 1 0
F5 0 1 1 1 1
CodePudding user response:
Your question already offers a solution, so I assume you want something faster than looping through your matrix row by row. In short, my approach would be
import numpy as np
m = df.to_numpy()
print(m | (m * m[:, np.newaxis, :].transpose((0,2,1))).any(axis=1))
which will give you your desired result.
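For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example dataframe from the question and wraps the array back into a labelled dataframe; the names `labels`, `reach`, and `result` are my own and are not part of the original snippet.

```python
import numpy as np
import pandas as pd

labels = ['F1', 'F2', 'F3', 'F4', 'F5']
df = pd.DataFrame(
    [[1, 1, 0, 0, 1],
     [0, 1, 0, 0, 1],
     [0, 1, 1, 1, 1],
     [0, 0, 0, 1, 0],
     [0, 0, 1, 0, 1]],
    index=labels, columns=labels)

m = df.to_numpy()

# reach[i, k] is True when some row j with m[i, j] == 1 also has m[j, k] == 1
reach = (m * m[:, np.newaxis, :].transpose((0, 2, 1))).any(axis=1)

# Element-wise or with the original matrix, then restore the row/column labels
result = pd.DataFrame(m | reach, index=df.index, columns=df.columns)
print(result)
```

On the example data this should reproduce the table the question asks for.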
Your problem can be analyzed differently: your step (c) is equivalent to masking your matrix with the transpose of your first row. Masking here means element-wise multiplication, so the masked matrix becomes
F1 F2 F3 F4 F5
F1 1 1 0 0 1
F2 0 1 0 0 1
F3 0 0 0 0 0
F4 0 0 0 0 0
F5 0 0 1 0 1
Note that rows F3 and F4 are not removed; they become zeros when you multiply each column element-wise by the transpose of your first row. Then you only need to apply any along the zeroth axis to get this
F1 F2 F3 F4 F5
1 1 1 0 1
and do an element-wise or of this outcome with the first row of your matrix to get the desired version of the first row
F1 F2 F3 F4 F5
F1 1 1 1 0 1
F2 0 1 0 0 1
F3 0 0 0 0 0
F4 0 0 0 0 0
F5 0 0 1 0 1
Note that only the first row has been updated.
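The three steps for the first row (mask, any, or) can be sketched in isolation like this. The variable names `row`, `masked`, `hits`, and `new_row` are illustrative only.

```python
import numpy as np

m = np.array([[1, 1, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 1, 1, 1, 1],
              [0, 0, 0, 1, 0],
              [0, 0, 1, 0, 1]])

row = m[0]                       # first row (F1)
masked = m * row[:, np.newaxis]  # rows where F1 has a 0 (F3, F4) become all zeros
hits = masked.any(axis=0)        # per column: does any surviving row hold a 1?
new_row = row | hits             # element-wise or with the original first row

print(masked)
print(new_row)
```

Here `row[:, np.newaxis]` turns the first row into a column vector, which is the "transpose of your first row" used for the masking.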
Obviously this is an algebraic problem, and my preferred tool for matrix algebra is numpy, so I will first convert the dataframe to an array
m = df.to_numpy()
Then this is the masking step
m * m[:, np.newaxis, :].transpose((0,2,1))
then take the any as described above; in this stacked 3-D array, which holds one masked matrix per row, the axis to reduce is 1
(m * m[:, np.newaxis, :].transpose((0,2,1))).any(axis=1)
and finally the element-wise or
m | (m * m[:, np.newaxis, :].transpose((0,2,1))).any(axis=1)
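As a side note, and not something the steps above depend on: for a 0/1 integer matrix like this one, the mask-then-any step appears to compute the same thing as a boolean matrix product, so `m @ m` can serve as a cross-check. The sketch below compares the two; `broadcast` and `product` are names I made up for the comparison.

```python
import numpy as np

m = np.array([[1, 1, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 1, 1, 1, 1],
              [0, 0, 0, 1, 0],
              [0, 0, 1, 0, 1]])

# Broadcasting version from the steps above
broadcast = m | (m * m[:, np.newaxis, :].transpose((0, 2, 1))).any(axis=1)

# Matrix-product version: (m @ m)[i, k] > 0 exactly when some j has
# m[i, j] == 1 and m[j, k] == 1
product = m | ((m @ m) > 0)

print(np.array_equal(broadcast, product))
```

The matrix-product form avoids building the intermediate n x n x n array, which may matter for larger matrices.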